pscan/example_matrix_file.wil
>NFY
34 16 7 58 51 0 2 112 116 0 14 66 13 39 36 25
37 33 51 14 4 116 113 0 0 1 65 6 20 43 9 35
27 26 25 41 56 0 1 1 0 0 33 42 73 22 47 29
18 41 33 3 5 0 0 3 0 115 4 2 10 12 24 27
>TBP
61 16 352 3 354 268 360 222 155 56 83 82 82 68 77
145 46 0 10 0 0 3 2 44 135 147 127 118 107 101
152 18 2 2 5 0 10 44 157 150 128 128 128 139 140
31 309 35 374 30 121 6 121 33 48 31 52 61 75 71
pscan/REFERENCE.txt
If you find Pscan useful in your research please cite our paper:
F.Zambelli, G.Pesole, G.Pavesi
Pscan: Finding Over-represented Transcription Factor Binding Site Motifs in Sequences from Co-Regulated or Co-Expressed Genes.
Nucleic Acids Research 2009 37(Web Server issue):W247-W252.
pscan/HELP.txt
SYNOPSIS
pscan -q multifastafile -p multifastafile [options]
pscan -p multifastafile [options]
pscan -q multifastafile -M matrixfile [options]
OPTIONS
[-q file] | Specify the multifasta file containing the foreground sequences.
[-p file] | Specify the multifasta file containing the background sequences.
[-m file] | Use it if the background data are already available in a file (see -g option).
[-M file] | Scan the foreground sequences using only the Jaspar/Transfac matrix contained in the specified file.
[-l file] | Use the matrices contained in that file (for matrix file format see below).
[-N name] | Use only the matrix with that name (usable only in association with -l).
[-ss] | Perform single strand only analysis.
[-rs] | Perform single strand only analysis on the reverse strand.
[-split num1 num2] | Sequences are scanned only starting from position num1 and for num2 nucleotides.
[-trashn] | Discards sequences containing "N".
[-n] | Oligos containing "N" will not be discarded. Instead, an "N" will receive an "average" score.
[-g] | If a background sequences file is used, then a file will be written containing the data calculated
for those background sequences and the current set of matrices.
From then on one can use that file (-m option) instead of the sequences file for faster processing.
[-ui file] | Use the given index of the background file to avoid duplicated sequences.
[-bi] | Build an index of the background sequences file (to be used later with the -ui option).
This is useful when your background contains duplicated sequences that may introduce a bias in your results.
[-h] | Display this help.
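
The effect of the "-n" option described above can be illustrated with a short sketch. This is not the code used by Pscan: the function names, the A/C/G/T column layout and the reading of "average" as the mean of the four values in a column are all assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// One matrix position: values for A, C, G, T (assumed order, hypothetical type).
using Column = std::array<double, 4>;

// Map a nucleotide to its index in a Column, or -1 for anything else (e.g. 'N').
int base_index(char c) {
    switch (c) {
        case 'A': case 'a': return 0;
        case 'C': case 'c': return 1;
        case 'G': case 'g': return 2;
        case 'T': case 't': return 3;
        default:            return -1;
    }
}

// Sum the matrix values selected by each base of the oligo.
// With average_n == true (the "-n" idea, as assumed here) an unknown base
// contributes the mean of its column. Otherwise the oligo is rejected and
// -1.0 is returned. A length mismatch also returns -1.0.
double score_oligo(const std::vector<Column>& matrix,
                   const std::string& oligo,
                   bool average_n) {
    if (oligo.size() != matrix.size()) return -1.0;
    double score = 0.0;
    for (std::size_t i = 0; i < oligo.size(); ++i) {
        const int b = base_index(oligo[i]);
        if (b >= 0) {
            score += matrix[i][static_cast<std::size_t>(b)];
        } else if (average_n) {
            score += (matrix[i][0] + matrix[i][1] + matrix[i][2] + matrix[i][3]) / 4.0;
        } else {
            return -1.0;  // oligo discarded
        }
    }
    return score;
}
```

Calling score_oligo(matrix, "AN", false) rejects the oligo, while passing true as the last argument keeps it and scores the "N" with the column mean.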
NOTES
The sequences to be used with Pscan have to be promoter sequences.
To obtain meaningful results it is critical that the background and the foreground sequences are consistent with each other both in size
and in position (with respect to the transcription start site). For optimal results the foreground set should be a subset of the background set.
If the "-l" option is not used Pscan will try to find Jaspar/Transfac matrix files in the current folder.
Jaspar files have ".pfm" extension while Transfac ones have ".pro" extension.
If Jaspar matrix files are used, then a file called "matrix_list.txt" must be present in the same folder.
That file contains required info about the matrices in the ".pfm" files.
For info on how Pscan works please refer to the paper.
EXAMPLES
1) pscan -p human_450_50.fasta -bi
This command will scan the file "human_450_50.fasta" using the matrices in the current folder.
This command is handy the first time a set of matrices is used with a given background sequences file.
A file called human_450_50.short_matrix will be written and it can be used from then on every time you want to use
the same background sequences with the same set of matrices. A file called human_450_50.index will be written too
and it will be useful every time you use the same background file.
2) pscan -q human_nfy_targets.fasta -m human_450_50.short_matrix -ui human_450_50.index
This command will scan the file human_nfy_targets.fasta searching for over-represented binding sites (with respect
to the preprocessed background contained in the "human_450_50.short_matrix" file) using the matrices in the current folder.
Please note that the query file "human_nfy_targets.fasta" must be a subset of the sequences contained in the background file "human_450_50.fasta"
in order to use the index file with the "-ui" option. This means that both the sequences and their FASTA headers used in the query file must appear
in the background file as well. Using the "-ui" option when the sequences contained in the query file are not a subset of the background file will
have undefined/unpredictable outcomes.
The output will be a file called "human_nfy_targets.fasta.res" where you will find all the used matrices sorted by ascending P-value.
The lower the P-value obtained by a matrix, the higher the chances that the transcription factor associated with that matrix
is a regulator of the input promoter sequences.
The fields of the output are the following: "Transcription Factor Name", "Matrix ID", "Z Score", "Pvalue", "Foreground Average", "Background Average".
3) pscan -q human_nfy_targets.fasta -M MA0108.pfm
This command will scan the sequences file "human_nfy_targets.fasta" using the matrix contained in "MA0108.pfm".
The result will be written in a file called "human_nfy_targets.fasta.ris" where you will find the input sequences
sorted by descending score (from 1 to 0). The higher the score, the better the oligo found matches the matrix used.
The fields of the output are the following: "Sequence Header", "Score", "Position from the end of sequence", "Oligo that obtained the score",
"Strand where the oligo was found".
4) pscan -p human_450_50.fasta -bi -l matrixfile.wil
This command is like Example #1 with the difference that the set of matrices used is the one contained in the "matrixfile.wil" file.
Please look at the "example_matrix_file.wil" file included in this Pscan distribution to see the correct format for a matrix file.
5) pscan -q human_nfy_targets.fasta -l matrixfile.wil -N MATRIX1
This command is like Example #3 but it will use the matrix called "MATRIX1" contained in the "matrixfile.wil" file.
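
The matrix file format used with the "-l" option in Examples #4 and #5 can be read with a few lines of code. The sketch below is not the parser used by Pscan. It assumes only what the "example_matrix_file.wil" file shows: a ">NAME" line followed by four rows of whitespace-separated integer counts. The struct and function names are hypothetical. The A, C, G, T row order is an assumption as well, although it agrees with the CCAAT pattern visible in the NFY example matrix.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One matrix: its name and its rows of counts
// (assumed to be A, C, G, T from top to bottom).
struct Matrix {
    std::string name;
    std::vector<std::vector<int>> rows;
};

// Read every ">NAME" block and the count rows that follow it.
std::vector<Matrix> read_matrices(std::istream& in) {
    std::vector<Matrix> out;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        if (line[0] == '>') {                 // a header starts a new matrix
            out.push_back(Matrix{line.substr(1), {}});
            continue;
        }
        if (out.empty()) continue;            // counts before any header are ignored
        std::istringstream fields(line);
        std::vector<int> row;
        int value;
        while (fields >> value) row.push_back(value);
        if (!row.empty()) out.back().rows.push_back(row);
    }
    return out;
}

// A matrix is usable here only if it has four non-empty rows of equal length.
bool is_well_formed(const Matrix& m) {
    if (m.rows.size() != 4 || m.rows[0].empty()) return false;
    for (const auto& row : m.rows) {
        if (row.size() != m.rows[0].size()) return false;
    }
    return true;
}
```

Running is_well_formed on each parsed matrix before passing the file to Pscan is a cheap way to catch a missing row or a row with the wrong number of columns.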
pscan/pscan.cpp
// Pscan 1.2.2
// Copyright (C) 2009 Federico Zambelli and Giulio Pavesi
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program. If not, see .
//
// If you find Pscan useful in your research please cite our paper:
//
// F.Zambelli, G.Pesole, G.Pavesi
// Pscan: Finding Over-represented Transcription Factor Binding Site Motifs in Sequences from Co-Regulated or Co-Expressed Genes.
// Nucleic Acids Research 2009 37(Web Server issue):W247-W252.
#include
#include
#include
#include
#include
#include
#include
#include