Programs-5.1.1/pscan.xml pscan EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net pscan Scans protein sequence(s) with fingerprints from the PRINTS database http://bioweb2.pasteur.fr/docs/EMBOSS/pscan.html http://emboss.sourceforge.net/docs/themes sequence:protein:motifs pscan e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_emin Minimum number of elements per fingerprint (value from 1 to 20) Integer 2 ("", " -emin=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 Value less than or equal to 20 is required value <= 20 2 e_emax Maximum number of elements per fingerprint (value less than or equal to 20) Integer 20 ("", " -emax=" + str(value))[value is not None and value!=vdef] Value less than or equal to 20 is required value <= 20 3 e_output Output section e_outfile Name of the output file (e_outfile) Filename pscan.e_outfile ("" , " -outfile=" + str(value))[value is not None] 4 e_outfile_out outfile_out option PscanReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 5
Programs-5.1.1/rnacofold.xml rnacofold RNAcofold Calculate secondary structures of two RNAs with dimerization Ivo L Hofacker, Peter F Stadler, Stephan Bernhart I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker, P. Schuster (1994) Fast Folding and Comparison of RNA Secondary Structures. 
Monatshefte f. Chemie 125: 167-188 S.H.Bernhart, Ch. Flamm, P.F. Stadler, I.L. Hofacker Partition Function and Base Pairing Probabilities of RNA Heterodimers Algorithms Mol. Biol. (2006) D.H. Mathews, J. Sabina, M. Zuker and H. Turner "Expanded Sequence Dependence of Thermodynamic Parameters Provides Robust Prediction of RNA Secondary Structure" JMB, 288, pp 911-940, 1999 http://www.tbi.univie.ac.at/RNA/RNAcofold.html RNAcofold works much like RNAfold, but allows you to specify two RNA sequences, which are then allowed to form a dimer structure. RNA sequences are read from infile in the usual format, i.e. each line of the file corresponds to one sequence, except for lines starting with ">", which contain the name of the next sequence. To compute the hybrid structure of two molecules, the two sequences must be concatenated using the & character as separator. RNAcofold can compute minimum free energy (mfe) structures, as well as the partition function (pf) and base pairing probability matrix (using the -p switch). Since dimer formation is concentration dependent, RNAcofold can be used to compute equilibrium concentrations for all five monomer and (homo/hetero)-dimer species, given input concentrations for the monomers. Output consists of the mfe structure in bracket notation as well as PostScript structure plots and "dot plot" files containing the pair probabilities; see the RNAfold man page for details. In the dot plots a cross marks the chain break between the two concatenated sequences. sequence:nucleic:2D_structure structure:2D_structure RNAcofold seq RNA Sequence File Nucleic CofoldSequence AbstractText " < $value" " < " + str(value) 1000 Each line of the file corresponds to one sequence, except for lines starting with ">", which contain the name of the next sequence. To compute the hybrid structure of two molecules, the two sequences must be concatenated using the & character as separator. 
>Seq1 ACGAUCAGAGAUCAGAGCAUACGACAGCAG&ACGAAAAAAAGAGCAUACGACAGCAG >seq2 AAAAAAAAAAAAAAAAAAAAAAAAAAA&UUUUUUUUUUUUUUUUUUUUUUUUUUUUU control Control options 2 partition_free Compute the partition function and free energies (-a) Boolean 0 ($value)? " -a" : "" ( "" , " -a" )[ value ] Compute the partition function and free energies not only of the hetero-dimer consisting of the two input sequences (the "AB dimer"), but also of the homodimers AA and BB as well as A and B monomers. The output will contain the free energies for each of these species, as well as 5 dot plots containing the conditional pair probabilities, called ABname5.ps, AAname5.ps and so on. For later use, these dot plot files also contain the free energy of the ensemble as a comment. Using -a automatically toggles the -p option. equilibrium Compute the expected equilibrium concentrations of AB, AA, BB, A, B. (-c) Boolean 0 ($value)? " -c" : "" ( "" , " -c" )[ value ] In addition to everything listed under the -a option, read in initial monomer concentrations and compute the expected equilibrium concentrations of the 5 possible species (AB, AA, BB, A, B). Start concentrations are read from stdin (unless the -f option is used) in [mol/l]; equilibrium concentrations are given relative to the sum of the two inputs. An arbitrary number of initial concentrations can be specified (one pair of concentrations per line). partition Calculate the partition function and base pairing probability matrix (-p) Boolean 0 ($value)? " -p" : "" ( "" , " -p" )[ value ] Calculate the partition function and base pairing probability matrix in addition to the mfe structure. Default is calculation of mfe structure only. Prints a coarse representation of the pair probabilities in the form of a pseudo bracket notation, the ensemble free energy, the frequency of the mfe structure, and the structural diversity. See the description of pf_fold() and mean_bp_dist() in the RNAlib documentation for details. 
Note that unless you also specify -d2 or -d0, the partition function and mfe calculations will use a slightly different energy model. See the discussion of dangling end options below. pf Calculate the pf without pairing matrix (-p0) Boolean 0 ($value)? " -p0" : "" ( "" , " -p0" )[ value ] Calculate the partition function but not the pair probabilities, saving about 50% in runtime. Prints the ensemble free energy -kT ln(Z). temperature Rescale energy parameters to a temperature of temp C. (-T) Float 37.0 (defined $value and $value != $vdef)? " -T $value" : "" ( "" , " -T " + str(value) )[ value is not None and value != vdef] tetraloops Do not include special stabilizing energies for certain tetraloops (-4) Boolean 0 ($value)? " -4" : "" ( "" , " -4" )[ value ] dangling How to treat dangling end energies for bases adjacent to helices in free ends and multiloops (-d) Choice -d1 -d1 -d -d2 (defined $value and $value ne $vdef)? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef] How to treat 'dangling end' energies for bases adjacent to helices in free ends and multiloops: Normally only unpaired bases can participate in at most one dangling end. With -d2 this check is ignored; this is the default for partition function folding (-p). -d ignores dangling ends altogether. Note that by default pf and mfe folding treat dangling ends differently; use -d2 (or -d) in addition to -p to ensure that both algorithms use the same energy model. The -d2 option is available for RNAfold, RNAeval, and RNAinverse only. scale Use scale*mfe as an estimate for the free energy (-S) Integer (defined $value)? " -S $value" : "" ( "" , " -S " + str(value) )[ value is not None ] In the calculation of the pf use scale*mfe as an estimate for the ensemble free energy (used to avoid overflows). The default is 1.07; useful values are 1.0 to 1.2. Occasionally needed for long sequences. 
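The Python format expressions in the options above, such as ( "" , " -T " + str(value) )[ value is not None and value != vdef ], index a two-element tuple with a boolean: False selects the empty string and True selects the flag text. The following is a minimal sketch of how a few of these expressions could combine into a command line. The function name build_rnacofold_args and the dictionary layout are hypothetical illustrations, not part of the program definition or of any framework API.

```python
def build_rnacofold_args(params, defaults):
    """Assemble RNAcofold options from parameter values (illustrative only)."""
    args = ""
    # Boolean switch: the flag is emitted only when the value is truthy.
    args += ("", " -p")[bool(params.get("partition"))]
    # Valued option: emitted only when set and different from its default (vdef).
    value, vdef = params.get("temperature"), defaults.get("temperature")
    args += ("", " -T " + str(value))[value is not None and value != vdef]
    # Choice option: the chosen value is itself the flag (-d, -d1 or -d2).
    value, vdef = params.get("dangling"), defaults.get("dangling")
    args += ("", " " + str(value))[value is not None and value != vdef]
    return "RNAcofold" + args


# Hypothetical defaults mirroring the values listed in the definition.
defaults = {"temperature": 37.0, "dangling": "-d1"}
print(build_rnacofold_args(
    {"partition": True, "temperature": 25.0, "dangling": "-d2"}, defaults))
print(build_rnacofold_args({"temperature": 37.0}, defaults))
```

Because both tuple elements are built before the index is applied, the flag string is always constructed even when it is discarded; that is harmless here because str() accepts None.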
input Input parameters 2 constraints Calculate structures subject to constraints (-C) Boolean 0 ($value)? " -C" : "" ( "" , " -C" )[ value ] The program first reads the sequence, then a string containing constraints on the structure encoded with the symbols: | (the corresponding base has to be paired), x (the base is unpaired), < (base i is paired with a base j>i), > (base i is paired with a base j<i), and matching brackets ( ) (base i pairs base j). Pf folding ignores constraints of type '|' '<' and '>', but disallows all pairs conflicting with a constraint of type 'x' or '( )'. This is usually sufficient to enforce the constraint. >Seq1 ACGAUCAGAGAUCAGAGCAUACGACAGCAG&ACGAAAAAAAGAGCAUACGACAGCAG .....................|||||||||||||||xxxxxxxxx............ >seq2 AAAAAAAAAAAAAAAAAAAAAAAAAAA&UUUUUUUUUUUUUUUUUUUUUUUUUUUUU |||||||||...........XXXXXXXXXXX.......||||||||||||||||||| >seq3 GGGGGGGGGGGGGGGG&CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCUUUUUUUUUU >>>>>>>>>>>...............<<<<<<<<<<xxxxxx............... noLP Avoid lonely pairs (helices of length 1) (-noLP) Boolean 0 ($value)? " -noLP" : "" ( "" , " -noLP" )[ value ] Produce structures without lonely pairs (helices of length 1). For partition function folding this only disallows pairs that can only occur isolated. Other pairs may still occasionally occur as helices of length 1. noGU Do not allow GU pairs (-noGU) Boolean 0 ($value)? " -noGU" : "" ( "" , " -noGU" )[ value ] noCloseGU Do not allow GU pairs at the end of helices (-noCloseGU) Boolean 0 ($value)? " -noCloseGU" : "" ( "" , " -noCloseGU" )[ value ] nsp Non standard pairs (comma-separated list) (-nsp) String (defined $value)? " -nsp $value" : "" ( "" , " -nsp " + str(value) )[ value is not None ] Allow other pairs in addition to the usual AU, GC, and GU pairs. pairs is a comma-separated list of additionally allowed pairs. If the first character is a '-' then AB will imply that AB and BA are allowed pairs. e.g. RNAfold -nsp -GA will allow GA and AG pairs. 
Nonstandard pairs are given 0 stacking energy. parameter Energy parameter file (-P) EnergyParameterFile AbstractText (defined $value)? " -P $value" : "" ( "" , " -P " + str(value) )[ value is not None ]

Read energy parameters from paramfile instead of using the default parameter set. (See the documentation for details on the file format.)

output_options Output options no_ps Do not produce PostScript drawing of the mfe structure. Boolean 0 ($value)? " -noPS ": "" ( "" , " -noPS ")[ value ] psfiles PostScript output file PostScript Binary "*.ps" "*.ps"
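The RNAcofold input format described in this definition (one sequence per line, lines starting with ">" naming the next sequence, and "&" joining the two strands) can be read with a few lines of Python. This is a minimal sketch: parse_cofold_input is a hypothetical helper, and it assumes every sequence line holds exactly one "&" and that no constraint lines (-C) are present.

```python
def parse_cofold_input(text):
    """Return a list of (name, strand_a, strand_b) tuples (illustrative only)."""
    records = []
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A '>' line carries the name of the next sequence.
            name = line[1:].strip()
            continue
        # Assumption of this sketch: each sequence line describes a dimer.
        if line.count("&") != 1:
            raise ValueError("expected exactly one '&' separator: " + line)
        strand_a, strand_b = line.split("&")
        records.append((name, strand_a, strand_b))
        name = None
    return records


example = """>Seq1
ACGAUCAGAGAUCAGAGCAUACGACAGCAG&ACGAAAAAAAGAGCAUACGACAGCAG
>seq2
AAAAAAAAAAAAAAAAAAAAAAAAAAA&UUUUUUUUUUUUUUUUUUUUUUUUUUUUU
"""
for name, strand_a, strand_b in parse_cofold_input(example):
    print(name, len(strand_a), len(strand_b))
```

Rejecting lines without a separator is a design choice of the sketch; a real front end might instead pass single sequences through unchanged.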
Programs-5.1.1/detect_cnv.xml detect_cnv detect_cnv Copy Number Variation (CNV) detection from signal intensity data (Illumina file) genetics:detection kcolumn.pl preprocess Preprocessing of Illumina intensity file for Copy Number Variation (CNV) detection infile Input signal intensity file SignalIntensity AbstractText " $value" " " + str(value) The input signal intensity file is a text file that contains information for one marker per line, and all fields in each line are tab-delimited. The first line of the file specifies the meaning for each tab-delimited column. For example, there are six fields in each line in the file, corresponding to SNP name, chromosome, position, genotype, Log R Ratio (LRR) and B Allele Frequency (BAF), respectively. The CNV calling only requires the SNP Name, LRR and BAF values. Note that the relative position of LRR and BAF is different from the previous file; again the header line tells the program that the second column represents BAF values, yet the third column is LRR values. 1 split Number of tab-delimited columns (fields) per individual genotyping in intensity file Integer " split $value -heading 3 --name_by_header -tab -out $filename" " split " + str(value) + " -heading 3 --name_by_header -tab --out "+ str(infile) 2 output_split_file Split files Split AbstractText defined $infile infile is not None "$infile.*" infile + ".*" analyse PennCNV analyses 10 type Analyse type without cnv calls file Choice null null --test --joint --validate --summary (defined $value and $value ne $vdef) ? 
" && detect_cnv.pl $value " : "" ( "" , " && detect_cnv.pl " + str(value) )[ value is not None and value !=vdef] Can not handle both "Analyse type WITH and WITHOUT cnv calls file at the same time (defined $value and not defined $rawcnv) or (not defined $value and defined $rawcnv) (value is not None and rawcnv is None) or (value is None and rawcnv is not None) --test: test a signal intensity file to generate CNV calls. --joint: New in July 2008: generate CNV calls for a father-mother-off- spring trio via a one-step procedure. It is considerably slower than the --trio argument, but generates more accurate CNV calls with reduced false negative rates in simulation studies. --summary: generate summary statistics on signal quality for each input file. Usually the summary is provided when calling CNVs and can be written to a log file via the --log argument; however, some- times users forget to use --log, such that the signal quality information is lost. The --summary argument can calculate the signal quality again quickly without calling CNVs. rawcnv Analyse type with cnv calls file Choice null null --trio --quartet --cctest --exclude_heterosomic (defined $value and $value ne $vdef) ? " && detect_cnv.pl $value " : "" ( "" , " && detect_cnv.pl " + str(value) )[ value is not None and value !=vdef] A file containing CNV calls is Mandatory for this parameter (defined $value and defined $cnvfile and not defined $type) or (not defined $value and defined $type) (value is not None and cnvfile is not None and type is None) or (value is None and type is not None) --trio: generate CNV calls for a father-mother-offspring trio, given a CNV file containing calls generated on each individual separately, a HMM model file, a PFB file, and the three signal intensity files. 
--quartet: jointly generate CNV calls for a father-mother-offspring1-offspring2 quartet, given a CNV file containing calls generated on each individual separately, a HMM model file, a PFB file, and the four signal intensity files. --cctest: perform a case-control test on the frequency of having CNVs for each marker within CNVs. A separate phenotype file must be specified via the --phenofile argument for this to work. The actual test is a two-sided Fisher exact test. The --onesided argument can be specified for performing a one-sided test, and the --type_filter argument can be specified so that only "dup" or "del" is compared between cases and controls. --exclude_heterosomic: exclude CNV calls from chromosomes showing evidence of heterosomic aberrations from a given file containing CNV calls. A purely empirical method is applied in this procedure; if the sample size is relatively small (<100), it is recommended to always examine the BAF patterns manually to determine whether a heterosomic aberration is present in a particular sample. 10 infile_name Input file name Filename $type eq '--test' or $type eq '--joint' or $type eq '--validate' or $type eq '--summary' or $rawcnv eq '--trio' or $rawcnv eq '--quartet' or $rawcnv eq '--cctest' or $rawcnv eq '--exclude_heterosomic' type =='--test' or type =='--joint' or type =='--validate' or type =='--summary' or rawcnv =='--trio' or rawcnv =='--quartet' or rawcnv =='--cctest' or rawcnv =='--exclude_heterosomic' " $infile.*" " " + infile + ".*" 18 Specify an output file prefix. cnvfile CNV calls file (cnv) defined $rawcnv rawcnv is not None Cnv AbstractText (defined $value) ? 
"--cnv $value " : "" ( "" , " --cnv " + str(value) )[ value is not None] A file containing CNV calls, that could be generated by the -test operation of this program: trio, quartet, exclude_heterosomic, cctest 12 out_cnv_filename Output CNV file name Filename $type eq '--test' and $infile type == '--test' and infile " --out $infile_rawcnv" " --out " + infile + "_rawcnv " 12 Specify an output file prefix. By default the output filename starts with "gengen". output_cnv_file CNV file Cnv AbstractText $type eq '--test' and $infile type == '--test' and infile "$infile_rawcnv" infile + "_rawcnv" hmmmodel HMM model (hmm) Choice ($type is not None or $rawcnv is not None) and ($type and $rawcnv and $rawcnv ne '--cctest' and $rawcnv ne '--exclude_heterosomic' (type is not None or rawcnv is not None) and (type != '--summary' and (rawcnv !='--cctest' and rawcnv !='--exclude_heterosomic')) null null hhall.hmm hh550.hmm (defined $value) ? "--hmm $value " : "" ( "" , " --hmm " + str(value) )[ value is not None] 11 Specify a HMM model file containing elements necessary for specifying the hidden Markov model for CNV calling: test, validate, joint, trio, quartet pfb Population frequency for B allelel file (pfb) Choice ($type is not None or $rawcnv is not None) and ($rawcnv ne '--exclude_heterosomic') (type is not None or rawcnv is not None) and (rawcnv !='--exclude_heterosomic') null null hhall.hg18.pfb hh550.hg18.pfb hc12v1.hg18.pfb ho1v1.hg18.pfb (defined $value) ? 
"--pfb $value " : "" ( "" , " --pfb " + str(value) )[ value is not None] 11 A population frequency of B allele file containing chromosome coordinates of each SNP, as well as the frequency of B allele in a large reference population for this SNP: test, validate, joint, summary, trio, quartet, cctest gcmodel A file containing GC model for wave adjustment (gcmodel) Choice $type eq '--test' or $rawcnv eq '--trio' or $rawcnv eq '--quartet' or $type eq '--joint' or $type eq '--validate' (type == '--test' or type =='--joint' or type =='--validate') or (rawcnv =='--trio' or rawcnv =='--quartet') null null hhall.hg18.gcmodel hh550.hg18.gcmodel hc12v1.hg18.gcmodel ho1v1.hg18.gcmodel (defined $value) ? "--gcmodel $value " : "" ( "" , " --gcmodel " + str(value) )[ value is not None] A file that contains the GC percentage in the 1Mb region around each marker for the GC-model based signal adjustment: test, joint, validate, trio, quartet 12 cnvoutput CNV output format 20 outputformat Output format Choice ($type is not None or $rawcnv is not None) (type is not None or rawcnv is not None) output output bed tab (defined $value and $value ne $vdef) ? 
" && visualize_cnv.pl --format $value " : "" ( "" , " && visualize_cnv.pl --format %s " % str(value) )[ value is not None and value !=vdef] bed_cnv_file CNV calls file Cnv AbstractText $type eq '--test' and $infile and $outputformat ne 'output' type == '--test' and infile and outputformat != 'output' "$infile.$outputformatcnv" infile + "_" + outputformat + "cnv" bed_file Output file Cnv AbstractText ($type is not None or $rawcnv is not None) and $type ne '--test' and $outputformat ne 'output' (type is not None or rawcnv is not None) and type != '--test' and outputformat != 'output' "$infile.$outputformat" infile + "_" + outputformat bed_cnv_infile Output file name for visualize_cnv (.rawcnv) Filename $type eq '--test' and $infile and $outputformat ne 'output' type == '--test' and infile and outputformat != 'output' " --out $infile.$outputformatcnv $infile.rawcnv " " --out " + infile + "_" + outputformat + "cnv " + infile + "_rawcnv " 21 bed_infile Output file name for visualize_cnv (.out) Filename ($type is not None or $rawcnv is not None) and $type eq '--test' and $outputformat ne 'output' (type is not None or rawcnv is not None) and type != '--test' and outputformat != 'output' " --out $infile.$outputformat detect_cnv.out " " --out " + infile + "_" + outputformat + " detect_cnv.out " 21 cnvcontrol CNV output control 12 minsnp Minimum number of SNPs within CNV (minsnp) Integer $type eq '--test' or $rawcnv eq '--trio' or $rawcnv eq '--quartet' or $type eq '--joint' or $type eq '--validate' type =='--test' or rawcnv =='--trio' or rawcnv =='--quartet' or type =='--joint' or type =='--validate' 3 (defined $value and $value != $vdef) ? 
"--minsnp $value " : "" ( "" , " --minsnp " + str(value) )[ value is not None and value != vdef] The minimum number of SNPs that a CNV call must contain to be in output: test, joint, validate, trio, quartet minlength Minimum length of bp within CNV (minlength) Integer $type eq '--test' or $rawcnv eq '--trio' or $rawcnv eq '--quartet' or $type eq '--joint' or $type eq '--validate' type =='--test' or rawcnv =='--trio' or rawcnv =='--quartet' or type =='--joint' or type =='--validate' (defined $value) ? "--minlength $value " : "" ( "" , " --minlength " + str(value) )[ value is not None ] --minlength argument should be a positive integer $minlength =~ m/^\d+(k|m)?$/i minlength > 0 The minimum length of base pairs that a CNV call must contain to be in output: test, joint, validate, trio, quartet minconf Minimum confidence score of CNV (minconf) Integer $type eq '--test' or $type eq '--validate' type =='--test' or type =='--validate' (defined $value) ? "--minconf $value " : "" ( "" , " --minconf " + str(value) )[ value is not None ] Minimum confidence score for a CNV call to be in output. This is an experimental feature, and the actual definition of "confidence score" may change in the future: test, validate confidence Calculate confidence for each CNV (confidence) Boolean $type eq '--test' or $type eq '--validate' type =='--test' or type =='--validate' 0 ($value) ? "--confidence" : "" ( "" , " --confidence " )[ value ] Calculate a confidence score for each CNV call. This is an experimental feature, and the actual definition of "confidence score" may change in the future: test, validate chrx Use chromosomeX-specific treatment (chrx) Boolean $type eq '--test' or $rawcnv eq '--trio' or $rawcnv eq '--quartet' or $type eq '--joint' or $type eq '--validate' type =='--test' or rawcnv =='--trio' or rawcnv =='--quartet' or type =='--joint' or type =='--validate' 0 ($value) ? "--chrx" : "" ( "" , " --chrx " )[ value ] Process chromosome X specifically. 
By default only autosomes will be processed by this program: test, joint, validate, trio, quartet. sexfile Filename and sex (male/female) for chromosomeX (sex) Sex AbstractText $chrx and not $bafxhet chrx and not bafxhet (defined $value) ? "--sex $value " : "" ( "" , " --sex " + str(value) )[ value is not None] A 2-column file containing filename and sex (male/female) for sex chromosome calling with the chromosomeX-specific (chrx) argument. The first tab-delimited column should be the input signal file name, while the second tab-delimited column should be male or female. Alternatively, abbreviations including m (male), f (female), 1 (male) or 2 (female) are also fine. 12 bafxhet Minimum BAF heterozygosity rate to predict female gender when file is not supplied (bafxhet) Float $chrx and not $sexfile chrx and not sexfile 0.1 (defined $value and $value != $vdef) ? "--bafxhet $value" : "" ( "" , " --bafxhet " + str(value) )[ value is not None and value != vdef] Bafxhet argument should be between 0 and 1 $bafxhet > 0 and $bafxhet < 1 bafxhet > 0 and bafxhet < 1 This argument specifies the BAF heterozygosity rate in chrX to predict the sex for a sample. Note that this rate is based on BAF values, so it is not the genotype heterozygosity rate and is indeed quite different from (smaller than) the genotype heterozygosity rate. By default if >10% chrX markers have BAF values around 0.5, the sample is predicted as female. This threshold however does not work for Affymetrix genome-wide arrays (instead a 5% threshold is better used). For chrX CNV calling, rather than relying on PennCNV prediction of gender, it is always best to explicitly specify the sample sex using the -sexfile argument. validateCalling Specific Validation-calling arguments (validate) $type eq '--validate' type == '--validate' 12 startsnp Start SNP of a pre-specified region (startsnp) String not candlist not candlist (defined $value) ? 
"--startsnp $value" : "" ( "" , " --startsnp " + str(value) )[ value is not None ] Specify the start SNP of a pre-specified region used in --validate operation endsnp End SNP of a pre-specified region (endsnp) String not candlist not candlist (defined $value) ? "--endsnp $value" : "" ( "" , " --endsnp " + str(value) )[ value is not None ] Specify the end SNP of a pre-specified region used in --validate operation delfreq Prior deletion frequency of a pre-specified region (delfreq) Float not candlist not candlist (defined $value) ? "--delfreq $value" : "" ( "" , " --delfreq " + str(value) )[ value is not None ] Delfreq must be between 0 and 1 $delfreq < 1 and $delfreq >=0 and (delfrep+dupfreq) <1 delfreq < 1 and delfreq >=0 Specify the prior deletion allele frequency of a pre-specified region used in --validate operation (this frequency can be estimated from CNV calls by --test operation) dupfreq Prior duplication frequency of a pre-specified region (dupfreq) Float not candlist not candlist (defined $value) ? "--dupfreq $value" : "" ( "" , " --dupfreq " + str(value) )[ value is not None ] Must be between 0 and 1 $dupfreq < 1 and $dupfreq >=0 dupfreq < 1 and dupfreq >=0 Specify the prior duplication allele frequency of a pre-specified region used in --validate operation (this frequency can be estimated from CNV calls by --test operation) backfreq Background CNV probability for any loci (backfreq) Float not $delfreq and not $dupfreq not delfreq and not dupfreq 0.0001 (defined $value) ? "--backfreq $value" : "" ( "" , " --backfreq " + str(value) )[ value is not None ] --backfreq argument should be less than 0.5 $backfreq > 0 and $backfreq < 0.5 backfreq > 0 and backfreq < 0.5 Background CNV probability for any loci, with default value as 0.0001. This argument is useful in validation calling. When -delfreq/-dupfreq is not specified, the background frequency is used to calculate the prior probability of different copy number states. 
candlist A file containing all candidate CNV regions to be validated (candlist) CandidateRegion AbstractText not $startsnp and not $endsnp and not $delfreq and not $dupfreq not startsnp and not endsnp and not delfreq and not dupfreq (defined $value) ? "--candlist $value " : "" ( "" , " --candlist " + str(value) )[ value is not None] cctestCalling Specific Case-control comparison arguments (cctest) $rawcnv eq '--cctest' rawcnv == '--cctest' 12 phenofile A file containing phenotype information for each input file (phenofile) Phenotype AbstractText $rawcnv eq '--cctest' rawcnv == '--cctest' (defined $value) ? "--phenofile $value " : "" ( "" , " --phenofile " + str(value) )[ value is not None] A file containing phenotype information for each individual, so that --cctest can be used to compare the frequency between cases and controls. Each line has two tab-delimited fields: file name and the phenotype. By default, "control" means control subjects, and any other word means cases; however, the user can use the --control_label argument to change the phenotype label for controls. 12 control_label The phenotype label for control subjects in the phenotype file (control_label) String (defined $value) ? "--control_label $value " : "" ( "" , " --control_label " + str(value) )[ value is not None ] Specify the text label for control subjects in the phenotype file specified by the --phenofile argument. Normally "control" is used to specify controls, and all other individuals are treated as cases. However, sometimes users may use 1 to denote controls and 2 to denote cases; in such situations "--control_label 1" should be used for the --cctest operation. onesided Perform one-sided test (onesided) Boolean 0 ($value) ? "--onesided" : "" ( "" , " --onesided " )[ value ] type_filter Used together to specify types of CNVs to be tested (type_filter) Choice null null dup del (defined $value and $value ne $vdef) ? 
"--type_filter $value " : "" ( "" , " --type_filter " + str(value) )[ value is not None and value !=vdef ] Specify the particular types of CNVs to be used in the --cctest operation. By default both duplications and deletions are treated as a single group of CNVs and be used to compare cases and controls. miscOpt Misc options 12 fmprior Prior belief on CN state for regions with CNV calls. Six numbers separated by a comma (fmprior) String $rawcnv eq '--trio' or $rawcnv eq '--quartet' rawcnv =='--trio' or rawcnv =='--quartet' (defined $value) ? "--fmprior $value " : "" ( "" , " --fmprior " + str(value) )[ value is not None ] The --fmprior argument should be 6 comma-separated numbers that sum up to 1. $value ~= /\d+(,\d+){5}/ len(value.split(',')) == 6 The prior probability of 6 hidden states a given CNV call in father or mother. This is used for joint calling of trios or quartets. It is specified as six numbers separated by a comma that sum up to 1. The empirically derived default values actually work well: trio, quartet. denovo_rate Prior belief on genome-wide de novo event rate (denovo_rate) Float $rawcnv eq '--trio' or $rawcnv eq '--quartet' rawcnv =='--trio' or rawcnv =='--quartet' 0.0001 (defined $value and $value != $vdef) ? "--denovo_rate $value" : "" ( "" , " --denovo_rate " + str(value) )[ value is not None and value != vdef] Specify the probability that a given CNV is a de novo event for family-based CNV calling. The default is 0.0001. trio, quartet medianadjust Adjust genome-wide LRR such that median equal 0 (nomedianadjust) Boolean $type eq '--test' or $rawcnv eq '--trio' or $rawcnv eq '--quartet' or $type eq '--joint' or $type eq '--validate' type =='--test' or rawcnv =='--trio' or rawcnv =='--quartet' or type =='--joint' or type =='--validate' 1 (not $value) ? "--nomedianadjust" : "" ( "" , " --nomedianadjust " )[ not value ] This option is turned on by default. 
It adjusts the log R Ratio values of the entire genome by a constant so that the median is zero: test, trio, quartet, joint, validate. bafadjust Adjust genome-wide BAF such that median equal 0.5 (nobafadjust) Boolean $type eq '--test' or $rawcnv eq '--trio' or $rawcnv eq '--quartet' or $type eq '--joint' or $type eq '--validate' type =='--test' or rawcnv =='--trio' or rawcnv =='--quartet' or type =='--joint' or type =='--validate' 1 (not $value) ? "--nobafadjust" : "" ( "" , " --nobafadjust " )[ not value ] This option is turned ON by default (new July 2008): it adjusts the BAF values genome-wide such that the median value is 0.5. sdadjust Adjust SD of hidden Markov model based on input signal (nosdadjust) Boolean $type eq '--test' or $rawcnv eq '--trio' or $rawcnv eq '--quartet' or $type eq '--joint' or $type eq '--validate' type =='--test' or rawcnv =='--trio' or rawcnv =='--quartet' or type =='--joint' or type =='--validate' 1 (not $value) ? "--nosdadjust" : "" ( "" , " --nosdadjust " )[ not value ] This option is turned ON by default: it adjusts the SD values in the HMM model such that the model fits the signal quality of the testing sample to reduce false positive calls. flush Flush input/output buffer (noflush) Boolean 1 (not $value) ? "--noflush" : "" ( "" , " --noflush " )[ not value ] This argument is turned ON by default. It requires the input/output buffer to flush immediately (that is, no input/output is buffered). When PennCNV is running remotely (for example, through an SSH connection) or when the output is redirected, this argument causes the program to report progress in real-time. When running PennCNV in parallel with many processes accessing disks simultaneously, this option should be turned off to decrease system overhead. 
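The median adjustments described for the LRR and BAF options amount to adding one constant to every value so that the median lands on a target (0 for LRR, 0.5 for BAF). The sketch below illustrates that arithmetic only. median_adjust is a hypothetical helper, the sample values are made up, and PennCNV's own implementation may select markers or bound values differently.

```python
from statistics import median


def median_adjust(values, target=0.0):
    """Shift all values by one constant so that their median equals target."""
    shift = target - median(values)
    return [v + shift for v in values]


# Made-up LRR values for illustration; the median is shifted to 0.
lrr = [-0.30, -0.05, 0.10, 0.15, 0.40]
adjusted_lrr = median_adjust(lrr)
print(median(adjusted_lrr))

# Made-up BAF values; the median is shifted to 0.5.
# Assumption: no clipping to the [0, 1] range is applied in this sketch.
baf = [0.02, 0.48, 0.55, 0.60, 0.97]
adjusted_baf = median_adjust(baf, target=0.5)
print(median(adjusted_baf))
```

A constant shift leaves the spread of the values untouched, which is why the SD adjustment is listed as a separate option.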
outfile_name Outfile for detect_cnv and Input file for visualize_cnv Filename ($type is not None or $rawcnv is not None) and $type ne '--test' (type is not None or rawcnv is not None) and type != '--test' " --out detect_cnv.out " " --out detect_cnv.out " 17 output_file Output file Cnv AbstractText $type ne '--test' type != '--test' "detect_cnv.out" "detect_cnv.out"
Programs-5.1.1/clustalO-multialign.xml clustalO-multialign Clustal-Omega: Multiple alignment Align a set of protein sequences alignment:multiple clustalo input Data Input sequences_input Unaligned set of sequences not $alignment_input or ($sequences_input and $alignment_input) not alignment_input or (sequences_input and alignment_input) Protein Sequence FASTA SWISSPROT CODATA NBRF 2,n " --infile=$value" " --infile=" + str( value ) Cannot handle both Sequence and Alignment at the same time not $alignment_input not alignment_input Use this option to make a multiple alignment from a set of sequences. A sequence file must contain more than one sequence (at least two sequences) alignment_input Aligned sequences not $sequences_input or ($sequences_input and $alignment_input) not sequences_input or (sequences_input and alignment_input) Protein Alignment FASTA CLUSTAL STOCKHOLM MSF PHYLPI 1 " --infile=$value" " --infile=" + str( value ) Cannot handle both Sequence and Alignment at the same time not $sequences_input not sequences_input When the sequences are aligned (all sequences have the same length and at least one sequence has at least one gap), then the alignment is turned into a HMM, the sequences are de-aligned and the now un-aligned sequences are aligned using the HMM as an External Profile for External Profile Alignment (EPA). If no EPA is desired use the dealign Option. Clustal-Omega reads the file of aligned sequences. 
It converts the alignment into a HMM, de-aligns the sequences and re-aligns them, transferring pseudo-count information to the sequences/profiles during the MSA. The guide tree is constructed using a full distance matrix of Kimura distances. seqtype type of sequences Choice auto auto Protein RNA DNA (defined $value and $value ne $vdef)? " --seqtype=$value" : "" ("", " --seqtype="+str(value))[value is not None and value != vdef] Since version 1.1.0 the Clustal-Omega alignment engine can process DNA/RNA. Clustal-Omega tries to guess the sequence type (protein, DNA/RNA), but this can be overruled with this flag. dealign Dealign input sequences $alignment_input bool( alignment_input ) Boolean 0 (defined $value and $value)? " --dealign " : "" ( "" , " --dealign ")[ value is not None and value !=vdef ] When the sequences are aligned (all sequences have the same length and at least one sequence has at least one gap), then the alignment is turned into a HMM, the sequences are de-aligned and the now un-aligned sequences are aligned using the HMM as an External Profile for External Profile Alignment (EPA). If no EPA is desired, turn on this option. Clustal-Omega reads the file of aligned sequences. It de-aligns the sequences and then re-aligns them. No HMM is produced in the process and no pseudo-count information is transferred. Consequently, the output must be the same as for unaligned input. hmm-in HMM input files Protein HmmProfile AbstractText HMMER2 HMMER3 1 (defined $value)?" --hmm-in=$value" : "" ( "" , " --hmm-in=" + str( value ))[value is not None ] The un-aligned sequences will be aligned to form a profile, using the HMM as an External Profile. So far only one HMM can be input and only HMMer2 and HMMer3 formats are allowed. The alignment will be written out; the HMM information is discarded. As, at the moment, only one HMM can be used, no HMM is produced if the sequences are already aligned. Use the -i flag in conjunction with the --hmm-in flag for this mode.
Multiple HMMs can be inputted, however, in the current version all but the first HMM will be ignored. Use this option to make a new multiple alignment of sequences from the input file and use the HMM as a guide (EPA). Clustal-Omega reads the sequences file and the HMM file (in HMMer2 or HMMer3 format). It then performs the alignment, transferring pseudo-count information contained in hmm to the sequences/profiles during the MSA. clustering Clustering In order to produce a multiple alignment Clustal-Omega requires a guide tree which defines the order in which sequences/profiles are aligned. A guide tree in turn is constructed, based on a distance matrix. Conventionally, this distance matrix is comprised of all the pair-wise distances of the sequences. The distance measure Clustal-Omega uses for pair-wise distances of un-aligned sequences is the k-tuple measure [4], which was also implemented in Clustal 1.83 and ClustalW2 [5,6]. If the sequences inputted via -i are aligned Clustal-Omega uses the Kimura-corrected pairwise aligned identities [7]. The computational effort (time/memory) to calculate and store a full distance matrix grows quadratically with the number of sequences. Clustal-Omega can improve this scalability to N*log(N) by employing a fast clustering algorithm called mBed [2]; this option is automatically invoked (default). If a full distance matrix evaluation is desired, then the --full flag has to be set. The mBed mode calculates a reduced set of pair-wise distances. These distances are used in a k-means algorithm, that clusters at most 100 sequences. For each cluster a full distance matrix is calculated. No full distance matrix (of all input sequences) is calculated in mBed mode. If there are less than 100 sequences in the input, then in effect a full distance matrix is calculated in mBed mode, however, no distance matrix can be outputted (see below). 
Clustal-Omega uses Muscle's [8] fast UPGMA implementation to construct its guide trees from the distance matrix. By default, the distance matrix is used internally to construct the guide tree and is then discarded. By specifying --distmat-out the internal distance matrix can be written to file. This is only possible in --full mode. The guide trees by default are used internally to guide the multiple alignment and are then discarded. By specifying the --guidetree-out option these internal guide trees can be written out to file. Conversely, the distance calculation and/or guide tree building stage can be skipped, by reading in a pre-calculated distance matrix and/or pre-calculated guide tree. These options are invoked by specifying the --distmat-in and/or --guidetree-in flags, respectively. However, distance matrix reading is disabled in the current version. By default, distance matrix and guide tree files are not over-written, if a file with the specified name already exists. In this case Clustal-Omega aborts during the command-line processing stage. In mBed mode a full distance matrix cannot be outputted, distance matrix output is only possible in --full mode. mBed or --full distance mode do not affect the ability to write out guide-trees. Guide trees can be iterated to refine the alignment (see section ITERATION). Clustal-Omega takes the alignment, that was produced initially and constructs a new distance matrix from this alignment. The distance measure used at this stage is the Kimura distance [7]. By default, Clustal-Omega constructs a reduced distance matrix at this stage using the mBed algorithm, which will then be used to create an improved (iterated) new guide tree. To turn off mBed-like clustering at this stage the --full-iter flag has to be set. 
While Kimura distances in general are much faster to calculate than k-tuple distances, time and memory requirements still scale quadratically with the number of sequences and --full-iter clustering should only be considered for smaller cases ( << 10,000 sequences). [2] Blackshields G, Sievers F, Shi W, Wilm A, Higgins DG. Sequence embedding for fast construction of guide trees for multiple sequence alignment. Algorithms Mol Biol. 2010 May 14;5:21. [4] Wilbur and Lipman, 1983; PMID 6572363 [5] Thompson JD, Higgins DG, Gibson TJ. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673-4680. [6] Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG. (2007). Clustal W and Clustal X version 2.0. Bioinformatics, 23, 2947-2948. [7] Kimura M (1980). "A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences". Journal of Molecular Evolution 16: 111–120. distmat_out Pairwise distance matrix output file Filename (defined $value and $value)? " --distmat-out=$value ":"" ( "" , " --distmat-out="+str(value))[ value is not None ] The full option must be set $full full guidetree_in Guide tree input file (--guidetree-in) Tree NEWICK (defined $value )? " --guidetree-in=$value" : "" ( "" , " --guidetree-in="+str(value))[ value is not None ] guidetree_out Guide tree output file (--guidetree-out) Filename (defined $value and $value)? " --guidetree-out=$value ":"" ( "" , " --guidetree-out="+str(value))[ value is not None ] full Use full distance matrix for guide-tree calculation (slow; mBed is default) (--full) Boolean 0 (defined $value and $value)?
" --full ": "" ( "" , " --full ")[ value is not None and value ] full_iter Use full distance matrix for guide-tree calculation during iteration (mBed is default) (--full-iter) Boolean 0 (defined $value and $value)? " --full-iter ": "" ( "" , " --full-iter ")[ value is not None and value ] output_format Alignment Output output_format alignment output format Choice fa fa clustal msf phylip stockholm vienna (defined $value and $value ne $vdef)? " --outfmt=$value" : "" ( "" , " --outfmt=" + value )[ value is not None and value != vdef ] iteration Iteration By default, Clustal-Omega calculates (or reads in) a guide tree and performs a multiple alignment in the order specified by this guide tree. This alignment is then outputted. Clustal-Omega can 'iterate' its guide tree. The hope is that the (Kimura) distances that can be derived from the initial alignment will give rise to a better guide tree and, by extension, to a better alignment. A similar rationale applies to HMM-iteration. MSAs in general are very 'vulnerable' at their early stages. Sequences that are aligned at an early stage remain fixed for the rest of the MSA. Another way of putting this is: 'once a gap, always a gap'. This behaviour can be mitigated by HMM iteration. An initial alignment is created and turned into a HMM. This HMM can help in a new round of MSA to 'anticipate' where residues should align. This uses the HMM as an External Profile and carries out iterative EPA. In practice, individual sequences and profiles are aligned to the External HMM, derived after the initial alignment. Pseudo-count information is then transferred to the (internal) HMM, corresponding to the individual sequence/profile. The now somewhat 'softened' sequences/profiles are then in turn aligned in the order specified by the guide tree. Pseudo-count transfer is reduced with the size of the profile. Individual sequences attain the greatest pseudo-count transfer, larger profiles less so.
Pseudo-count transfer to profiles larger than, say, 10 is negligible. The effect of HMM iteration is more pronounced in larger test sets (that is, with more sequences). Both, HMM- and guide tree-iteration come at a cost of increasing the run-time. One round of guide tree iteration adds on (roughly) the time it took to construct the initial alignment. If, for example, the initial alignment took 1min, then it will take (roughly) 2min to iterate the guide tree once, 3min to iterate the guide tree twice, and so on. HMM-iteration is more costly, as each round of iteration adds three times the time required for the alignment stage. For example, if the initial alignment took 1min, then each additional round of HMM iteration will add on 3min; so 4 iterations will take 13min (=1min+4*3min). The factor of 3 stems from the fact that at every stage both intermediate profiles have to be aligned with the background HMM, and finally the (softened) HMMs have to be aligned as well. All times are quoted for single processors. By default, guide tree iteration and HMM-iteration are coupled. This means, at each iteration step both, guide tree and HMM, are re-calculated. This is invoked by setting the --iter flag. For example, if --iter=1, then first an initial alignment is produced (without external HMM background information and using k-tuple distances to calculate the guide tree). This initial alignment is then used to re-calculate a new guide tree (using Kimura distances) and to create a HMM. The new guide tree and the HMM are then used to produce a new MSA. Iteration of guide tree and HMM can be de-coupled. This means that the number of guide tree iterations and HMM iterations can be different. This can be done by combining the --iter flag with the --max-guidetree-iterations and/or the --max-hmm-iterations flag. 
The number of guide tree iterations is the minimum of --iter and --max-guidetree-iterations, while the number of HMM iterations is the minimum of --iter and --max-hmm-iterations. If, for example, HMM iteration should be performed 5 times but guide tree iteration should be performed only 3 times, then one should set --iter=5 and --max-guidetree-iterations=3. All three flags can be specified at the same time (however, this makes no sense). It is not sufficient just to specify --max-guidetree-iterations and --max-hmm-iterations but not --iter. If any iteration is desired --iter has to be set. iterations Number of (combined guide-tree/HMM) iterations (--iter) Integer (defined $value)? " --iter=$value ": "" ( "" , " --iter="+str(value) )[ value is not None ] Example with iterations = 2: Clustal-Omega reads the input file, creates a UPGMA guide tree built from k-tuple distances, and performs an initial alignment. This initial alignment is converted into a HMM and a new guide tree is built from the Kimura distances of the initial alignment. The un-aligned sequences are then aligned (for the second time but this time) using pseudo-count information from the HMM created after the initial alignment (and using the new guide tree). This second alignment is then again converted into a HMM and a new guide tree is constructed. The un-aligned sequences are then aligned (for a third time), again using pseudo-count information of the HMM from the previous step and the most recent guide tree. The final alignment is written to screen. max_guidetree_iterations Maximum number of guide tree iterations (--max-guidetree-iterations) Integer (defined $value)? " --max-guidetree-iterations=$value ": "" ( "" , " --max-guidetree-iterations="+str(value) )[ value is not None ] Example with iterations = 5 and "Maximum number of guide tree iterations" set to 1: Clustal-Omega reads the input file, creates a UPGMA guide tree built from k-tuple distances, and performs an initial alignment.
This initial alignment is converted into a HMM and a new guide tree is built from the Kimura distances of the initial alignment. The un-aligned sequences are then aligned (for the second time but this time) using pseudo-count information from the HMM created after the initial alignment (and using the new guide tree). For the last 4 iterations the guide tree is left unchanged and only HMM iteration is performed. This means that intermediate alignments are converted to HMMs, and these intermediate HMMs are used to guide the MSA during subsequent iteration stages. max_hmm_iterations Maximum number of HMM iterations (--max-hmm-iterations) Integer (defined $value)? " --max-hmm-iterations=$value ": "" ( "" , " --max-hmm-iterations="+str(value) )[ value is not None ] miscellaneous Miscellaneous auto Set options automatically (might overwrite some of your options) (--auto) Boolean 0 (defined $value and $value)? " --auto ": "" ( "" , " --auto ")[value is not None and value] Users may feel unsure which options are appropriate in certain situations even though using ClustalO without any special options should give you the desired results. The --auto flag tries to alleviate this problem and selects accuracy/speed flags according to the number of sequences. In all cases it uses mBed, thereby possibly overriding the --full option. For more than 1,000 sequences the iteration is turned off as the effect of iteration is more noticeable for 'larger' problems. Otherwise iterations are set to 1 if not already set to a higher value by the user. Expert users may want to avoid this flag and exercise more fine-tuned control by selecting the appropriate options manually.
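The iteration rules given in the Iteration comment can be sketched in Python: the effective number of guide tree and HMM iterations is the minimum of --iter and the matching --max-... flag, and each round adds run time relative to the initial alignment. This is only an illustration of the rules as described in the text, not Clustal-Omega code. The function names are hypothetical, and treating an unset --max-... flag as "no cap" is an assumption drawn from the statement that both iterations are coupled by default.

```python
def effective_iterations(n_iter, max_guidetree=None, max_hmm=None):
    """Return (guide_tree_rounds, hmm_rounds) for the given flag values.

    n_iter        -- value of --iter (None if not set)
    max_guidetree -- value of --max-guidetree-iterations (None if not set)
    max_hmm       -- value of --max-hmm-iterations (None if not set)
    Assumption: an unset cap means "no cap" (coupled iteration).
    """
    if n_iter is None:
        # Without --iter no iteration takes place, whatever the caps are.
        return (0, 0)
    guide_tree = n_iter if max_guidetree is None else min(n_iter, max_guidetree)
    hmm = n_iter if max_hmm is None else min(n_iter, max_hmm)
    return (guide_tree, hmm)


def guide_tree_runtime(initial_time, rounds):
    """Each guide tree round adds roughly one initial-alignment time."""
    return initial_time * (1 + rounds)


def hmm_runtime(initial_time, rounds):
    """Each HMM round adds roughly three initial-alignment times."""
    return initial_time * (1 + 3 * rounds)


# --iter=5 --max-guidetree-iterations=3: 3 guide tree rounds, 5 HMM rounds.
print(effective_iterations(5, max_guidetree=3))

# Timing examples from the text, for a 1 min initial alignment.
print(guide_tree_runtime(1, 2), hmm_runtime(1, 4))
```

The two run-time helpers reproduce the separate examples in the text (3 min for two guide tree rounds, 13 min for four HMM rounds). The text does not give a figure for combined iteration, so none is computed here.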
verbosity 100 String " -v --force --log=clustalO_log" " -v --force --log=clustalO_log" alignment_output Multiple Sequence Alignment Protein Alignment FASTA CLUSTAL MSF PHYLIPI STOCKHOLM FASTA "clustalO-multialign.out" "clustalO-multialign.out" guidetree_outfile Guide tree output file defined $guidetree_out guidetree_out is not None Tree NEWICK $guidetree_out guidetree_out distmat_outfile Pairwise distance matrix output file defined $distmat_out distmat_out is not None DistanceMatrix AbstractText $distmat_out distmat_out logfile Clustal omega log file ClustalOReport Report "clustalO_log" "clustalO_log"
Programs-5.1.1/ChangeLog
2013-04-16 Bertrand Néron * saxs_merge: change the default of bnocomp enocomp change default of aalpha parameter add hidden parameter verbosity 2013-04-08 Bertrand Néron * saxs_merge: change title. set parameters blimit_fitting, elimit_fitting blimit_hessian, elimit_hessian to be hidden and with vdef = 80 2013-04-05 Bertrand Néron * golden.xml: bump version to 1.1a 2013-04-05 Bertrand Néron * saxs_merge: update to new version r1725. replace the 10 input parameters by one Multiple parameter 2013-04-02 Bertrand Néron * phyml.xml, morePhyml.xml: replace input dataFormat of input alignment by PHYLIP-RELAXED. PHYLIP-RELAXED is the natural format described in phyml. It accepts long names (more than 10 chars) but not spaces in names.
2013-03-29 Corinne Maufrais * rankoptimizer.xml: new option: lower common ancestral (-a) * blast2taxonomy.xml: new option: krona report of blast result (-k) 2013-03-27 Bertrand Néron * forest2consense.xml: fix map_file parameter datatyping to be more expressive than Report 2013-03-27 Bertrand Néron * phyml.xml: change xml (add lot of parameters) according to new version (3.0 (20122408)) 2013-03-05 Bertrand Néron * melting.xml: improve inline documentation 2013-03-05 Bertrand Néron * Env/protdbs.xml: remove the following banks: genpept, genpept_new, gpbct, gppri, gpmam, gprod, gpvrt, gpinv, gppln, gpvrl, gpphg, gpsts, gpsyn, gppat, gpuna, gphtg they are not maintained anymore 2013-02-08 Bertrand Néron * squizz_checker.xml, squizz_convert.xml, squizz_package.xml: bump to version 0.99b 2013-02-05 Bertrand Néron * pdb23.xml. add pdb23 interface 2013-02-05 Bertrand Néron * clustalO-multialign.xml, clustalO-profile.xml, clustalO-sequence.xml. upgrade to version 1.1.0 (add support of DNA/RNA) 2013-02-04 Corinne Maufrais * rankoptimizer.xml, kronaextract.xml: new xml * taxoptimizer.xml: output type modification for further analyzes pipeline * blast2taxoclass.xml: bug fix. duplicated nbofhit parameter 2013-01-28 Bertrand Néron * golden.xml: the database entity is not in Entitites but in Env. remove the fallback as golden service without database is meaningless. 2013-01-08 Bertrand Néron * CONJscan-T4SSscan.xml. add issimple to the following parameters: prog_desc, seqfile, E_value_cutoff, incE 2013-01-08 Bertrand Néron * clustalO-multialign.xml, clustalO-sequence.xml, clustalw-profile.xml, clustalw-sequence.xml mview_alignment.xml, mview_blast.xml, pima.xml, squizz_convert.xml replace PIR format by CODATA. from squizz version0.99b the 2 formats PIR and NBRF are tagged respectively CODATA and NBRF 2013-01-08 Bertrand Néron * clustalw-profile.xml, clustalw-sequence.xml. rewrite typing of output. 
fasta is considered as alignment instead of sequences fix several bugs in filenames element of the out parameter 2013-01-08 Bertrand Néron * mview_alignment.xml, mview_blast.xml. fix bug in filenames of several out parameters. the ouput file name is not anymore mview.out but mview_alignment.out or mview_blast.out, the dinamic typing was refered to outformat parameter but this parameter was named out. 2013-01-07 Corinne Maufrais * abiview.xml, acdrelations.xml, antigenic.xml, backtranambig.xml, backtranseq.xml, banana.xml, biosed.xml, btwisted.xml, cai.xml, chaos.xml, charge.xml, checktrans.xml, chips.xml, cirdna.xml, codcmp.xml, coderet.xml, compseq.xml, cons.xml, cpgplot.xml, cpgreport.xml, cusp.xml, cutseq.xml, dan.xml, degapseq.xml, density.xml, descseq.xml, diffseq.xml, digest.xml, distmat.xml, dotmatcher.xml, dotpath.xml, dottup.xml, dreg.xml, edialign.xml, einverted.xml, emowse.xml, entret.xml, epestfind.xml, eprimer3.xml, equicktandem.xml, est2genome.xml, etandem.xml, extractalign.xml, extractfeat.xml, extractseq.xml, featcopy.xml, featreport.xml, findkm.xml, freak.xml, fuzznuc.xml, fuzzpro.xml, fuzztran.xml, garnier.xml, geecee.xml, getorf.xml, helixturnhelix.xml, hmoment.xml, iep.xml, infoalign.xml, isochore.xml, jaspscan.xml, lindna.xml, listor.xml, makenucseq.xml, makeprotseq.xml, marscan.xml, maskfeat.xml, maskseq.xml, matcher.xml, megamerger.xml, merger.xml, msbar.xml, mwfilter.xml, needle.xml, needleall.xml, newcpgreport.xml, newcpgseek.xml, newseq.xml, notseq.xml, nthseq.xml, octanol.xml, oddcomp.xml, palindrome.xml, pasteseq.xml, patmatdb.xml, patmatmotifs.xml, pepcoil.xml, pepinfo.xml, pepnet.xml, pepstats.xml, pepwheel.xml, pepwindow.xml, pepwindowall.xml, plotcon.xml, plotorf.xml, polydot.xml, preg.xml, prettyplot.xml, prettyseq.xml, primersearch.xml, profit.xml, prophecy.xml, prophet.xml, pscan.xml, recoder.xml, redata.xml, remap.xml, restover.xml, restrict.xml, revseq.xml, seqmatchall.xml, seqret.xml, seqretsetall.xml, seqretsplit.xml, 
showalign.xml, showfeat.xml, showorf.xml, showpep.xml, showseq.xml, shuffleseq.xml, sigcleave.xml, silent.xml, sirna.xml, sixpack.xml, skipseq.xml, splitter.xml, stretcher.xml, stssearch.xml, supermatcher.xml, syco.xml, tcode.xml, textsearch.xml, tfscan.xml, tmap.xml, tranalign.xml, transeq.xml, trimest.xml, trimseq.xml, twofeat.xml, union.xml, vectorstrip.xml, water.xml, wobble.xml, wordcount.xml, wordfinder.xml, wordmatch.xml, yank.xml: repklace pir format by codata format 2013-01-03 Olivia Doppelt-Azeroual * taxoptimizer.xml: debug xml for taxoptimizer program optional outputFile missing 2012-12-18 Olivia Doppelt-Azeroual * taxoptimizer.xml: add xml for taxoptimizer program 2012-11-16 Olivia Doppelt-Azeroual * pdb22.xml: add text tag in description element 2012-11-15 Bertrand Néron * pdb22.xml: add pdb22 interface 2012-09-10 Bertrand Néron * T3SSscan-FLAGscan: add T3SSscan-FLAGscan interface 2012-09-18 Bertrand Néron * sax_merge.xml: update list of autors. 2012-09-11 Bertrand Néron * sax_merge.xml: modification according to new sax_merge options 2012-08-31 Bertrand Néron * pdb21.xml: change prompt for folding parameter 2012-08-31 Bertrand Néron * infoseq.xml: change the default value for e_delimiter parameter since | is not allowed by mobyle. the new default value (in Mobyle) is @ character change the comment according this new default value. 2012-07-31 Bertrand Néron * predator.xml: add attribute issimple to sequences parameter 2012-07-31 Bertrand Néron * sig.xml: improve comment of patterns parameter, remove text which is not a pattern in example 2012-06-25 Bertrand Néron * blast2.xml, Env/nucdbs_blast.xml: modify xml inclusion for protein_db and nucleotid_db to allow inclusion of blast specific bank remove fallback if any bank is find as it is a non sense to deploy blast without bank. 
2012-06-18 Bertrand Néron * merge to reintegrate release_1_0 r1009 in trunk 2012-06-15 Bertrand Néron * clustalw-multialign: add parameter to allow user to provide an Alignment file as input 2012-06-15 Bertrand Néron * 'BMGE.xml', 'align_reorder.xml', 'bambe.xml', 'blast2genoclass.xml', 'blast2seqid.xml', 'blast2taxoclass.xml', 'blast2taxonomy.xml', 'cif.xml', 'clustalw-profile.xml', 'clustalw-sequence.xml', 'codonw.xml', 'comalign.xml', 'combat.xml', 'consensus.xml', 'dca.xml', 'detect_cnv.xml', 'extend_align.xml', 'fasta.xml', 'fastaRename.xml', 'gblocks.xml', 'gff2ps.xml', 'gruppi.xml', 'hmmalign.xml', 'hmmbuild.xml', 'hmmemit.xml', 'hmmfetch.xml', 'hmmscan.xml', 'hmmsearch.xml', 'hmmsim.xml', 'hmmstat.xml', 'jackhmmer.xml', 'ktreedist.xml', 'mfold.xml', 'morePhyML.xml', 'msa.xml', 'msaprobs.xml', 'mspcrunch.xml', 'nw_cat.xml', 'nw_rename.xml', 'pftools.xml', 'phiblast.xml', 'phmmer.xml', 'phyml.xml', 'pima.xml', 'pratt.xml', 'primo.xml', 'prose.xml', 'psiblast.xml', 'psort.xml', 'puzzle.xml', 'rbvotree.xml', 'repeatoire.xml', 'repeats.xml', 'rnadistance.xml', 'rnaeval.xml', 'rnaheat.xml', 'rnainverse.xml', 'rnapdist.xml', 'rnasubopt.xml', 'saps.xml', 'scan_for_matches.xml', 'scan_region.xml', 'scangen.xml', 'seqgen.xml', 'sig.xml', 'signalp.xml', 'smile.xml', 'squizz_convert.xml', 'tacg.xml', 'targetp.xml', 'tipdate.xml', 'treealign.xml', 'trnascan.xml', 'weighbor.xml', 'wise2.xml', 'xpound.xml', 'xxr.xml': add issimple attribute for some parameters specialy those which are mandatory and have not precond. 2012-06-13 Bertrand Néron * mview_blast.xml: fix typo in program description 2012-06-13 Bertrand Néron * mview.xml, mview_alignment.xml, mview_blast.xml: mview is split and replaced by mview_alignment and mview_blast, - fix output problem when html format is required. 
- fix issimple parameter - fix comment for alignment>colouring and consensus_colouring (mview does work with fata reprot so the interface will not be deployed) 2012-04-25 Bertrand Néron * saxs_merge.xml: bug fix in format/code[proglang= "python"] for parameter blimit elimit baverage and eaverage 2012-04-23 Bertrand Néron * dnapars.xml, dnadist.xml, phyml.xml, pars.xml, protdist.xml, protpars.xml: add issimple attribute to bootsrap related options 2012-04-23 Bertrand Néron * fastaRename, nw_rename: improve help (add comment in head and parameter), replace "Report" type by "ID_Mapping" in fastaRename output and nw_rename input to fix chaining between these two programs. 2012-04-21 Bertrand Néron * saxs_merge.xml: add xml for saxs_merge.py program. 2012-04-10 Corinne Maufrais * fuzztran.xml: code python correction: ismatch versus pmismatch 2012-04-05 Bertrand Néron * repeats.xml: add comment in head 2012-03-21 Bertrand Néron * extend_align.xml: add comment in head add parameter extend_method fix format element for parameter linker 2012-03-02 Bertrand Néron * ktreedist.xml: add comment to warn the user about a bug in NEWICK/NEXUS parser (the tree must be on one line) and add an example in parameters ref_tree comp_tree 2012-03-02 Bertrand Néron * morePhyml.xml: mv dataFormat elemt fo parameter usertreefile at the right place. 
2011-12-19 Bertrand Néron * puzzle.xml, nw_rename.xml,phyml.xml,clustalw-multialign.xml,drawtree.xml,dnapars.xml,drawgram.xml, consense.xml,bambe.xml,protpars.xml,hmmsearch.xml,unroot.xml,mix.xml,fitch.xml,hmmscan.xml, rbvotree.xml,ktreedist.xml,seqgen.xml,pratt.xml,pars.xml,fastdnaml.xml,kitsch.xml,morePhyML.xml: add dataFormat element for all parameter of class Tree 2011-12-19 Bertrand Néron * fastdnaml.xml: remove double bracket square in perl code of treefiles parameter 2011-12-16 Bertrand Néron * gff2ps: change type of gff_file subclass abstractext to Feature and add GFF as dataFormat 2011-12-30 Bertrand Néron * INSTALL: add installation instructions for xml programs definitions 2011-12-30 Bertrand Néron * gruppi.xml: change entity gruppi_env in gruppi_data 2011-12-30 Bertrand Néron * combat.xml, consensus.xml, detect_cnv.xml, dssp.xml, fasta.xml, pftools.xml, pratt.xml, rnacofold.xml, rnaduplex.xml, rnafold.xml, rnaheat.xml, rnainverse.xml, rnalfold.xml, rnapdist.xml, rnaplfold.xml, rnasubopt.xml, scangen.xml, scan_region.xml, smile.xml: externalized hard coded path from xml definition into xml files in Env directory. the data will be include in definition during deployement using xinclude mechanism. 2011-11-30 Bertrand Néron * dnapars.xml, mix.xml, pars.xml, protpars.xml: Fix typo parcymony -> parsymony 2011-11-25 Bertrand Néron * muscle.xml: BUGFIX replace some reference to an unknown parameter outline by outfile ( in code, precond ...) 2011-11-25 Bertrand Néron * clustalO-sequence.xml: fix name, remove empty comment in input parameter , fix doc/comment 2011-11-25 Bertrand Néron * clustalO-multialign.xml: close parenthesis in sequences_input comment 2011-10-28 Hervé Ménager * squizz_convert.xml: corrected exclusive input choice error message (reported from quicktree.xml). 2011-10-28 Hervé Ménager * quicktree.xml: corrected exclusive input choice control. 
2011-10-27 Bertrand Néron * signalp.xml: update according to the signalp 4.0 version 2011-10-06 Hervé Ménager * fitch.xml, signalp.xml, netchop.xml, hmmstat.xml, rnaup.xml, seqgen.xml, mview.xml, hmmsim.xml, weighbor.xml, rnadistance.xml, golden.xml, dnadist.xml, dnapars.xml, cif.xml, trnascan.xml, kitsch.xml, pftools.xml, hmmfetch.xml, protpars.xml, protdist.xml, tacg.xml: parameter names and references in code have been corrected wrt new no-js-collision requirements. 2011-10-06 Hervé Ménager * fasta.xml: test in ktup control fixed (ADN=>DNA). 2011-10-06 Hervé Ménager * boxshade.xml: test in control fixed. 2011-10-03 Hervé Ménager * consense.xml: additional spaces removed from example data 2011-09-22 Corinne Maufrais * blast2taxonomy.xml: nodeName parameter changed to avoid js problem in the portal 2011-08-03 Bertrand Néron * Entities/clustalO_package.xml, clustalO-profile , clustalO-sequence: bump version to 1.0.3 add MSF to list of supported input alignment formats, update clustalO-sequence comment 2011-08-03 Bertrand Néron * codonw.xml: force the fasta reformat for seqfile parameter. If sequences are given on one long line codonw does not recognise the fasta format even if it was validated by squizz 2011-08-03 Bertrand Néron * msaprobs.xml: fix typo in doi 2011-08-02 Bertrand Néron * msaprobs.xml: add comment for annotaion and annotaion_file parameters 2011-08-01 Bertrand Néron * msaprobs.xml: add program definition for msaprobs, a protein multiple sequence alignment algorithm based on pair hidden Markov models and partition function posterior probabilities. 2011-07-28 Bertrand Néron * clustalO_package.xml, clustalO-multialign.xml: add 3 definitions for clustal-omega program to align a set of (un)aligned sequences. add package information in Entities/clustalO_package.xml 2011-06-27 Bertrand Néron * blast2seqid: add the xml definition in the official distribution.
this program definition call the script blast2usa 2011-06-27 Bertrand Néron * fasta.xml: fix bug in databank entitities path. remove fallback which have no interest here since the banks are mandatory. 2011-06-06 Bertrand Néron * dnadist.xml: Bug fix in weights-file parameter format. must be test if value is defined, not value itself. 2011-03-24 Bertrand Néron * netOglyc.xml: add comment in head/doc, fix comment in graphics parameter 2011-03-03 Bertrand Néron * mafft.xml: add mafft definition. mafft is a multialignment program 2011-01-03 Bertrand Néron * netchop.xml netNglyc.xml netOglyc.xml signalp.xml targetp.xml tmhmm.xml: add these definitions to pasteur-programs 2011-24-02 Bertrand Néron * cif.xml: bump version to 0.2.2 2011-21-02 Bertrand Néron * muscle.xml: remove parameter grouping which correspond to -stable option. This option is not supported by 3.8.31 muscle version. 2011-21-02 Bertrand Néron * protpars.xml: Bug fix in tree_file format. the expression in tuple indice must be an integer not string, so I replace value by value is not None for python code. 2011-16-02 Bertrand Néron * clustalw-profile.xml, puzzle.xml, clustalw-multialign.xml * clustalw-sequence.xml, dnapars.xml, squizz_convert.xml, dnadist.xml * protdist.xml, golden.xml, lvb.xml, seqgen.xml, fastdnaml.xml, BMGE.xml change PHYLIP data format for PHYLIPI (Interleaved) or PHYLIPS (Sequential) 2011-07-02 Bertrand Néron * cosa.xml, dssp.xml: change PDB datatyping from to AbstractText AbstractText PDB _3DStructure PDB to be compliant with MobyleNet and Jmol viewer 2011-07-02 Bertrand Néron * hmmsearch.xml: BUGFIX one and only one of these 2 parameter public_seq_DB and perso_seq_DB must be specified by the user. 2011-07-02 Bertrand Néron * blast2mydb.xml: BUGFIX in extend_hit. 
the index expression of the tuple must be an integer, not a float or None, so [value] -> [ value is not None ] 2011-01-02 Bertrand Néron * dnadist.xml: BUGFIX in variation_coeff and invariant_sites parameters: test if value is defined, not the value itself ( format element ) 2011-24-01 Bertrand Néron * rnafold.xml: the infile must be the same whether readseq is used or not ( -C True ) 2011-20-01 Bertrand Néron * gruppi.xml: fix path to gruppi_env.xml 2011-05-01 Bertrand Néron * clustalw-multialign.xml: - reorder the hgapresidues vdef values as they appear in vlist to make sense of the comparison vdef vs user value in Mobyle - in output the results in FASTA are handled as Alignment instead of Sequence 2010-13-12 Bertrand Néron * phmmer.xml, wublast2.xml, psiblast.xml, hmmsearch.xml, phiblast.xml, pftools.xml, blast2.xml, gruppi.xml, jackhmmer.xml: update Xinclude path toward Entities from ../Local/Programs/Entities/ to ../../Local/Services/Programs/Entities/ 2010-13-12 Bertrand Néron * cosa.xml, dssp.xml: replace class "Pdb" by "PDB" to be MobyleNet compliant. 2010-13-12 Bertrand Néron * fasta.xml: change the precond of scoring_protein paragraph. the parameters of this paragraph must be available even if the program is fasta. 2010-11-10 Corinne Maufrais * scangen.xml: new form 2010-10-12 Bertrand Néron * puzzle.xml: the f84_ratio parameter becomes (conditionally) mandatory. This fixes a bug: when constrain_TN is checked and no value is provided for f84_ratio, the program enters an endless loop.
2010-09-23 Corinne Maufrais * abiview.xml, acdrelations.xml, antigenic.xml, backtranambig.xml, backtranseq.xml, banana.xml, biosed.xml, btwisted.xml, cai.xml, chaos.xml, charge.xml, checktrans.xml, chips.xml, cirdna.xml, codcmp.xml, coderet.xml, compseq.xml, cons.xml, cpgplot.xml, cpgreport.xml, cusp.xml, cutseq.xml, dan.xml, degapseq.xml, density.xml, descseq.xml, diffseq.xml, digest.xml, distmat.xml, dotmatcher.xml, dotpath.xml, dottup.xml, dreg.xml, edialign.xml, einverted.xml, emowse.xml, entret.xml, epestfind.xml, eprimer3.xml, equicktandem.xml, est2genome.xml, etandem.xml, extractalign.xml, extractfeat.xml, extractseq.xml, featcopy.xml, featreport.xml, findkm.xml, freak.xml, fuzznuc.xml, fuzzpro.xml, fuzztran.xml, garnier.xml, geecee.xml, getorf.xml, helixturnhelix.xml, hmoment.xml, iep.xml, infoalign.xml, isochore.xml, jaspscan.xml, lindna.xml, listor.xml, makenucseq.xml, makeprotseq.xml, marscan.xml, maskfeat.xml, maskseq.xml, matcher.xml, megamerger.xml, merger.xml, msbar.xml, mwfilter.xml, needle.xml, needleall.xml, newcpgreport.xml, newcpgseek.xml, newseq.xml, notseq.xml, nthseq.xml, octanol.xml, oddcomp.xml, palindrome.xml, pasteseq.xml, patmatdb.xml, patmatmotifs.xml, pepcoil.xml, pepinfo.xml, pepnet.xml, pepstats.xml, pepwheel.xml, pepwindow.xml, pepwindowall.xml, plotcon.xml, plotorf.xml, polydot.xml, preg.xml, prettyplot.xml, prettyseq.xml, primersearch.xml, profit.xml, prophecy.xml, prophet.xml, pscan.xml, recoder.xml, redata.xml, remap.xml, restover.xml, restrict.xml, revseq.xml, seqmatchall.xml, seqret.xml, seqretsetall.xml, seqretsplit.xml, showalign.xml, showfeat.xml, showorf.xml, showpep.xml, showseq.xml, shuffleseq.xml, sigcleave.xml, silent.xml, sirna.xml, sixpack.xml, skipseq.xml, splitter.xml, stretcher.xml, stssearch.xml, supermatcher.xml, syco.xml, tcode.xml, textsearch.xml, tfscan.xml, tmap.xml, tranalign.xml, transeq.xml, trimest.xml, trimseq.xml, twofeat.xml, union.xml, vectorstrip.xml, water.xml, wobble.xml, wordcount.xml, 
wordfinder.xml, wordmatch.xml, yank.xml: updated to v6.3.1 2010-09-23 Corinne Maufrais * morePhyML.xml: new xml 2010-09-14 BErtrand Néron * neighbor.xml: add control to multiple and jumble parameters to forbid the settings of this 2 parameters at the same time. remove unused argpos from jumble_seed parameter ( no format ) 2010-09-09 Bertrand Neron * clique.xml, consense.xml, dnadist.xml, dnapars.xml, drawgram.xml, drawtree.xml, fitch.xml, kitsch.xml, mix.xml, neighbor.xml, pars.xml, protdist.xml, protpars.xml, unroot.xml: remove "Phylip" from the program/title as this information is in new package/title element 2010-09-08 Bertrand Neron * bambe.xml, clustalw-sequence.xml, drawgram.xml, hmmalign.xml, lvb.xml, phiblast.xml, quicktree.xml, rnapdist.xml, toppred.xml, bionj.xml, codonw.xml, drawtree.xml, hmmbuild.xml, melting.xml, phmmer.xml, rbvotree.xml, rnaplfold.xml, treealign.xml, bl2seq.xml, comalign.xml, dssp.xml, hmmconvert.xml, mfold.xml, phyml.xml, repeatoire.xml, rnasubopt.xml, trnascan.xml, blast2mydb.xml, combat.xml, elp.xml, hmmemit.xml, mix.xml, pima.xml, repeats.xml, saps.xml, unroot.xml, blast2taxonomy.xml, concatfasta.xml, hmmfetch.xml, mreps.xml, pratt.xml, rnaalifold.xml, scan_for_matches.xml, weighbor.xml, blast2.xml, consense.xml, fasta.xml, hmmscan.xml, msa.xml, predator.xml, rnacofold.xml, scan_region.xml, wise2.xml, BMGE.xml, consensus.xml, fastdnaml.xml, hmmsearch.xml, mspcrunch.xml, primo.xml, rnadistance.xml, seqgen.xml, wublast2.xml boxshade.xml, cosa.xml, fitch.xml, hmmsim.xml, muscle.xml, prose.xml, rnaduplex.xml, sig.xml, xpound.xml, cap3.xml, dca.xml, gblocks.xml, hmmstat.xml, mview.xml, protdist.xml, rnaeval.xml, smile.xml, xxr.xml, cif.xml, detect_cnv.xml, genscan.xml, html4blast.xml, neighbor.xml, protpars.xml, rnafold.xml, squizz_checker.xml, clique.xml, dialign.xml, golden.xml, jackhmmer.xml, newicktops.xml, psiblast.xml, rnaheat.xml, squizz_convert.xml, clustalw-multialign.xml, dnadist.xml, growthpred.xml, kitsch.xml, pars.xml, 
psort.xml, rnainverse.xml, tacg.xml, clustalw-profile.xml, dnapars.xml, gruppi.xml, ktreedist.xml, pftools.xml, puzzle.xml, rnalfold.xml, tipdate.xml: add element package , sourcelink and hompagelink 2010-08-27 Bertrand Neron * rnaalifold.xml, rnaeval.xml, rnacofold.xml: upgrade xml to new 1.8.4 version 2010-08-10 Corinne Maufrais * rnasubopt.xml, rnafold.xml, rnaplfold.xml: update to v1.8.4 2010-08-20 Corinne Maufrais * abiview.xml, acdrelations.xml, antigenic.xml, backtranambig.xml, backtranseq.xml, banana.xml, biosed.xml, btwisted.xml, cai.xml, chaos.xml, charge.xml, checktrans.xml, chips.xml, cirdna.xml, codcmp.xml, coderet.xml, compseq.xml, cons.xml, cpgplot.xml, cpgreport.xml, cusp.xml, cutseq.xml, dan.xml, degapseq.xml, density.xml, descseq.xml, diffseq.xml, digest.xml, distmat.xml, dotmatcher.xml, dotpath.xml, dottup.xml, dreg.xml, edialign.xml, einverted.xml, emowse.xml, entret.xml, epestfind.xml, eprimer3.xml, equicktandem.xml, est2genome.xml, etandem.xml, extractalign.xml, extractfeat.xml, extractseq.xml, featcopy.xml, featreport.xml, findkm.xml, freak.xml, fuzznuc.xml, fuzzpro.xml, fuzztran.xml, garnier.xml, geecee.xml, getorf.xml, helixturnhelix.xml, hmoment.xml, iep.xml, infoalign.xml, isochore.xml, jaspscan.xml, lindna.xml, listor.xml, makenucseq.xml, makeprotseq.xml, marscan.xml, maskfeat.xml, maskseq.xml, matcher.xml, megamerger.xml, merger.xml, msbar.xml, mwfilter.xml, needle.xml, needleall.xml, newcpgreport.xml, newcpgseek.xml, newseq.xml, notseq.xml, nthseq.xml, octanol.xml, oddcomp.xml, palindrome.xml, pasteseq.xml, patmatdb.xml, patmatmotifs.xml, pepcoil.xml, pepinfo.xml, pepnet.xml, pepstats.xml, pepwheel.xml, pepwindow.xml, pepwindowall.xml, plotcon.xml, plotorf.xml, polydot.xml, preg.xml, prettyplot.xml, prettyseq.xml, primersearch.xml, profit.xml, prophecy.xml, prophet.xml, pscan.xml, recoder.xml, redata.xml, remap.xml, restover.xml, restrict.xml, revseq.xml, seqmatchall.xml, seqret.xml, seqretsetall.xml, seqretsplit.xml, showalign.xml, 
showfeat.xml, showorf.xml, showpep.xml, showseq.xml, shuffleseq.xml, sigcleave.xml, silent.xml, sirna.xml, sixpack.xml, skipseq.xml, splitter.xml, stretcher.xml, stssearch.xml, supermatcher.xml, syco.xml, tcode.xml, textsearch.xml, tfscan.xml, tmap.xml, tranalign.xml, transeq.xml, trimest.xml, trimseq.xml, twofeat.xml, union.xml, vectorstrip.xml, water.xml, wobble.xml, wordcount.xml, wordfinder.xml, wordmatch.xml, yank.xml: - ref parameter not well-formed: fixed - precond like $(format) = 0 is now parsed - Adding package-level information for issue #195 * showseq.xml, showpep.xml, showorf.xml: things/frames parameter is moved from Choice to String. Multiple choice cannot be used in this case because Multiple choice values are sorted by Mobyle and these programs rely on the unsorted value. 2010-08-11 Hervé Ménager * lvb.xml: switched to version 2.3 of lvb - "duration" parameter removed. 2010-08-09 Hervé Ménager * coderet.xml, degapseq.xml: bugfix: xml not well-formed 2010-07-01 Corinne Maufrais * kitsch.xml: In treefile parameter, mv outtree filenames to kitsch.outtree 2010-06-21 Bertrand Néron * muscle.xml: change the class of parameter outfile from String to Filename. (String allows spaces but quotes the string on the command line; Filename replaces spaces by _ but does not quote the name) 2010-06-18 Bertrand Néron * fastdnaml.xml, mspcrunch.xml, rnadistance.xml, tacg.xml, xpound.xml: remove the redirection toward programName.out. There should never be a shell redirection ( > ) towards programName.out: this file is automatically created by Mobyle. If a redirection is made, 2 file descriptors point to the same destination and conflict.
2010-06-03 Corinne Maufrais * abiview.xml, acdrelations.xml, antigenic.xml, backtranambig.xml, backtranseq.xml, banana.xml, biosed.xml, btwisted.xml, cai.xml, chaos.xml, charge.xml, checktrans.xml, chips.xml, cirdna.xml, codcmp.xml, coderet.xml, compseq.xml, cons.xml, cpgplot.xml, cpgreport.xml, cusp.xml, cutseq.xml, dan.xml, degapseq.xml, density.xml, descseq.xml, diffseq.xml, digest.xml, distmat.xml, dotmatcher.xml, dotpath.xml, dottup.xml, dreg.xml, edialign.xml, einverted.xml, emowse.xml, entret.xml, epestfind.xml, eprimer3.xml, equicktandem.xml, est2genome.xml, etandem.xml, extractalign.xml, extractfeat.xml, extractseq.xml, featcopy.xml, featreport.xml, findkm.xml, freak.xml, fuzznuc.xml, fuzzpro.xml, fuzztran.xml, garnier.xml, geecee.xml, getorf.xml, helixturnhelix.xml, hmoment.xml, iep.xml, infoalign.xml, isochore.xml, jaspscan.xml, lindna.xml, listor.xml, makenucseq.xml, makeprotseq.xml, marscan.xml, maskfeat.xml, maskseq.xml, matcher.xml, megamerger.xml, merger.xml, msbar.xml, mwfilter.xml, needle.xml, needleall.xml, newcpgreport.xml, newcpgseek.xml, newseq.xml, notseq.xml, nthseq.xml, octanol.xml, oddcomp.xml, palindrome.xml, pasteseq.xml, patmatdb.xml, patmatmotifs.xml, pepcoil.xml, pepinfo.xml, pepnet.xml, pepstats.xml, pepwheel.xml, pepwindow.xml, pepwindowall.xml, plotcon.xml, plotorf.xml, polydot.xml, preg.xml, prettyplot.xml, prettyseq.xml, primersearch.xml, profit.xml, prophecy.xml, prophet.xml, pscan.xml, recoder.xml, redata.xml, remap.xml, restover.xml, restrict.xml, revseq.xml, seqmatchall.xml, seqret.xml, seqretsetall.xml, seqretsplit.xml, showalign.xml, showfeat.xml, showorf.xml, showpep.xml, showseq.xml, shuffleseq.xml, sigcleave.xml, silent.xml, sirna.xml, sixpack.xml, skipseq.xml, splitter.xml, stretcher.xml, stssearch.xml, supermatcher.xml, syco.xml, tcode.xml, textsearch.xml, tfscan.xml, tmap.xml, tranalign.xml, transeq.xml, trimest.xml, trimseq.xml, twofeat.xml, union.xml, vectorstrip.xml, water.xml, wobble.xml, wordcount.xml, 
wordfinder.xml, wordmatch.xml, yank.xml: reference modification 2010-05-25 Corinne Maufrais * repeatoire.xml: add some parameters * hmmemit.xml: output name file fix * phyml.xml: documentation 2010-05-19 Corinne Maufrais * muscle.xml: bug fix in isout and isstdout precond 2010-05-12 Corinne Maufrais * repeatoire.xml: new xml 2010-05-03 Corinne Maufrais * abiview.xml, acdrelations.xml, antigenic.xml, backtranambig.xml, backtranseq.xml, banana.xml, biosed.xml, btwisted.xml, cai.xml, chaos.xml, charge.xml, checktrans.xml, chips.xml, cirdna.xml, codcmp.xml, coderet.xml, compseq.xml, cons.xml, cpgplot.xml, cpgreport.xml, cusp.xml, cutseq.xml, dan.xml, degapseq.xml, density.xml, descseq.xml, diffseq.xml, digest.xml, distmat.xml, dotmatcher.xml, dotpath.xml, dottup.xml, dreg.xml, edialign.xml, einverted.xml, emowse.xml, entret.xml, epestfind.xml, eprimer3.xml, equicktandem.xml, est2genome.xml, etandem.xml, extractalign.xml, extractfeat.xml, extractseq.xml, featcopy.xml, featreport.xml, findkm.xml, freak.xml, fuzznuc.xml, fuzzpro.xml, fuzztran.xml, garnier.xml, geecee.xml, getorf.xml, helixturnhelix.xml, hmoment.xml, iep.xml, infoalign.xml, isochore.xml, jaspscan.xml, lindna.xml, listor.xml, makenucseq.xml, makeprotseq.xml, marscan.xml, maskfeat.xml, maskseq.xml, matcher.xml, megamerger.xml, merger.xml, msbar.xml, mwfilter.xml, needle.xml, needleall.xml, newcpgreport.xml, newcpgseek.xml, newseq.xml, notseq.xml, nthseq.xml, octanol.xml, oddcomp.xml, palindrome.xml, pasteseq.xml, patmatdb.xml, patmatmotifs.xml, pepcoil.xml, pepinfo.xml, pepnet.xml, pepstats.xml, pepwheel.xml, pepwindow.xml, pepwindowall.xml, plotcon.xml, plotorf.xml, polydot.xml, preg.xml, prettyplot.xml, prettyseq.xml, primersearch.xml, profit.xml, prophecy.xml, prophet.xml, pscan.xml, recoder.xml, redata.xml, remap.xml, restover.xml, restrict.xml, revseq.xml, seqmatchall.xml, seqret.xml, seqretsetall.xml, seqretsplit.xml, showalign.xml, showfeat.xml, showorf.xml, showpep.xml, showseq.xml, shuffleseq.xml, 
sigcleave.xml, silent.xml, sirna.xml, sixpack.xml, skipseq.xml, splitter.xml, stretcher.xml, stssearch.xml, supermatcher.xml, syco.xml, tcode.xml, textsearch.xml, tfscan.xml, tmap.xml, tranalign.xml, transeq.xml, trimest.xml, trimseq.xml, twofeat.xml, union.xml, vectorstrip.xml, water.xml, wobble.xml, wordcount.xml, wordfinder.xml, wordmatch.xml, yank.xml: - Change Nucleic Biotype in DNA. - Remove element. - Add ref and test element. 2010-04-30 Corinne Maufrais * BMGE.xml bambe.xml bionj.xml bl2seq.xml blast2.xml blast2mydb.xml blast2taxonomy.xml boxshade.xml cap3.xml cif.xml clique.xml clustalw-multialign.xml clustalw-profile.xml clustalw-sequence.xml codonw.xml combat.xml concatfasta.xml consense.xml consensus.xml cosa.xml dca.xml detect_cnv.xml dialign.xml dnadist.xml dnapars.xml drawgram.xml drawtree.xml elp.xml fasta.xml fastdnaml.xml fitch.xml blocks.xml genscan.xml golden.xml growthpred.xml gruppi.xml hmmalign.xml hmmbuild.xml hmmconvert.xml hmmemit.xml hmmfetch.xml hmmscan.xml hmmsearch.xml hmmsim.xml hmmstat.xml jackhmmer.xml kitsch.xml ktreedist.xml lvb.xml melting.xml mfold.xml mix.xml mreps.xml msa.xml mspcrunch.xml muscle.xml mview.xml neighbor.xml newick tops.xml pars.xml pftools.xml phiblast.xml phmmer.xml phyml.xml pima.xml pratt.xml predator.xml primo.xml prose.xml protdist.xml protpars.xml psiblast.xml psort.xml puzzle.xml quicktree.xml rbvotree.xml repeats.xml rnaalifold.xml rnacofold.xml rnaduplex.xml rnafold.xml rnaheat.xml rnainverse.xml rnalfold.xml rnapdist.xml rnaplfold.xml rnasubopt.xml saps.xml scan_for_matches.xml scan_region.xml seqgen.xml sig.xml smile.xml squizz_convert.xml tacg.xml tipdate.xml toppred.xml treealign.xml trnascan.xml unroot.xml weighbor.xml wise2.xml wublast2.xml xpound.xml xxr.xml : - Change Nucleic Biotype in DNA. - Remove element. - Add ref and test element. 2010-04-28 Sandrine Larroude * cif.xml: Program added to 2 categories. 2010-04-27 Sandrine Larroude * puzzle.xml: doclink addition. 
2010-04-15 Sandrine Larroude * treealign.xml: Fix the writing on par.dat of expected usertree value that was deleted by mistake. 2010-04-09 Sandrine Larroude * treealign.xml: (1) Fix the python code of usertree that was not writing the usertree name in the parameter file and Remove the parameter file as isout. (2) Remove the usertree functionality as the program does not support it. 2010-03-23 Corinne Maufrais * mfold.xml: Change "out_ss and out_rnaml" parameter type from Report or Text to 2DStructure. Modification for viewers compatibility. 2010-03-23 Corinne Maufrais * blast2taxonomy.xml, fastdnaml.xml, gff2ps.xml, hmmalign.xml, hmmemit.xml, hmmfetch.xml, hmmscan.xml, hmmsearch.xml, muscle.xml, predator.xml, rbvotree.xml, rnafold.xml: - isstdout parameter: is 'programName.out', - isout parameter: could not be set to 'programName.out'. 2010-03-22 Sandrine Larroude * cif.xml: Addition of cif program (Emmanuel Quevillon, Pasteur). 2010-03-22 Bertrand Neron * predator.xml: - Fix typo in precond for alignment parameter, - Add parameter predator_out which is the stdout to allow displaying help about predator results. 2010-03-19 Bertrand Neron * pars.xml: Bug fix in ctrl perl code for jumble_times. 2010-03-19 Bertrand Neron * pars.xml: - Fix typo in jumble_bootstrap prompt, - Add a control to check that, if the method is seqboot, jumble_times is >= 1, - Change the control of jumble_times: if the method is jumble, jumble_times < 1000000 whatever is specified in seqboot_replicates; if the method is seqboot, seqboot_replicates * jumble < 1000000. 2010-03-18 Bertrand Neron * pars.xml: Add a parameter to allow the user to specify the discrete character format (interleaved/sequential). 2010-03-16 Corinne Maufrais * hmmsearch.xml: Add personal protein sequence database parameter. 2010-03-10 Nicolas Joly * mreps.xml: Do not specify sequence format in prompt.
2010-03-10 Corinne Maufrais * abiview.xml, acdrelations.xml, antigenic.xml, backtranambig.xml, backtranseq.xml, banana.xml, biosed.xml, btwisted.xml, cai.xml, chaos.xml, charge.xml, checktrans.xml, chips.xml, cirdna.xml, codcmp.xml, coderet.xml, compseq.xml, cons.xml, cpgplot.xml, cpgreport.xml, cusp.xml, cutseq.xml, dan.xml, degapseq.xml, density.xml, descseq.xml, diffseq.xml, digest.xml, distmat.xml, dotmatcher.xml, dotpath.xml, dottup.xml, dreg.xml, edialign.xml, einverted.xml, emowse.xml, entret.xml, epestfind.xml, eprimer3.xml, equicktandem.xml, est2genome.xml, etandem.xml, extractalign.xml, extractfeat.xml, extractseq.xml, featcopy.xml, featreport.xml, findkm.xml, freak.xml, fuzznuc.xml, fuzzpro.xml, fuzztran.xml, garnier.xml, geecee.xml, getorf.xml, helixturnhelix.xml, hmoment.xml, iep.xml, infoalign.xml, isochore.xml, jaspscan.xml, lindna.xml, listor.xml, makenucseq.xml, makeprotseq.xml, marscan.xml, maskfeat.xml, maskseq.xml, matcher.xml, megamerger.xml, merger.xml, msbar.xml, mwfilter.xml, needle.xml, needleall.xml, newcpgreport.xml, newcpgseek.xml, newseq.xml, notseq.xml, nthseq.xml, octanol.xml, oddcomp.xml, palindrome.xml, pasteseq.xml, patmatdb.xml, patmatmotifs.xml, pepcoil.xml, pepinfo.xml, pepnet.xml, pepstats.xml, pepwheel.xml, pepwindow.xml, pepwindowall.xml, plotcon.xml, plotorf.xml, polydot.xml, preg.xml, prettyplot.xml, prettyseq.xml, primersearch.xml, profit.xml, prophecy.xml, prophet.xml, pscan.xml, recoder.xml, redata.xml, remap.xml, restover.xml, restrict.xml, revseq.xml, seqmatchall.xml, seqret.xml, seqretsetall.xml, seqretsplit.xml, showalign.xml, showfeat.xml, showorf.xml, showpep.xml, showseq.xml, shuffleseq.xml, sigcleave.xml, silent.xml, sirna.xml, sixpack.xml, skipseq.xml, splitter.xml, stretcher.xml, stssearch.xml, supermatcher.xml, syco.xml, tcode.xml, textsearch.xml, tfscan.xml, tmap.xml, tranalign.xml, transeq.xml, trimest.xml, trimseq.xml, twofeat.xml, union.xml, vectorstrip.xml, water.xml, wobble.xml, wordcount.xml, 
wordfinder.xml, wordmatch.xml, yank.xml: Updated to 6.2.0 version. 2010-03-10 Corinne Maufrais * hmmalign.xml, hmmbuild.xml, hmmconvert.xml, hmmemit.xml, hmmfetch.xml, hmmsearch.xml, hmmscan.xml, hmmsim.xml, hmmstat.xml: docs added. 2010-02-24 Sandrine Larroude * jackhmmer.xml, phmmer.xml: New programs coming with hmmer version 3.0. 2010-02-15 Corinne Maufrais * hmmalign.xml, hmmbuild.xml, hmmconvert.xml, hmmemit.xml, hmmfetch.xml, hmmsearch.xml: Updated to version 3.0. * hmmscan.xml, hmmsim.xml, hmmstat.xml: New xml, from hmmer 3.0. * hmmcalibrate.xml, hmmpfam: Removed from repository. Old hmmer version 2.0. 2010-02-09 Corinne Maufrais * bambe.xml, bl2seq.xml, blast2.xml, blast2mydb.xml, blast2taxonomy.xml, boxshade.xml, cap3.xml, clique.xml, clustalw-multialign.xml, clustalw-profile.xml, clustalw-sequence.xml, codonw.xml, combat.xml, concatfasta.xml, consense.xml, consensus.xml, dca.xml, detect_cnv.xml, dialign.xml, dnadist.xml, dnapars.xml, drawgram.xml, drawtree.xml, dssp.xml, elp.xml, fasta.xml, fastdnaml.xml, fitch.xml, gblocks.xml, genscan.xml, gff2ps.xml, golden.xml, growthpred.xml, html4blast.xml, kitsch.xml, ktreedist.xml, lvb.xml, mfold.xml, mix.xml, mreps.xml, msa.xml, mspcrunch.xml, muscle.xml, mview.xml, neighbor.xml, newicktops.xml, pars.xml, pftools.xml, phiblast.xml, phyml.xml, pima.xml, pratt.xml, predator.xml, primo.xml, protdist.xml, protpars.xml, psiblast.xml, puzzle.xml, quicktree.xml, rbvotree.xml, rnaalifold.xml, rnacofold.xml, rnadistance.xml, rnaduplex.xml, rnaeval.xml, rnafold.xml, rnaheat.xml, rnainverse.xml, rnalfold.xml, rnapdist.xml, rnaplfold.xml, rnasubopt.xml, saps.xml, scan_region.xml, seqgen.xml, sig.xml, smile.xml, squizz_convert.xml, tacg.xml, tipdate.xml, toppred.xml, treealign.xml, trnascan.xml, unroot.xml, weighbor.xml, wise2.xml, wublast2.xml, xpound.xml, xxr.xml: Fix perl code. 2010-02-01 Nicolas Joly * toppred.xml: Fix perl code. * clustalw-*.xml: Reduce diff between perl and python code.
2010-01-29 Nicolas Joly * combat.xml: Use real matrix file names for vlist values (labels remain unchanged), and adjust corresponding code. Do not show gnuplot params file in results list. * pftools.xml: Fix bogus python tests for databanks parameters. While here, tweak perl code a lot. * xxr.xml: Fix string comparisons in perl code. * newicktops.xml: Fix command line generation for PostScript size parameter. Cleanup. 2010-01-29 Corinne Maufrais * checktrans.xml, cpgplot.xml, cpgreport.xml, dan.xml, diffseq.xml, einverted.xml, marscan.xml, notseq.xml, sirna.xml, sixpack.xml: Remove repeated comment in osformat, sformat ... 2010-01-28 Corinne Maufrais * antigenic.xml, dan.xml, density.xml, diffseq.xml, digest.xml, dreg.xml, equicktandem.xml, etandem.xml, featreport.xml, fuzznuc.xml, fuzzpro.xml, fuzztran.xml, garnier.xml, helixturnhelix.xml, jaspscan.xml, marscan.xml, patmatdb.xml, patmatmotifs.xml, pepcoil.xml, preg.xml, recoder.xml, restrict.xml, sigcleave.xml, silent.xml, sirna.xml, tcode.xml, tmap.xml, twofeat.xml: Modification of report parameter. 2010-01-25 Corinne Maufrais * scan_region.xml: Comment some parameters. Conflict with UCSC databases needed for users. 2010-01-25 Nicolas Joly * smile.xml: Fix alphabet file path in perl code. 2010-01-25 Corinne Maufrais * antigenic.xml, dan.xml, density.xml, diffseq.xml, digest.xml, dreg.xml, equicktandem.xml, etandem.xml, featreport.xml, fuzznuc.xml, fuzzpro.xml, fuzztran.xml, garnier.xml, helixturnhelix.xml, jaspscan.xml, marscan.xml, patmatdb.xml, patmatmotifs.xml, pepcoil.xml, preg.xml, recoder.xml, restrict.xml, sigcleave.xml, silent.xml, sirna.xml, tcode.xml, tmap.xml, twofeat.xml: Modification of report parameter. Remove sequence type. 2010-01-25 Nicolas Joly * rnafold.xml: Use mv instead of cp. Do not run RNAfold command if one of the previous steps failed. 2010-01-25 Sandrine Larroude * melting.xml: doclink addition. 2010-01-20 Sandrine Larroude * growthpred.xml: Version 1.07 with changes in example case.
2010-01-20 Corinne Maufrais * detect_cnv.xml: New paragraph. 2010-01-20 Corinne Maufrais * scan_region.xml: New xml for the penncnv analyses 2010-01-19 Nicolas Joly * Env/blast2_env.xml: New sample file which can be used to setup BLASTDB/BLASTMAT environment variables. * blast2.xml: Use previous if available. 2010-01-18 Sandrine Larroude * xxr.xml: Addition for xxr program (Philippe Bouige). 2010-01-13 Corinne Maufrais * detect_cnv.xml: Request of principal user: change category: genetics:population --> genetics:detection, change typecnv name to rawcnv name. 2010-01-13 Corinne Maufrais * bambe.xml, featcopy.xml, featreport.xml, gruppi.xml, mfold.xml, mreps.xml, muscle.xml, toppred.xml: Types harmonization: - Move all ...Output to ...Report and all AbstractText to Report. 2010-01-12 Sandrine Larroude * blast2.xml, blast2mydb.xml, blast2taxonomy.xml, html4blast.xml, wublast2.xml, mspcrunch.xml, mview.xml, phiblast.xml, psiblast.xml, boxshade.xml, fasta.xml, dssp.xml: Types harmonization: - "Camelization", - Move all ...Output to ...Report and all AbstractText to Report. 2010-01-12 Corinne Maufrais * blast2taxonomy.xml, cap3.xml, codonw.xml, comalign.xml, dca.xml, detect_cnv.xml, dnadist.xml, dnapars.xml, elp.xml, gff2ps.xml, hmmbuild.xml, mfold.xml, mix.xml, msa.xml, pars.xml, pratt.xml, predator.xml, protdist.xml, protpars.xml, psort.xml, rnadistance.xml, rnaeval.xml, rnafold.xml, rnainverse.xml, tacg.xml, trnascan.xml, xpound.xml: Types harmonization: - "Camelization", - Move all ...Output to ...Report and all AbstractText to Report.
2010-01-12 Corinne Maufrais * abiview.xml, antigenic.xml, backtranambig.xml, backtranseq.xml, banana.xml, biosed.xml, btwisted.xml, cai.xml, chaos.xml, charge.xml, checktrans.xml, chips.xml, cirdna.xml, codcmp.xml, coderet.xml, compseq.xml, cons.xml, cpgplot.xml, cpgreport.xml, cusp.xml, cutseq.xml, dan.xml, degapseq.xml, density.xml, descseq.xml, diffseq.xml, digest.xml, distmat.xml, dotmatcher.xml, dotpath.xml, dottup.xml, dreg.xml, edialign.xml, einverted.xml, emowse.xml, entret.xml, epestfind.xml, eprimer3.xml, equicktandem.xml, est2genome.xml, etandem.xml, extractalign.xml, extractfeat.xml, extractseq.xml, featcopy.xml, featreport.xml, findkm.xml, freak.xml, fuzznuc.xml, fuzzpro.xml, fuzztran.xml, garnier.xml, geecee.xml, getorf.xml, helixturnhelix.xml, hmoment.xml, iep.xml, infoalign.xml, isochore.xml, jaspscan.xml, lindna.xml, listor.xml, makenucseq.xml, makeprotseq.xml, marscan.xml, maskfeat.xml, maskseq.xml, matcher.xml, megamerger.xml, merger.xml, msbar.xml, mwfilter.xml, needle.xml, newcpgreport.xml, newcpgseek.xml, newseq.xml, notseq.xml, nthseq.xml, octanol.xml, oddcomp.xml, palindrome.xml, pasteseq.xml, patmatdb.xml, patmatmotifs.xml, pepcoil.xml, pepinfo.xml, pepnet.xml, pepstats.xml, pepwheel.xml, pepwindow.xml, pepwindowall.xml, plotcon.xml, plotorf.xml, polydot.xml, preg.xml, prettyplot.xml, prettyseq.xml, primersearch.xml, profit.xml, prophecy.xml, prophet.xml, pscan.xml, recoder.xml, redata.xml, remap.xml, restover.xml, restrict.xml, revseq.xml, seqmatchall.xml, seqret.xml, seqretsetall.xml, seqretsplit.xml, showalign.xml, showfeat.xml, showorf.xml, showpep.xml, showseq.xml, shuffleseq.xml, sigcleave.xml, silent.xml, sirna.xml, sixpack.xml, skipseq.xml, splitter.xml, stretcher.xml, stssearch.xml, supermatcher.xml, syco.xml, tcode.xml, textsearch.xml, tfscan.xml, tmap.xml, tranalign.xml, transeq.xml, trimest.xml, trimseq.xml, twofeat.xml, union.xml, vectorstrip.xml, water.xml, wobble.xml, wordcount.xml, wordfinder.xml, wordmatch.xml, yank.xml: 
Move all ...Output to ...Report and all AbstractText to Report. 2010-01-12 Nicolas Joly * stride.xml: Remove. 2010-01-11 Sandrine Larroude * newicktops.xml: Fix typo in datatype Postscript --> PostScript. 2010-01-11 Corinne Maufrais * detect_cnv.xml: output_split_file parameter added. It is hidden to push *.split* files in zip archive. 2009-12-29 Corinne Maufrais * growthpred.xml: Version 1.04. 2009-12-22 Corinne Maufrais * detect_cnv.xml: New detect_cnv interface. 2009-12-22 Sandrine Larroude * growthpred.xml: New growthpred version 1.03. 2009-12-09 Corinne Maufrais * concatfasta.xml: Improve description. 2009-12-08 Corinne Maufrais * blast2.xml: Remove psitblastn from program list: a checkpoint file is required. 2009-12-04 Sandrine Larroude * toppred.xml: Addition of a doclink. 2009-11-30 Sandrine Larroude * phyml.xml: "not" was missing in precond (propinvar2 parameter). 2009-11-17 Sandrine Larroude * unroot.xml: Fix typo in format \\ --> \n (commands parameter). 2009-11-02 Sandrine Larroude * growthpred.xml: Addition for growthpred program developed by Sara Vieira-Silva (Eduardo Rocha's group). 2009-10-26 Corinne Maufrais * clustalw-multialign.xml, clustalw-profile.xml, clustalw-sequence.xml: vdef added for -score parameter. 2009-10-12 Sandrine Larroude * tfscan.xml: doclink changed into a question mark in the output part. Information extracted from free transfac v3.2 documentation. 2009-10-07 Sandrine Larroude * tfscan.xml: doclink updated. 2009-10-07 Bertrand Neron * dnadist.xml: There is a bug in version 3.67 of phylip. The lower-triangular value for matrix for option doesn't work correctly. The generated matrix is not machine readable. So this parameter is commented out until phylip is upgraded to version 3.68 or higher. 2009-10-01 Bertrand Neron * quicktree.xml: Add a comment and example to distfile parameter. 2009-10-01 Bertrand Neron * tipdate.xml: Change the message of control in codon_rate parameter.
2009-09-17 Corinne Maufrais * blast2taxonomy.xml: Version 1.1 2009-09-14 Bertrand Neron * treealign.xml: Remove fileseq.comment which specify that the sequences must be in NBRF format. 2009-09-03 Corinne Maufrais * mreps.xml, wublast.xml: doclink updated. 2009-09-03 Corinne Maufrais * muscle.xml: New paragraph added to support Profile options. 2009-08-27 Nicolas Joly * Entities/goldendb.xml: WGS databank golden indexes have been discontinued. 2009-08-17 Corinne Maufrais * coderet.xml: coderet was limited to EMBL/GenBank feature tables. acceptedDataFormats limited to EMBL/GenBank and comment added 2009-07-28 Corinne Maufrais * abiview.xml, antigenic.xml, backtranambig.xml, backtranseq.xml, banana.xml, biosed.xml, btwisted.xml, cai.xml, chaos.xml, charge.xml, checktrans.xml, chips.xml, cirdna.xml, codcmp.xml, coderet.xml, compseq.xml, cons.xml, cpgplot.xml, cpgreport.xml, cusp.xml, cutseq.xml, dan.xml, degapseq.xml, density.xml, descseq.xml, diffseq.xml, digest.xml, distmat.xml, dotmatcher.xml, dotpath.xml, dottup.xml, dreg.xml, edialign.xml, einverted.xml, emowse.xml, entret.xml, epestfind.xml, eprimer3.xml, equicktandem.xml, est2genome.xml, etandem.xml, extractalign.xml, extractfeat.xml, extractseq.xml, featcopy.xml, featreport.xml, findkm.xml, freak.xml, fuzznuc.xml, fuzzpro.xml, fuzztran.xml, garnier.xml, geecee.xml, getorf.xml, helixturnhelix.xml, hmoment.xml, iep.xml, infoalign.xml, isochore.xml, jaspscan.xml, lindna.xml, listor.xml, makenucseq.xml, makeprotseq.xml, marscan.xml, maskfeat.xml, maskseq.xml, matcher.xml, megamerger.xml, merger.xml, msbar.xml, mwfilter.xml, needle.xml, newcpgreport.xml, newcpgseek.xml, newseq.xml, notseq.xml, nthseq.xml, octanol.xml, oddcomp.xml, palindrome.xml, pasteseq.xml, patmatdb.xml, patmatmotifs.xml, pepcoil.xml, pepinfo.xml, pepnet.xml, pepstats.xml, pepwheel.xml, pepwindow.xml, pepwindowall.xml, plotcon.xml, plotorf.xml, polydot.xml, preg.xml, prettyplot.xml, prettyseq.xml, primersearch.xml, profit.xml, prophecy.xml, 
prophet.xml, pscan.xml, recoder.xml, redata.xml, remap.xml, restover.xml, restrict.xml, revseq.xml, seqmatchall.xml, seqret.xml, seqretsetall.xml, seqretsplit.xml, showalign.xml, showfeat.xml, showorf.xml, showpep.xml, showseq.xml, shuffleseq.xml, sigcleave.xml, silent.xml, sirna.xml, sixpack.xml, skipseq.xml, splitter.xml, stretcher.xml, stssearch.xml, supermatcher.xml, syco.xml, tcode.xml, textsearch.xml, tfscan.xml, tmap.xml, tranalign.xml, transeq.xml, trimest.xml, trimseq.xml, twofeat.xml, union.xml, vectorstrip.xml, water.xml, wobble.xml, wordcount.xml, wordfinder.xml, wordmatch.xml, yank.xml: new emboss version 6.1.0. 2009-07-22 Corinne Maufrais * stride.xml: - Remove acceptedDataFormats for parameter 'query' (non-"Sequence"/"Alignment" types). - Move datatype class from "Text" to "String" for parameter 'outfile' (no hidden parameter datatype class can be set to "Text"). 2009-07-15 Nicolas Joly * clustalw-multialign.xml, clustalw-profile.xml, clustalw-sequence.xml: Update ClustalW interfaces to recent 2.0.11 version. 2009-07-07 Bertrand Neron * mix.xml: Rewrite the support of Jumble and Multiple Dataset options. 2009-07-02 Sandrine Larroude * clustalw-multialign.xml, clustalw-profile.xml, clustalw-sequence.xml: Reformat the documentation for clustal format which is moved to the output part. 2009-07-01 Corinne Maufrais * clustalw-multialign.xml, clustalw-profile.xml, clustalw-sequence.xml: Add documentation for clustal format. 2009-06-19 Sandrine Larroude * trnascan.xml: Small change in a help message: "CAA triplet" replaced by "CCA triplet". 2009-06-08 Corinne Maufrais * rbvotree.xml: Change outfile parameter from isout to isstdout. 2009-04-30 Bertrand Neron * elp.xml: Change biotype from DNA to Nucleic. 2009-04-29 Sandrine Larroude * ktreedist.xml: Remove an accent from author name. 2009-04-29 Corinne Maufrais * concatfasta.xml: Addition for concatfasta script. 2009-04-21 Nicolas Joly * muscle.xml: Add `-quiet' option, which prevents progress messages output.
* hmmcalibrate.xml: Do not insert options after the HMM file name but rather before. While here remove some unneeded argpos values. * wublast2.xml: Switch gap penality score parameters from Float to Integer to match reality, and add some value range controls. 2009-04-16 Sandrine Larroude * clustalw-profile.xml: Change the outfile part to avoid double output and to have an output when Clustal is selected as output format. 2009-04-10 Sandrine Larroude * ktreedist.xml: Addition for ktreedist program. * gblocks.xml: Addition for gblocks program. 2009-04-09 Herve Menager * elp.xml: bugfix: path is fixed. 2009-04-08 Corinne Maufrais * abiview.xml, antigenic.xml, backtranambig.xml, backtranseq.xml, banana.xml, biosed.xml, btwisted.xml, cai.xml, chaos.xml, charge.xml, checktrans.xml, chips.xml, cirdna.xml, codcmp.xml, coderet.xml, compseq.xml, cons.xml, cpgplot.xml, cpgreport.xml, cusp.xml, cutseq.xml, dan.xml, degapseq.xml, descseq.xml, diffseq.xml, digest.xml, distmat.xml, dotmatcher.xml, dotpath.xml, dottup.xml, dreg.xml, edialign.xml, einverted.xml, emowse.xml, entret.xml, epestfind.xml, eprimer3.xml, equicktandem.xml, est2genome.xml, etandem.xml, extractalign.xml, extractfeat.xml, extractseq.xml, findkm.xml, freak.xml, fuzznuc.xml, fuzzpro.xml, fuzztran.xml, garnier.xml, geecee.xml, getorf.xml, helixturnhelix.xml, hmoment.xml, iep.xml, infoalign.xml, isochore.xml, lindna.xml, listor.xml, makenucseq.xml, makeprotseq.xml, marscan.xml, maskfeat.xml, maskseq.xml, matcher.xml, megamerger.xml, merger.xml, msbar.xml, needle.xml, newcpgreport.xml, newcpgseek.xml, newseq.xml, notseq.xml, nthseq.xml, octanol.xml, oddcomp.xml, palindrome.xml, pasteseq.xml, patmatdb.xml, patmatmotifs.xml, pepcoil.xml, pepinfo.xml, pepnet.xml, pepstats.xml, pepwheel.xml, pepwindow.xml, plotcon.xml, plotorf.xml, polydot.xml, preg.xml, prettyplot.xml, prettyseq.xml, primersearch.xml, profit.xml, prophecy.xml, prophet.xml, pscan.xml, recoder.xml, redata.xml, remap.xml, restover.xml, restrict.xml, 
revseq.xml, seqmatchall.xml, seqret.xml, seqretsplit.xml, showalign.xml, showfeat.xml, showorf.xml, showseq.xml, shuffleseq.xml, sigcleave.xml, silent.xml, sirna.xml, sixpack.xml, skipseq.xml, splitter.xml, stretcher.xml, stssearch.xml, supermatcher.xml, syco.xml, tcode.xml, textsearch.xml, tfscan.xml, tmap.xml, tranalign.xml, transeq.xml, trimest.xml, trimseq.xml, twofeat.xml, union.xml, vectorstrip.xml, water.xml, wobble.xml, wordcount.xml, wordfinder.xml, wordmatch.xml, yank.xml: Remove IG format from accepted Sequence format. 2009-04-08 Bernard Caudron * pftools.xml: Add `-v' option, to suppress PROSITE format parsing warnings. 2009-04-06 Bertrand Neron * puzzle.xml: Change the code of outfile , outtree and outdist parameters from *.suffix to infile+suffix to avoid useless renaming of usertree parameter. 2009-04-03 Corinne Maufrais * blast2taxonomy.xml: Output bug fix. 2009-04-02 Bertrand Neron * neighbor.xml, pars.xml, protpars.xml, dnapars.xml, protdist.xml, protpars.xml: The number of data sets must be greater than 1. add a ctrl to test that. 2009-03-30 Corinne Maufrais * banana.xml, btwisted.xml, charge.xml, digest.xml, emowse.xml, epestfind.xml, octanol.xml, pepinfo.xml, pepstats.xml, pepwindow.xml, pepwindowall.xml , restover.xml, restrict.xml, tcode.xml, tfscan.xml: Change acd 'datafile' parameter from Choice to Databox. 2009-03-25 Bertrand Neron * msa.xml: Change prompt of seqs parameter and remove example which is not right. 2009-03-25 Bertrand Neron * squizz_convert.xml: Add dataformat elements to infile_seq and infile_aln parameters. 2009-03-25 Sandrine Larroude * mspcrunch.xml: Change of the class of the outfile parameter from String to Report. 2009-02-16 Bertrand Neron * bionj.xml: Add an example in infile parameter. 2009-02-13 Nicolas Joly * muscle.xml: Assorted perl code fix. 2009-02-09 Bertrand Neron * hmmcalibrate.xml: hmmcalibrate modify in place the hmmprofile. 
To preserve the input data, the hmmprofile is copied before to perform a hmmcalibrate. This operation is specified in new hmm_init parameter (the element command is removed). 2009-02-06 Bertrand Neron * protdist.xml, dnadist.xml: change type of seqboot_out parameter from Alignment to setOfAlignment with superclass AbstractText. * protpars.xml, dnapars.xml, pars.xml: change the format when seqboot is request to preserve the seqboot results as seqboot.outfile add a parameter seqboot_out to recover and show to the user the seqboot results. 2009-01-30 Bertrand Neron * fasta.xml: Fix typo in biotype in query parameter Nuucleic -> Nucleic. * cap3.xml: Fix typo in biotype in contig parameter Nucleiq -> Nucleic. 2009-01-27 Bertrand Neron * protdist.xml: Change ratio parameter type from Integer to Float, add a ctrl the value must be >= 0.5, add a comment. * dnadist.xml: Change ratio parameter type from Integer to Float, add a ctrl the value must be >= 0.0, add a comment. 2009-01-20 Bertrand Neron * protdist.xml: Parameter gamma ismandatory if gamma_dist in ["Y","G"] and invariant parameter must be written in paramfile instead of command line. 2009-01-20 Bertrand Neron * bambe.xml: Add biotype Nucleic to data_file. 2009-01-19 Nicolas Joly * tacg.xml: s/TacgOutput/TacgTextOutput/. * blast2mydb.xml: s/Report/Output/ in classes. * elp.xml: s/ELPReport/ELPOutput/ in class. 2009-01-16 Nicolas Joly * wublast2.xml: Fix small typo in text output file class. * predator.xml: Fix DSSP input file parameter class. 2009-01-12 Bertrand Neron * bionj.xml: Add an example to the "infile" parameter. 2008-12-10 Nicolas Joly * blast2mydb.xml: New definition file, that allow a blast search against a user provided sequence database. 2008-12-04 Nicolas Joly * muscle.xml: Do not use empty code, add an empty string instead. 2008-11-28 Nicolas Joly * fastdnaml.xml: Replace all old/buggy `fastDNAml_boot' calls by a new `fastdnaml' script. And kill now unneeded `concattree'. 
2008-11-27 Corinne Maufrais * drawtree.xml: Update category, display:phylogeny --> display:tree. 2008-11-20 Corinne Maufrais * consense.xml, drawgram.xml, drawtree.xml, rbvotree.xml, tipdate.xml, unroot.xml: Change classification: phylogeny:tree is split into phylogeny:display and phylogeny:tree_analyser, and a new display:tree category is created. 2008-10-12 Herve Menager * mreps.xml, rnainverse.xml: Removed weird character typo that can cause xsl processing crashes. * comalign.xml, dnadist.xml, dnapars.xml, fasta.xml, garnier.xml, hmmbuild.xml, hmmconvert.xml, hmmpfam.xml, mspcrunch.xml, muscle.xml, pars.xml, protdist.xml, protpars.xml, seqgen.xml, smile.xml, tacg.xml, tipdate.xml: Removed a few contractions from the description and comment texts. 2008-10-06 Herve Menager * tacg.xml: Removed weird character typo that can cause xsl processing crashes. 2008-10-30 Herve Menager * fuzznuc.xml, fuzzpro.xml, fuzztran.xml: Class and superclass elements have been switched so that the XML validates. * stride.xml: Authors and reference elements have been switched so that the XML validates. * weighbor.xml: The parameter name 'Verbose output' has been changed to 'verbose' and the prompt to 'Verbose output' to eliminate whitespace (not authorized) so that the XML validates and the python code evaluation works. 2008-10-28 Herve Menager * golden.xml: Typo fix 'accesion->accession'. 2008-10-24 Bertrand Neron * muscle.xml: - Change seqtype option value to auto, protein, dna, rna (as described in user guide protein does not exist). - split the parameter to retrieve the results into 3 parameters: muscleReport if outformat is muscle format; htmlReport if outformat is html; AlignmentOutput if outformat is phylip, msf, fasta, clustalw - remove unicode char in comment - add output format for phylip interleaved and associated comment 2008-10-24 Herve Menager * dreg and preg regexps are now infiles and therefore support regexps. 
2008-10-23 Bertrand Neron * rnadistance.xml, rnaeval.xml, rnafold.xml, rnainverse.xml: Homogenize class to RNAStructure. 2008-10-23 Bertrand Neron * fitch.xml: Bugfix in global parameter; replace value of paramfile element by fitch.params instead of kitsch.params. 2008-10-23 Bertrand Neron * kitsch.xml: Bugfix in printdata parameter in python code "1\\n" -> "1\n". 2008-10-22 Bertrand Neron * pars.xml: Bugfix in pars.params generation when there is multiple_dataset reorganize bootstrap comment to be in same order than the bootstrap menu * dnadist.xml, protdist.xml, dnapars.xml, protpars.xml: Add options in bootstrap parameter (and corresponding comment). * dnapars.xml, protpars.xml: Add a control to forbidden that replicates*jumble > 100000. 2008-10-22 Bertrand Neron * protpars.xml, dnapars.xml: - add a parameter seqboot_times2jumble in seqboot paragraph to simulate the implicit jumble option when multiple dataset is on. - add ctrl in jumble and seqboot to avoid collision between this 2 options (the bootstrap option implies multiple dataset in protpars which implies jumble option). - add (multiple dataset) in bootstrap prompt. - add (one dataset) in jumble prompt. * protpars.xml: Add a comment in seqboot. * dnapars.xml: Modify the comment in seqboot reformat the example in infile. 2008-10-17 Nicolas Joly * Env/gruppi_env.xml: New env entity file, mandatory for gruppi interface. * gruppi.xml: Do not use a local entity, but a dummy one instead. 2008-10-16 Herve Menager * fuzznuc.xml, fuzzpro.xml, fuzztran.xml: Support regexp special characters, by specifying the patterns as input files. This has the further advantage of being compatible with EMBOSS capability of handling more than one pattern (bneron, njoly, hmenager). 2008-09-30 Nicolas Joly * *.xml: Rename *Report classes to *Output. * hmmbuild.xml, mfold.xml: Switch from Text to AbstractText in superclass. * pftools.xml, prose.xml, scan_for_matches.xml: Do some consistency renaming for patterns. 
* rnadistance.xml: Rename StructureAlignment to RnadistanceAlignment. * dssp.xml: Rename DsspOut to DsspOutput. * blast2taxonomy.xml, boxshade.xml, mfold.xml, psiblast.xml, psort.xml, tacg.xml, toppred.xml: Use consistent class naming for HTML output files. 2008-09-30 Bertrand Neron * mreps.xml: Change class of parameter xmlout from XML to MrepsXmlReport. 2008-09-30 Bertrand Neron * rnafold.xml: Change class of parameter outfile from RNAFoldSequence to RnaFoldSequence change class of parameter seq from RNAStruct to RnaStruct. 2008-09-30 Nicolas Joly * comalign.xml, dnapars.xml, fitch.xml, lvb.xml, pars.xml, protpars.xml: Fix typos (Alignement -> Alignment). * xpound.xml: Fix typo, and class renaming for consistency (xxOutfile -> xxOutput). 2008-09-30 Bertrand Neron * pratt.xml: Change class of distfile parameter from DistanceMatrix to PhylipDistanceMatrix. 2008-09-30 Bertrand Neron * blast2.xml, phiblast.xml, psiblast.xml, wublast2.xml: Rename class BlastXMLReport to BlastXmlReport. 2008-09-30 Bertrand Neron * blast2.xml, html4blast.xml, phiblast.xml, wublast2.xml: Rename class BlastHTMLReport to BlastHtmlReport. 2008-09-30 Bertrand Neron * abiview.xml, banana.xml, boxshade.xml, chaos.xml, charge.xml, cirdna.xml, combat.xml, cpgplot.xml, dan.xml, dotmatcher.xml, dotpath.xml, dottup.xml, drawgram.xml, drawtree.xml, epestfind.xml, findkm.xml, freak.xml, genscan.xml, gff2ps.xml, hmoment.xml, iep.xml, isochore.xml, lindna.xml, mfold.xml, octanol.xml, pepinfo.xml, pepnet.xml, pepwheel.xml, pepwindowall.xml, pepwindow.xml, plotcon.xml, plotorf.xml, polydot.xml, prettyplot.xml, rnaalifold.xml, rnacofold.xml, rnadistance.xml, rnaduplex.xml, rnafold.xml, rnalfold.xml, rnapdist.xml, rnaplfold.xml, rnasubopt.xml, syco.xml, tacg.xml, tcode.xml, tmap.xml, wobble.xml, xpound.xml: Rename class Postscript to PostScript. 2008-09-30 Nicolas Joly * hmm*.xml, wise2.xml: Rename HmmProfile to HmmTextProfile. * hmmbuild.xml, hmmconvert.xml: Rename HmmProfileBin to HmmBinProfile. 
* hmmbuild.xml: Consistently use `Hmm' (lowercase) in classes. 2008-09-30 Bertrand Neron * seqgen.xml: Bugfix in rateAll format: each rate is a float, thus the string format is not %d but %f. 2008-09-30 Bertrand Neron * seqgen.xml: Fix bug in comment/text element of parameter rate1 and paragraph rate: add lang attribute. 2008-09-30 Bertrand Neron * seqgen.xml: Fix bug in format of rateAll: - in python rate1 rate2 rate3 are variables not strings - in perl invalid syntax - add a comment in paragraph rate to advise that this option is useful only with nucleotide models of substitution. 2008-09-29 Bertrand Neron * fasta.xml: Change outfile parameter class from FastaReport to FastaTextReport, html_outfile parameter class from Html to FastaHtmlReport. * mview.xml: Change fasta parameter class from FastaReport to FastaTextReport. 2008-09-29 Bertrand Neron * clique.xml, mix.xml, pars.xml: Change DiscreteCharMatrix to PhylipDiscreteCharMatrix. 2008-09-29 Bertrand Neron * bionj.xml, dnadist.xml, fitch.xml, kitsch.xml, neighbor.xml, protdist.xml, quicktree.xml, weighbor.xml: Change class DistanceMatrix to PhylipDistanceMatrix. 2008-09-29 Bertrand Neron * rnadistance.xml: Change the class of psfiles parameter from AbstractText to Binary. 2008-09-22 Nicolas Joly * mview.xml: Kill `in_p' parameter, the input file format is now set in each input parameter. And add `force=1' to the alignment input, to be able to handle CLUSTAL format with sequence numbers. 2008-09-22 Bertrand Neron * tipdate.xml: param change_confidence: add a precond, the control must be evaluated only if change_confidence is True. 2008-09-12 Nicolas Joly * golden.xml: Remove command path attribute. * golden.xml, gruppi.xml: Remove XML stylesheet. 2008-09-11 Nicolas Joly * boxshade.xml: Kill the `env' entity reference which does not exist. * *.xml: Consistently use SYSTEM identifier for DOCTYPE declaration. * tipdate.xml: Remove unneeded memory parameter (which only pollutes program output). 
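The seqgen.xml rateAll fix recorded above (2008-09-30) comes down to how Python's % formatting treats floats. The following is only a sketch: the rate values are made up, and the actual format expression in seqgen.xml is not shown in this log, so the comma-separated layout here is an assumption.

```python
# Hypothetical rate values; in the definition they come from rate1..rate3.
rate1, rate2, rate3 = 0.5, 1.0, 1.5

# %d truncates each float to an integer, so fractional rates are lost.
truncated = "%d,%d,%d" % (rate1, rate2, rate3)

# %f keeps the fractional part (six decimal places by default).
preserved = "%f,%f,%f" % (rate1, rate2, rate3)

print(truncated)  # 0,1,1
print(preserved)  # 0.500000,1.000000,1.500000
```

With %d, rates of 0.5 and 1.5 would silently reach the program as 0 and 1, which is why the format had to change.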
2008-09-10 Nicolas Joly * dialign.xml: Make it validate, by moving a precond to a correct position. 2008-09-09 Herve Menager * extractfeat.xml: Change two parameter names and their evaluation line, because there seems to be a problem with the evaluation when a parameter name is "value". So 'tag' becomes 'tag_name' and 'value' becomes 'tag_value'. 2008-09-05 Bertrand Neron * tipdate.xml: - param change_rate_estim change vdef from +w to off add ctrl if w+ (tip_date or tip_date_specified). - param change_confidence add ctrl -iw option can only be used with Variable Rate Tip Date Model (+w). 2008-08-29 Bertrand Neron * boxshade.xml: boxshade doesn't support clustal v2.0 format; force the conversion to clustal 1.8: add force="1" attribute to acceptedDataFormats element of alignment parameter. 2008-08-21 Bertrand Neron * dnapars.xml protpars.xml pars.xml neighbor.xml: Add lang attribute to text element in comment element in outgroup parameter. 2008-08-21 Bertrand Neron * dnapars.xml protpars.xml pars.xml neighbor.xml: Add comment in outgroup option. 2008-08-18 Bertrand Neron * codonw.xml: Typo in doclink tutorial url. 2008-08-05 Bertrand Neron * abiview.xml: Modify the typing of "infile" parameter from Binary to AbiTraceFile Binary. 2008-08-04 Bertrand Neron * codonw: Add 2 external doclinks, http://codonw.sourceforge.net/Readme.html http://codonw.sourceforge.net/Tutorial.html 2008-07-31 Bertrand Neron * for all phylip xml: Update the version tag from 3.65 to 3.67. * dnadist.xml: Change matrix_form parameter to Choice to integrate new option : "human_redable" appeared with Phylip v3.67. 2008-07-04 Bertrand Neron * muscle.xml: Add a doclink. 2008-07-04 Bertrand Neron * muscle.xml: Add the xml of muscle in repository. 2008-06-25 Bertrand Neron * dialign.xml: Add a parameter fasta_alignment to recover fasta alignment in right type. 2008-06-23 Nicolas Joly * xpound.xml: Add missing input file name for xpscript command. 
2008-06-11 Bertrand Neron * hmmbuild.xml, hmmcalibrate.xml, hmmconvert.xml, hmmemit.xml: Small typo in category, s/builting/building/. 2008-06-10 Nicolas Joly * hmmfetch.xml: Small typo, s/pfamprag/pfamfrag/. 2008-06-06 Nicolas Joly * Version 1.0 released. Programs-5.1.1/newicktops.xml0000644000175000001560000001422411441651470015070 0ustar bneronsis newicktops 1.0 newicktops A phylogenetic tree drawing program for biologists Manolo Gouy newicktops is a tree drawing program able to draw any binary tree expressed in the standard phylogenetic tree format (e.g., the format used by the PHYLIP package). NJplot allows display of bootstrap scores and printing in the PostScript format. http://pbil.univ-lyon1.fr/software/njplot.html ftp://pbil.univ-lyon1.fr/pub/mol_phylogeny/njplot/ phylogeny:display display:tree command String "newicktops" "newicktops" 0 tree Tree file (S3:0.5,(S1:0.9,(S5:0.5,S4:0.5)75:0.41)97:0.41,S2:0.5); Tree " $value" " " + str(value) 10 paper_format paper format Choice a4 a4 us ($value eq $vdef) ? " -us" : "" ( "" , " -us" )[ value != vdef ] 1 font_size Font size used for taxon names Integer (defined $value) ? " -size $value" : "" ( "" , " -size " + str( value ) )[ value is not None ] 1 lengths Show branch lengths if they appear in tree file Boolean 0 ($value) ? " -lengths" : "" ( "" , " -lengths" )[ value ] 2 boots Show bootstrap values if they appear in tree file Boolean 0 ($value) ? " -boot" : "" ( "" , " -boot" )[ value ] 3 no_title Don't include title in postscript output Boolean 0 ($value) ? 
" -notitle" : "" ( "" , " -notitle" )[ value ] 4 paper_size the size of page for postscript width width Integer height height Integer ps_size String defined($width) and defined($height) width != None and height != None " -pssize $widthx$height" " -pssize %dx%d" %( width , height ) width height tree_out Tree draw PostScript Binary "*.ps" "*.ps" Programs-5.1.1/tipdate.xml0000644000175000001560000007346411767572177014370 0ustar bneronsis tipdate 1.2 TipDate Analysis of trees with dated tips Andrew Rambaut Andrew Rambaut, 2000. Estimating the rate of molecular evolution: Incorporating non-contemporaneous sequences into maximum likelihood phylogenies. Bioinformatics. http://bioweb2.pasteur.fr/docs/tipdate/TipDate.v1.2.Manual.pdf phylogeny:tree_analyser tipdate alignment Alignment file TipdateAlignment AbstractText " <$value" " <" + str(value) 10 TipDate requires a sequence alignment together with a description of the quartets and dates to be used. The idea is that the user may supply an alignment of a large number of species and then define multiple quartets to be constructed from them. Note that the sequences names have a date at the end (in this case in years). TipDate looks for any number at the end of the names and assumes these are dates. They can be decimal (i.e. 98.2) is 98 and 1/5 (if you want months specify the whole date in months). The units are arbitrary because the rates and dates estimated by the program will be specified in the same units the program doesn't need to know what they are. 6 24 StrainA98 CAGCTCTGCCTCCTGAAGCCCCTA StrainB96 CAGCTCTGTCTCCTGCAGCCCCTA StrainC97 CGGCTCTGCCTCCTGCAGCCCCTG StrainD97 CAGCTCTGCCTCCTGCAGCCCCTG StrainE64 CAGCTCTGCCTCCTGCAGCCCTTA StrainF77 CAGCTCTGCCTCCTGCAGCCCCTA 1 (((StrainA98:1,StrainB96:1):2,(StrainC97:1,StrainD97:1):2):3,(StrainE64:2,StrainF77:2):4); control_options Control options 2 model MODEL (-m) Choice F84 F84 HKY REV (defined $value and $value ne $vdef)? 
" -m$value" : "" ( "" , " -m" + str(value))[ value is not None and value != vdef] Model This option sets the model of nucleotide substitution with a choice of either F84, HKY (also known as HKY85) or REV (general reversible Markov model). The first two models are quite similar but not identical. They both require a transition/transversion ratio and relative base frequencies as parameters. Other models such as K2P, F81 and JC69 are special cases of HKY and can be obtained by setting the nucleotide frequencies equal (for K2P) or the transition/transversion ratio to 1.0 (for F81) or both (for JC69). constant_rate Molecular clock model (-k) Boolean 0 ($value) ? " -k" : "" ( "" , " -k" )[ value ] This option specifies the Molecular Clock model (Single Rate, SR model). This model is the equivalent of the DNAMLK program in PHYLIP or specifying the molecular clock option in PAUP*. The default (i.e., not specifying k) gives the Non-Clock model (Different Rate, DR model). change_rate_estim Variable Rate Tip Date Model (+w) Choice null null +w +w+ +w- -w ($value ne $vdef and $value ne "-w") ? " $value" : "" ( "" , " " + str(value) )[ value !=vdef and value != "-w"] Variable rate options (+w) can only be used with the dated tip models (+s/-s) $tip_date or $tip_date_specified ( tip_date or tip_date_specified ) This option specifies the Variable Rate Dated Tip (VRDT) model. This model assumes that the rate of substitution changes linearly as we go back through time. The rate of change of rate is given as a proportion of the rate of substitution at the present. This rate can be positive or negative but not all data sets will have the power to estimate this parameter. Using this option will estimate the rate of evolution as a maximum likelihood parameter. 
You can also constrain the estimation of this parameter to be only positive or only negative user_rate_value Variable Rate Tip Date Model with user value (-w) Float $change_rate_estim eq "-w" change_rate_estim == "-w" " -w $value" " -w " + str(value) Variable rate options (-w) can only be used with the dated tip models (+s/-s) $tip_date or $tip_date_specified tip_date or tip_date_specified Where value is a real number that gives the rate of change of rate as a proportion of the rate at present per unit time. change_confidence Estimate confidence intervals (requires +w option) (-iw) Boolean defined $change_confidence change_confidence 0 ($value)? " -iw" : "" ( "" , " -iw" )[ value ] -iw option can only be used with Variable Rate Tip Date Model (+w) (defined $change_rate_estim and $change_rate_estim ne "-w") change_rate_estim is not None and change_rate_estim != "-w" Estimate confidence intervals for the rate of change of rate parameter in the Variable Rate Dated Tip model (requires +w option). tip_date Tip Date Model with estimation of the rate of evolution (+s) Boolean 1 ($value) ? " +s" : "" ( "" , " +s" )[ value ] This option specifies the Single Rate Date Tips (SRDT) model. The default value is the Non-Clock model. The input tree and sequences must have names that end with dates. Using this option will estimate the rate of evolution as a maximum likelihood parameter. substitution_confidence Estimate confidence intervals for the absolute rate of substitution (-is) Boolean $tip_date tip_date 0 ($value)? " -is" : "" ( "" , " -is" )[ value ] This option specifies which parameters should have confidence intervals estimated. The default is not to estimate confidence intervals. These options can be used in combination. date_confidence Estimate confidence intervals for date of root of tree (-id) Boolean $tip_date tip_date 0 ($value)? " -id" : "" ( "" , " -id" )[ value ] limit Limit to use estimating confidence intervals (-l) Float 1.92 (defined $value and $value != $vdef)? 
" -l $value" : "" ( "" , " -l " + str(value) )[ value is not None and value != vdef] Value is a real number >= 0 $value >= 0 value >= 0 This option specifies the value to use to obtain the confidence intervals around the estimate of rate of molecular evolution (and corresponding date of root of the tree). Where value is a real number >= 0 that specifies the log likelihood ratio that gives the confidence interval. The default is 1.92, which corresponds to half the chi-squared value with 1 degree of freedom. A value of 0 will disable the calculation of confidence intervals. tip_date_specified Tip date Model without estimation of the rate of evolution: user value (-s) Float not $tip_date not tip_date (defined $value) ? " -s $value" : "" ( "" , " -s " + str(value) )[ value is not None] Where the value is a real number that gives the rate of molecular evolution in substitutions per site per unit time (whatever the units of time that are represented by the input tip dates). Variable Rate Tip Date Model This option specifies the Variable Rate Dated Tip (VRDT) model. This model assumes that the rate of substitution changes linearly as we go back through time. The rate of change of rate is given as a proportion of the rate of substitution at the present. This rate can be positive or negative but not all data sets will have the power to estimate this parameter. This model must be used in conjunction with the +s or -s options, above. root_value_estimate Estimate root tree (+r) Boolean 0 ($value) ? " +r" : "" ( "" , " +r" )[ value ] Alternatively TipDate can find the maximum likelihood position of the root. This tries all possible positions (2n-3) so increases the duration of analysis. root_value Specify rooting of tree To perform the molecular clock and tip date models (-r) Integer not $root_value_estimate not root_value_estimate 1 (defined $value and $value!=$vdef) ? " -r $value" : "" ( "" , " -r " + str(value) )[ value is not None and value !=vdef] The input tree must be rooted. 
This option is used to specify an outgroup sequence to root the tree with (sorry this is not very sophisticated: if you need to use more than one outgroup, root the tree beforehand). Where value is an integer number which refers to the sequence that will be used to root the tree (starting at 1). codon_categories CODON CATEGORIES = 112, 123, 120, etc. [default: homogeneity] (-p) String (defined $value)? " -p $value" : "" ( "" , " -p " + str(value) )[ value is not None ] codon_rate CODON-specific Rate Heterogeneity: #1 #2 #3 separated by commas (-c) String (defined $value)? " -c$value" : "" ( "" , " -c" + str(value) )[ value is not None ] Rates are specified by value, which are three decimal numbers, separated by commas. $value ~= /\d+(,\d+){2}/ len(value.split(',')) == 3 Using this option the user may specify the relative rates for each codon position. This allows codon-specific rate heterogeneity to be modelled. The default is no site-specific rate heterogeneity. seperate Estimate separate models for each site category (-e) Boolean 0 ($value)? " -e" : "" ( "" , " -e" )[ value ] gamma Discrete Gamma Rate Heterogeneity (2 to 32) (-g) Integer (defined $value)? " -g $value" : "" ( "" , " -g " + str(value) )[ value is not None ] Enter an integer between 2 and 32 $value <= 32 and $value >= 2 value <= 32 and value >= 2 Using this option the user may specify the number of categories for the discrete gamma rate heterogeneity model. Enter an integer between 2 and 32 that specifies the number of categories to use with the discrete gamma rate heterogeneity model. alpha Gamma Rate Heterogeneity (-a) Float defined $gamma gamma is not None (defined $value)? " -a $value" : "" ( "" , " -a " + str(value) )[value is not None] Enter a real number greater than 0 $value > 0 value > 0 Using this option the user may specify a shape for the gamma rate heterogeneity called alpha. The default is no site-specific rate heterogeneity. 
Where value is a real number >0 that specifies the shape of the gamma distribution to use with gamma rate heterogeneity. datasets Number of Datasets (-n) Integer 1 (defined $value and $value != $vdef)? " -n $value" : "" ( "" , " -n " + str(value) )[ value is not None and value != vdef] user_branch User branch-lengths [default = estimate] (-ul) Boolean 0 ($value) ? " -ul" : "" ("", " -ul" ) [value] equal_freq Equal frequencies of nucleotide (-f=) Boolean 0 ($value)? " -f=" : "" ( "" , " -f=" )[ value ] freq_bases Relative nucleotide frequencies: #A #C #G #T separated by commas (-f) String not $equal_freq not equal_freq (defined $value)? " -f$value" : "" ( "" , " -f" + str(value) )[ value is not None] You must enter four decimal numbers separated by commas $value ~= /\d+(,\d+){3}/ len(value.split(',')) == 4 This option is used to specify the relative frequencies of the four nucleotides. By default, TipDate will estimate them empirically from the data. If the given values don't sum to 1.0 then they will be scaled so that they do. Value is four decimal numbers for the frequencies of A, C, G and T respectively, separated by commas. transition_transversion Transition/transversion ratio for F84/HKY model (-t) Float $model eq "F84" or $model eq "HKY" model == "F84" or model == "HKY" 2.0 (defined $value and $value != $vdef)? " -t$value" : "" ( "" , " -t" + str(value) )[ value is not None and value != vdef] Value is a decimal number greater than or equal to zero $value >= 0 value >= 0 This option allows the user to set a value for the transition transversion ratio (TS/TV). This is only valid when either the HKY or F84 model has been selected. Value is a decimal number greater than zero. rate_matrix Rate matrix values for REV model: 6 values separated by commas (-t) String $model eq "REV" model == "REV" (defined $value)? 
" -t$value" : "" ( "" , " -t" + str(value) )[ value is not None] Values are six decimal numbers separated by commas $value ~= /\d+(,\d+){5}/ len(value.split(',')) == 6 Where values are six decimal numbers for the instantaneous rates of change from A to C, A to G, A to T, C to G, C to T and G to T respectively, separated by commas. The matrix is symmetrical so the reverse changes occur at the same instantaneous rate as forward changes (e.g. C to A equals A to C) and therefore only six values need be set. These values will be scaled such that the last value (G to T) is 1.0 and the others are set relative to this. output_options Output options 3 branch Print final branch lengths (-vb) Boolean 0 ($value)? " -vb" : "" ( "" , " -vb" )[ value ] likelihoods Write likelihoods for each Site (-vs) Boolean 0 ($value)? " -vs" : "" ( "" , " -vs" )[ value ] notrees Don't Write trees (-vw) Boolean 0 ($value)? " -vw" : "" ( "" , " -vw" )[ value ] progress Show indication of progress (-vp) Boolean 0 ($value)? " -vp":"" ( "" , " -vp" )[ value ] outtree Tree file Tree NEWICK "tipdate.out" "tipdate.out" Programs-5.1.1/makeprotseq.xml0000644000175000001560000002157212072525233015237 0ustar bneronsis makeprotseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net makeprotseq Create random protein sequences http://bioweb2.pasteur.fr/docs/EMBOSS/makeprotseq.html http://emboss.sourceforge.net/docs/themes sequence:edit makeprotseq e_input Input section e_pepstatsfile Pepstats program output file (optional) PepstatsReport Report ("", " -pepstatsfile=" + str(value))[value is not None] 1 This file should be a pepstats output file. Protein sequences will be created with the composition in the pepstats output file. 
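The Python format expressions in these definitions, such as the one for e_pepstatsfile above, pick between two strings by indexing a tuple with a boolean. Here is a minimal sketch of how that idiom evaluates. The helper name `build_option` and the sample values are hypothetical and appear nowhere in the definitions; `vdef` stands for the parameter's default value, as it does in the expressions themselves.

```python
def build_option(flag, value, vdef=None):
    """Return the command-line fragment for one parameter (hypothetical helper)."""
    # A bool indexes a tuple as 0 or 1: False picks "", True picks the flag text.
    # Both tuple members are built eagerly, so str(value) runs even for None.
    return ("", " " + flag + "=" + str(value))[value is not None and value != vdef]


# An optional parameter that was supplied is emitted.
print(repr(build_option("-pepstatsfile", "stats.out")))
# A parameter left at its default is omitted from the command line.
print(repr(build_option("-amount", 100, vdef=100)))
# A parameter changed from its default is emitted.
print(repr(build_option("-amount", 250, vdef=100)))
```

Because both tuple members are evaluated before the index is applied, the idiom is only safe when building the second string cannot fail for an unset value; `str(None)` is harmless, which is why the definitions wrap every value in `str()`.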
e_required Required section e_amount Number of sequences created (value greater than or equal to 1) Integer 100 ("", " -amount=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 2 e_length Length of each sequence (value greater than or equal to 1) Integer 100 ("", " -length=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 3 e_useinsert Do you want to make an insert Boolean 0 ("", " -useinsert")[ bool(value) ] 4 e_insert Inserted string String e_useinsert ("", " -insert=" + str(value))[value is not None] 5 String that is inserted into sequence e_start Start point of inserted sequence (value greater than or equal to 1) Integer e_useinsert 1 ("", " -start=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 6 e_output Output section e_outseq Name of the output sequence file (e_outseq) Protein Filename makeprotseq.e_outseq ("" , " -outseq=" + str(value))[value is not None] 7 e_osformat_outseq Choose the sequence output format Protein Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 8 e_outseq_out outseq_out option Protein Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 9 Programs-5.1.1/pepstats.xml0000644000175000001560000001411512072525233014542 0ustar bneronsis pepstats EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net pepstats Calculates statistics of protein properties http://bioweb2.pasteur.fr/docs/EMBOSS/pepstats.html http://emboss.sourceforge.net/docs/themes sequence:protein:composition pepstats e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_aadata Amino acids properties and molecular weight data file Protein AminoAcidProperties AbstractText ("", " -aadata=" + str(value))[value is not None ] 2 Amino acid properties e_mwdata Molecular weights data file MolecularWeights AbstractText ("", " -mwdata=" + str(value))[value is not None ] 3 Molecular weight data for amino acids e_advanced Advanced section e_termini Include charge at n and c terminus Boolean 1 (" -notermini", "")[ bool(value) ] 4 e_mono Use monoisotopic weights Boolean 0 ("", " -mono")[ bool(value) ] 5 e_output Output section e_outfile Name of the output file (e_outfile) Filename pepstats.e_outfile ("" , " -outfile=" + str(value))[value is not None] 6 e_outfile_out outfile_out option PepstatsReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 7 Programs-5.1.1/revseq.xml0000644000175000001560000001530712072525233014210 0ustar bneronsis revseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net revseq Reverse and complement a nucleotide sequence http://bioweb2.pasteur.fr/docs/EMBOSS/revseq.html http://emboss.sourceforge.net/docs/themes sequence:edit revseq e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_advanced Advanced section e_reverse Reverse sequence Boolean 1 (" -noreverse", "")[ bool(value) ] 2 Set this to be false if you do not wish to reverse the output sequence e_complement Complement sequence Boolean 1 (" -nocomplement", "")[ bool(value) ] 3 Set this to be false if you do not wish to complement the output sequence e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename outseq.rev ("" , " -outseq=" + str(value))[value is not None] 4 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 5 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 6 Programs-5.1.1/backtranambig.xml0000644000175000001560000002101412072525233015460 0ustar bneronsis backtranambig EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net backtranambig Back-translate a protein sequence to ambiguous nucleotide sequence http://bioweb2.pasteur.fr/docs/EMBOSS/backtranambig.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:translation sequence:protein:composition backtranambig e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_additional Additional section e_table Genetic codes Choice 0 0 1 2 3 4 5 6 9 10 11 12 13 14 15 16 21 22 23 ("", " -table=" + str(value))[value is not None and value!=vdef] 2 e_output Output section e_outfile Name of the output file (e_outfile) DNA Filename backtranambig.e_outfile ("" , " -outfile=" + str(value))[value is not None] 3 e_osformat_outfile Choose the sequence output format DNA Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 4 e_outfile_out outfile_out option DNA Sequence e_outfile auto Turn off any prompting String " -auto -stdout" 5 Programs-5.1.1/extractalign.xml0000644000175000001560000001507212072525233015367 0ustar bneronsis extractalign EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net extractalign Extract regions from a sequence alignment http://bioweb2.pasteur.fr/docs/EMBOSS/extractalign.html http://emboss.sourceforge.net/docs/themes sequence:edit extractalign e_input Input section e_sequence sequence option Alignment FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_regions Regions to extract (eg: 4-57,78-94) String ("", " -regions=" + str(value))[value is not None] 2 Regions to extract. A set of regions is specified by a set of pairs of positions. The positions are integers. They are separated by any non-digit, non-alpha character. Examples of region specifications are: 24-45, 56-78 1:45, 67=99;765..888 1,5,8,10,23,45,57,99 e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename extractalign.e_outseq ("" , " -outseq=" + str(value))[value is not None] 3 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 4 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 5 Programs-5.1.1/clique.xml0000644000175000001560000004630111442102523014154 0ustar bneronsis clique clique Compatibility Program http://bioweb2.pasteur.fr/docs/phylip/doc/clique.html This program uses the compatibility method for unrooted two-state characters to obtain the largest cliques of characters and the trees which they suggest. 
phylogeny:others clique String "clique < clique.params" "clique < clique.params" 0 infile Input File PhylipDiscreteCharMatrix AbstractText $infile ne "infile" infile != "infile" "ln -s $infile infile && " "ln -s "+ str( infile ) + " infile && " -10 5 6 Alpha 110110 Beta 110000 Gamma 100110 Delta 001001 Epsilon 001110 clique_opt Clique options use_ancestral_state Use ancestral states in input file (A) Boolean 0 ($value) ? "A\\n" : "" ( "" , "A\n" )[ value ] 1 There should also be, in the input file after the numbers of species and characters, an A on the first line of the file. There must also be, before the character data, a line or lines giving the ancestral states for each character. It will look like the data for a species (the ancestor). It must start with the letter A in the first column. There then follow enough characters or blanks to complete the full length of a species name (e.g. ANCESTOR). Then the states which are ancestral for the individual characters follow. These may be 0, 1 or ?, the latter indicating that the ancestral state is unknown. Examples: ANCESTOR 0010011 clique.params spec_min_clique_size Specify minimum clique size? (C) Boolean 0 ($value) ? "C\\n$min_clique_size\\n" : "" ( "" , "C\n"+ str( min_clique_size ) + "\n")[ value ] 3 clique.params min_clique_size Minimum clique size Integer $spec_min_clique_size spec_min_clique_size "" "" You must enter a numeric value, greater than or equal to 0 $min_clique_size >= 0 value >= 0 2 This option indicates that you wish to specify a minimum clique size and print out all cliques (and their associated trees) greater than or equal to that size. clique.params bootstrap Bootstrap options multiple_dataset Analyze multiple data sets (M) Boolean not $multiple_dataweights not multiple_dataweights 0 ($value) ? 
"M\\nD\\n$datasets_number\\n" : "" ( "" , "M\nD\n"+ str( datasets_number ) +"\n" )[ value ] 10 clique.params datasets_number How many data sets (D) Integer $multiple_dataset multiple_dataset "" "" Enter a value > 0 and <= 1000 $value > 0 and $value <= 1000 ( value > 0 ) and ( value <= 1000 ) 9 consense Compute a consensus tree Boolean $multiple_dataset and $print_treefile multiple_dataset and print_treefile 0 ($value )? " && cp infile clique.infile && cp clique.outtree intree && consense <consense.params && mv outtree consense.outtree && mv outfile consense.outfile" : "" ( "", " && cp infile clique.infile && cp clique.outtree intree && consense <consense.params && mv outtree consense.outtree && mv outfile consense.outfile" )[ value ] 10 weigths Weigth options site_weigthed Sites weighted? (W) Boolean 0 ( $value ) ? "W\\n" : "" ( "" , "W\n" )[ value ] 19 clique.params multiple_dataweights Analyze multiple data Weigths (M) Boolean not $multiple_dataset not multiple_dataset 0 ($value) ? "M\\nW\\n$dataweights_number\\n" : "" ( "" , "M\nW\n"+ str( dataweights_number ) +"\n" )[ value ] 20 clique.params dataweights_number How many sets of weights? Integer $multiple_dataweights multiple_dataweights "" "" Enter a value > 0 and <= 1000 $value > 0 and $value <= 1000 ( value > 0 ) and ( value <= 1000 ) 21 weigth_file Weight file PhylipWeight AbstractText $site_weigthed or $multiple_dataweights site_weigthed or multiple_dataweights " ln -s $value weights && " " ln -s " + str( value ) +" weights && " -20 output Output options print_matrix Print out compatibility matrix (3) Boolean 0 ($value) ? "3\\n" : "" ( "" , "3\n" )[ value ] 1 clique.params print_tree Print out tree (4) Boolean 1 ($value) ? "" : "4\\n" ( "4\n" , "" )[ value ] 1 Tells the program to print a semi-graphical picture of the tree in the outfile. clique.params print_treefile Write out trees onto tree file (5) Boolean 1 ($value) ? 
"" : "5\\n" ( "5\n" , "" )[ value ] 1 Tells the program to save the tree in a treefile (a standard representation of trees where the tree is specified by a nested pairs of parentheses, enclosing names and separated by commas). clique.params printdata Print out the data at start of run (1) Boolean 0 ($value) ? "1\\n" : "" ( "" , "1\n" )[ value ] 1 clique.params other_options Other options outgroup Outgroup root (O) Integer 1 (defined $value and $value != $vdef) ? "O\\n$value\\n" : "" ("" , "O\\n%s\\n" % str(value) )[ value is not None and value != vdef ] Please enter a value greater than 0 $value > 0 value > 0 1 clique.params outfile Output file Text " && mv outfile clique.outfile" " && mv outfile clique.outfile" "clique.outfile" "clique.outfile" treefile Output tree Tree NEWICK $print_treefile print_treefile " && mv outtree clique.outtree" " && mv outtree clique.outtree" "clique.outtree" "clique.outtree" confirm String "Y\\n" "Y\n" 1000 clique.params terminal_type String "0\\n" "0\n" -1 clique.params consense_confirm String $consense consense "Y\\n" "Y\n" 1000 consense.params consense_terminal_type String $consense consense "T\\n" "T\n" -2 consense.params consense_outfile Consense output file Text $consense consense "consense.outfile" "consense.outfile" consense_treefile Consense output tree Tree NEWICK $consense consense "consense.outtree" "consense.outtree" Programs-5.1.1/distmat.xml0000644000175000001560000002436111672346320014353 0ustar bneronsis distmat EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net distmat Create a distance matrix from a multiple sequence alignment http://bioweb2.pasteur.fr/docs/EMBOSS/distmat.html http://emboss.sourceforge.net/docs/themes phylogeny:distance distmat e_input Input section e_sequence sequence option Alignment FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH 1,n ("", " -sequence=" + str(value))[value is not None] 1 File containing a sequence alignment. e_required Required section e_nucmethod Multiple substitution correction methods for nucleotides Choice 0 0 1 2 3 4 5 ("", " -nucmethod=" + str(value))[value is not None and value!=vdef] 2 Multiple substitution correction methods for nucleotides. e_protmethod Multiple substitution correction methods for proteins Choice 0 0 1 2 ("", " -protmethod=" + str(value))[value is not None and value!=vdef] 3 Multiple substitution correction methods for proteins. e_additional Additional section e_ambiguous Use the ambiguous codes in the calculation. Boolean 0 ("", " -ambiguous")[ bool(value) ] 4 Option to use the ambiguous codes in the calculation of the Jukes-Cantor method or if the sequences are proteins. e_gapweight Weight given to gaps Float 0. ("", " -gapweight=" + str(value))[value is not None and value!=vdef] 5 Option to weight gaps in the uncorrected (nucleotide) and Jukes-Cantor distance methods. e_position Base position to analyse Integer 123 ("", " -position=" + str(value))[value is not None and value!=vdef] 6 Choose base positions to analyse in each codon i.e. 123 (all bases), 12 (the first two bases), 1, 2, or 3 individual bases. e_calculatea Calculate the nucleotide jin-nei parameter 'a' Boolean 0 ("", " -calculatea")[ bool(value) ] 7 This will force the calculation of parameter 'a' in the Jin-Nei Gamma distance calculation, otherwise the default is 1.0 (see -parametera option). 
e_parametera Nucleotide jin-nei parameter 'a' Float 1.0 ("", " -parametera=" + str(value))[value is not None and value!=vdef] 8 User-defined parameter 'a' to be used in the Jin-Nei Gamma distance calculation. The suggested value to be used is 1.0 (Jin et al.) and this is the default. e_output Output section e_outfile Name of the output file (e_outfile) Filename distmat.e_outfile ("" , " -outfile=" + str(value))[value is not None] 9 e_outfile_out outfile_out option EmbossDistanceMatrix AbstractText e_outfile auto Turn off any prompting String " -auto -stdout" 10 Programs-5.1.1/seqmatchall.xml0000644000175000001560000001736612072525233015210 0ustar bneronsis seqmatchall EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net seqmatchall All-against-all word comparison of a sequence set http://bioweb2.pasteur.fr/docs/EMBOSS/seqmatchall.html http://emboss.sourceforge.net/docs/themes alignment:pairwise:local seqmatchall e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 2,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_wordsize Word size (value greater than or equal to 2) Integer 4 ("", " -wordsize=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 2 is required value >= 2 2 e_output Output section e_outfile Name of the output alignment file Filename seqmatchall.align ("" , " -outfile=" + str(value))[value is not None] 3 e_aformat_outfile Choose the alignment output format Choice MATCH FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH ("", " -aformat=" + str(value))[value is not None and value!=vdef] 4 e_outfile_out outfile_out option Alignment 
e_aformat_outfile in ['FASTA', 'MSF'] e_outfile e_outfile_out2 outfile_out2 option Text e_aformat_outfile in ['PAIR', 'MARKX0', 'MARKX1', 'MARKX2', 'MARKX3', 'MARKX10', 'SRS', 'SRSPAIR', 'SCORE', 'UNKNOWN', 'MULTIPLE', 'SIMPLE', 'MATCH'] e_outfile auto Turn off any prompting String " -auto -stdout" 5 Programs-5.1.1/cif.xml0000644000175000001560000003763211767572177013474 0ustar bneronsis cif 0.2.2 CIF Cut DNA regions in frame E. Quevillon, B. Boeda http://bioweb2.pasteur.fr/docs/cif/cif.html http://bioweb2.pasteur.fr/docs/cif/compatible_cohesive_ends.txt http://bioweb2.pasteur.fr/docs/cif/paillasse_liste.txt cif (for Cut In Frame) is a tool that works with DNA sequences. It digests your sequences with a pool of restriction enzymes and finds the enzymes that cut your sequence while preserving the reading frame after ligation, so that the digestion introduces no frame shift. This helps users working with a gene of interest to locate potential region(s) that could be removed from the final protein, in order to check whether those regions have an impact on the final gene product. It can also help identify region(s) that are vital for the gene. sequence:enzyme:analysis sequence:nucleic:restriction display:nucleic:restriction cif input Input section sequence Sequence DNA Sequence FASTA 1,n " -i $value " ( "" , " -i " + str( value ) + " " )[value is not None] 1 options Options 2 enztype Type of enzymes (-T) Choice 0 0 blunt cohesive klenow ($value) ?"-T $value ": "" ( "" , " -T " + str( value ) )[value != vdef] You can choose between: - blunt : Use blunt cutters - cohesive : Use cohesive cutters - klenow : Use only 5' enzymes for Klenow fill-in. [Default all three] strand Cohesive enzyme strand (-S) Choice 0 0 5 3 ($value) ?"-S $value ": "" ( "" , " -S " + str( value ) )[value != vdef] Cohesive enzyme strand to use: - 5' - 3' - both : 5' and 3' (Default value). 
(No effect if you choose to use blunt enzymes) digestionmod Digestion mode (-D) Choice 0 0 double simple ($value) ?"-D $value ": "" ( "" , " -D " + str( value ) )[value != vdef] Digestion mode: - double: Report pairs of enzymes that digest the sequence - simple: Report names of enzymes that cut more than once - both: simple + double (Default value). Length Minimum length of recognition site (-L) Integer 6 ($value) ?"-L $value ": "" ( "" , " -L " + str( value ) )[value is not None and value!=vdef] Use only enzymes whose DNA recognition site has at least this length. By default, 6. variant Use enzymes with variant recognition site (-V) Boolean 0 ($value) ?"-V": "" ( "" , " -V " )[value] Some cohesive enzymes have a variant recognition site, like 'GDGCH^C' for Bsp1286I, where: D = not C (A or G or T) H = not G (A or C or T) When set, this option enables the use of this type of enzyme. By default, this option is off. exotic Report digestions in frame without ends compatibilities (-X) Boolean 0 $variant == 1 variant == 1 ($value) ?"-X": "" ( "" , " -X " )[value] Some cohesive enzymes have a variant recognition site, like 'GDGCH^C' for Bsp1286I, where: D = not C (A or G or T) H = not G (A or C or T) Thus, using those enzymes may produce a cut in frame, but the resulting ends may not be compatible with each other at the DNA sequence level. Requires the '[-V | --variant]' option to work. By default, this option is off. compat List of compatible cohesive ends (-C) Text ($value) ?"-C $value ": "" ( "" , " -C " + str( value ) )[value is not None] File with list of compatible cohesive ends. The default list used is given in the program help pages (compatible_cohesive_ends.txt). If you want to give your own list, the format must be as follows: "Enzyme_name:compatEnz1_name,compatEnz2_name,..." 
Acc65I:BanI,BsiWI,BsrGI AccI:AciI,AclI,BsaHI,HinP1I,HpaII,NarI,ClaI,BstBI,TaqI enzlist Enzyme list to work with (-E) Text ($value) ?"-E $value ": "" ( "" , " -E " + str( value ) )[value is not None] By default, the program works with a list of enzymes commonly used in the laboratory, given in the program help pages (paillasse_liste.txt). If you want to give your own list, the format is one enzyme per line. AatII Acc65I ... outputopt Output parameters 3 stop Show stop codon (-P) Boolean 0 $enztype == 0 or $enztype eq 'blunt' enztype == 0 or enztype == 'blunt' ($value) ?"-P": "" ( "" , " -P " )[value] Sometimes a blunt digestion, after ligation, produces a new codon around the cutting site that is a stop codon. This option displays such digestions with a tag 'stopCodon' in the output line results. NOTE: This option only works if 'blunt' type is set. By default this option is off, so in such cases no results are reported for the digestion. cut_pos Show cut positions (-N) Boolean 0 ($value) ?"-N": "" ( "" , " -N " )[value] Enzymes may cut your sequence more than once. This option reports the number of times the enzyme(s) cut your sequence. [Default off] mod_aa Show new generated amino acid (-A) Boolean 0 $enztype ne 'cohesive' enztype != 'cohesive' ($value) ?"-A": "" ( "" , " -A " )[value] Experimental option. [Default off] For 'blunt' or 'klenow' analysis, this option shows the amino acids that have been changed by the ligation between the two parts of the DNA after digestion. Each change is shown as OldAA>NewAA (e.g.: G>N). outputstyle Output style format (-F) Choice text text gff image ($value) ?"-F $value ": "" ( "" , " -F " + str( value ) )[value != vdef] Choose the output type that you prefer. - Text output (Default value) - GFF3 output - Image (png): Creates image (png). 
img Output image Picture Binary 1,n $outputstyle eq 'image' outputstyle == 'image' *.png "*.png" Programs-5.1.1/clustalw-sequence.xml0000644000175000001560000011076212073003734016346 0ustar bneronsis clustalw-sequence Clustalw: Sequence to Profile alignments Sequentially add profile2 sequences to profile1 alignment alignment:multiple clustalw -sequences profile Profile Alignments parameters 2 By PROFILE ALIGNMENT, we mean alignment using existing alignments. Profile alignments allow you to store alignments of your favorite sequences and add new sequences to them in small bunches at a time. (e.g. an alignment output file from CLUSTAL W). One or both sets of input sequences may include secondary structure assignments or gap penalty masks to guide the alignment. Merge 2 alignments by profile alignment profile1 Profile 1 Alignment CLUSTAL (defined $value) ? " -profile1=$value" : "" ( "" , " -profile1=" + str( value ) )[value is not None] profile2 Profile 2 Sequence FASTA 1,n (defined $value) ? " -profile2=$value" : "" ( "" , " -profile2=" + str( value ) )[value is not None] general_settings General settings 3 typeseq Protein or DNA (-type) Choice auto auto protein dna (defined $value) ? " -type=$value" : "" ("", " -type="+str(value))[value is not None] quicktree Toggle Slow/Fast pairwise alignments (-quicktree) Choice slow slow fast ($value eq "fast") ? " -quicktree" : "" ( "" , " -quicktree")[ value == "fast"] slow: by dynamic programming (slow but accurate) fast: method of Wilbur and Lipman (extremely fast but approximate) fastpw Fast Pairwise Alignments parameters $quicktree eq "fast" quicktree == "fast" 2 These similarity scores are calculated from fast, approximate, global alignments, which are controlled by 4 parameters. 2 techniques are used to make these alignments very fast: 1) only exactly matching fragments (k-tuples) are considered; 2) only the 'best' diagonals (the ones with most k-tuple matches) are used. 
ktuple Word size (-ktuple) Integer 1 (defined $value and $value != $vdef) ? " -ktuple=$value" : "" ( "" , " -ktuple=" + str( value ) )[value is not None and value != vdef ] 2 K-TUPLE SIZE: This is the size of the exactly matching fragments that are used. INCREASE for speed (max= 2 for proteins; 4 for DNA), DECREASE for sensitivity. For longer sequences (e.g. >1000 residues) you may need to increase the default. topdiags Number of best diagonals (-topdiags) Integer 5 (defined $value and $value != $vdef) ? " -topdiags=$value" : "" ( "" , " -topdiags=" + str( value ))[value is not None and value != vdef ] 2 The number of k-tuple matches on each diagonal (in an imaginary dot-matrix plot) is calculated. Only the best ones (with most matches) are used in the alignment. This parameter specifies how many. Decrease for speed; increase for sensitivity. window Window around best diags (-window) Integer 5 (defined $value and $value != $vdef) ? " -window=$value" : "" ( "" , " -window=" + str( value ) )[ value is not None and value != vdef ] 2 WINDOW SIZE: This is the number of diagonals around each of the 'best' diagonals that will be used. Decrease for speed; increase for sensitivity. pairgap Gap penalty (-pairgap) Float 3 (defined $value and $value != $vdef) ? " -pairgap=$value" : "" ( "" , " -pairgap=" + str( value ))[ value is not None and value != vdef ] 2 This is a penalty for each gap in the fast alignments. It has little effect on the speed or sensitivity except for extreme values. score Percent or absolute score ? (-score) Choice percent percent absolute (defined $value and $value ne $vdef) ? " -score=$value" : "" ( "" , " -score=" +str( value ) )[value is not None and value !=vdef] 2 slowpw Slow Pairwise Alignments parameters $quicktree eq "slow" quicktree == "slow" 2 These parameters do not have any effect on the speed of the alignments. They are used to give initial alignments which are then rescored to give percent identity scores. 
These % scores are the ones which are displayed on the screen. The scores are converted to distances for the trees. pwgapopen Gap opening penalty (-pwgapopen) Float 10.00 (defined $value and $value != $vdef) ? " -pwgapopen=$value" : "" ( "" , " -pwgapopen=" + str( value ) )[ value is not None and value != vdef ] pwgapext Gap extension penalty (-pwgapext) Float 0.10 (defined $value and $value != $vdef) ? " -pwgapext=$value" : "" ( "" , " -pwgapext=" + str( value ) )[ value is not None and value != vdef ] slowpw_prot Protein parameters $typeseq eq "protein" typeseq == "protein" pwmatrix Protein weight matrix (-pwmatrix) Choice gonnet blosum gonnet pam id (defined $value and $value ne $vdef) ? " -pwmatrix=$value" : "" ( "" , " -pwmatrix=" + str(value) )[value is not None and value != vdef ] The scoring table which describes the similarity of each amino acid to each other. For DNA, an identity matrix is used. BLOSUM (Henikoff). These matrices appear to be the best available for carrying out data base similarity (homology searches). The matrices used are: Blosum80, 62, 40 and 30. The Gonnet Pam 250 matrix has been reported as the best single matrix for alignment, if you only choose one matrix. Our experience with profile database searches is that the Gonnet series is unambiguously superior to the Blosum series at high divergence. However, we did not get the series to perform systematically better than the Blosum series in Clustal W (communication of the authors). PAM (Dayhoff). These have been extremely widely used since the late '70s. We use the PAM 120, 160, 250 and 350 matrices. slowpw_dna DNA parameters $typeseq eq "dna" typeseq == "dna" pwdnamatrix DNA weight matrix (-pwdnamatrix) Choice iub iub clustalw (defined $value and $value ne $vdef) ? " -pwdnamatrix=$value" : "" ( "" , " -pwdnamatrix=" + str(value) )[ value is not None and value != vdef ] For DNA, a single matrix (not a series) is used. Two hard-coded matrices are available: 1) IUB. 
This is the default scoring matrix used by BESTFIT for the comparison of nucleic acid sequences. X's and N's are treated as matches to any IUB ambiguity symbol. All matches score 1.9; all mismatches for IUB symbols score 0. 2) CLUSTALW(1.6). The previous system used by ClustalW, in which matches score 1.0 and mismatches score 0. All matches for IUB symbols also score 0. structure Structure Alignments parameters 2 These options, when doing a profile alignment, allow you to set 2D structure parameters. If a solved structure is available, it can be used to guide the alignment by raising gap penalties within secondary structure elements, so that gaps will preferentially be inserted into unstructured surface loops. Alternatively, a user-specified gap penalty mask can be supplied directly. A gap penalty mask is a series of numbers between 1 and 9, one per position in the alignment. Each number specifies how much the gap opening penalty is to be raised at that position (raised by multiplying the basic gap opening penalty by the number), i.e. a mask figure of 1 at a position means no change in gap opening penalty; a figure of 4 means that the gap opening penalty is four times greater at that position, making gaps 4 times harder to open. Gap penalty masks are to be supplied with the input sequences. The masks work by raising gap penalties in specified regions (typically secondary structure elements) so that gaps are preferentially opened in the less well conserved regions (typically surface loops). CLUSTAL W can read the masks from SWISS-PROT, CLUSTAL or GDE format input files. For many 3-D protein structures, secondary structure information is recorded in the feature tables of SWISS-PROT database entries. You should always check that the assignments are correct - some are quite inaccurate. CLUSTAL W looks for SWISS-PROT HELIX and STRAND assignments e.g. 
FT HELIX 100 115 FT HELIX 100 115 The structure and penalty masks can also be read from CLUSTAL alignment format as comment lines beginning !SS_ or !GM_ e.g. !SS_HBA_HUMA ..aaaAAAAAAAAAAaaa.aaaAAAAAAAAAAaaaaaaAaaa.........aaaAAAAAA !GM_HBA_HUMA 112224444444444222122244444444442222224222111111111222444444 HBA_HUMA VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK Note that the mask itself is a set of numbers between 1 and 9 each of which is assigned to the residue(s) in the same column below. In GDE flat file format, the masks are specified as text and the names must begin with SS_ or GM_. Either a structure or penalty mask or both may be used. If both are included in an alignment, the user will be asked which is to be used. nosecstr1 Do not use secondary structure-gap penalty mask for profile 1 (-nosecstr1) Boolean 0 ($value) ? " -nosecstr1" : "" ( "" , " -nosecstr1")[ value ] 2 This option controls whether the input secondary structure information or gap penalty masks will be used. nosecstr2 Do not use secondary structure-gap penalty mask for profile 2 (-nosecstr2) Boolean 0 ($value) ? " -nosecstr2" : "" ( "" , " -nosecstr2")[ value ] This option controls whether the input secondary structure information or gap penalty masks will be used. helixgap Helix gap penalty (-helixgap) Integer 4 (defined $value and $value != $vdef) ? " -helixgap=$value" : "" ( "" , " -helixgap=" + str( value ) )[ value is not None and value != vdef ] This option provides the value for raising the gap penalty at core Alpha Helical (A) residues. In CLUSTAL format, capital residues denote the A and B core structure notation. The basic gap penalties are multiplied by the amount specified. strandgap Strand gap penalty (-strandgap) Integer 4 (defined $value and $value != $vdef) ? " -strandgap=$value" : "" ( "" , " -strandgap=" + str( value ) )[ value is not None and value != vdef ] This option provides the value for raising the gap penalty at Beta Strand (B) residues. 
In CLUSTAL format, capital residues denote the A and B core structure notation. The basic gap penalties are multiplied by the amount specified. loopgap Loop gap penalty (-loopgap) Integer 1 (defined $value and $value != $vdef) ? " -loopgap=$value" : "" ( "" , " -loopgap=" + str( value ) )[ value is not None and value != vdef ] This option provides the value for the gap penalty in Loops. By default this penalty is not raised. In CLUSTAL format, loops are specified by '.' in the secondary structure notation. terminalgap Secondary structure terminal penalty (-terminalgap) Integer 2 (defined $value and $value != $vdef) ? " -terminalgap=$value" : "" ( "" , " -terminalgap=" + str( value ) )[ value is not None and value != vdef ] This option provides the value for setting the gap penalty at the ends of secondary structures. Ends of secondary structures are observed to grow and/or shrink in related structures. Therefore by default these are given intermediate values, lower than the core penalties. All secondary structure read in as lower case in CLUSTAL format gets the reduced terminal penalty. helixendin Helix terminal positions: number of residues inside helix to be treated as terminal (-helixendin) Integer 3 (defined $value and $value != $vdef) ? " -helixendin=$value" : "" ( "" , " -helixendin=" + str( value ) )[ value is not None and value != vdef ] This option (together with the -helixendout option) specifies the range of structure termini for the intermediate penalties. In the alignment output, these are indicated as lower case. For Alpha Helices, by default, the range spans the end helical turn. helixendout Helix terminal positions: number of residues outside helix to be treated as terminal (-helixendout) Integer 0 (defined $value and $value != $vdef) ? " -helixendout=$value" : "" ( "" , " -helixendout=" + str( value ) )[ value is not None and value != vdef ] This option (together with the -helixendin option) specifies the range of structure termini for the intermediate penalties. 
In the alignment output, these are indicated as lower case. For Alpha Helices, by default, the range spans the end helical turn. strandendin Strand terminal positions: number of residues inside strand to be treated as terminal (-strandendin) Integer 1 (defined $value and $value != $vdef) ? " -strandendin=$value" : "" ( "" , " -strandendin=" + str( value ) )[ value is not None and value != vdef ] This option (together with the -strandendout option) specifies the range of structure termini for the intermediate penalties. In the alignment output, these are indicated as lower case. For Beta Strands, the default range spans the end residue and the adjacent loop residue, since sequence conservation often extends beyond the actual H-bonded Beta Strand. strandendout Strand terminal positions: number of residues outside strand to be treated as terminal (-strandendout) Integer 1 (defined $value and $value != $vdef) ? " -strandendout=$value" : "" ( "" , " -strandendout=" + str( value ) )[ value is not None and value != vdef ] This option (together with the -strandendin option) specifies the range of structure termini for the intermediate penalties. In the alignment output, these are indicated as lower case. For Beta Strands, the default range spans the end residue and the adjacent loop residue, since sequence conservation often extends beyond the actual H-bonded Beta Strand. secstrout Output in alignment (-secstrout) Choice STRUCTURE STRUCTURE MASK BOTH NONE (defined $value and $value ne $vdef) ? " -secstrout=$value" : "" ( "" , " -secstrout=" + str( value ) )[ value is not None and value != vdef ] This option lets you choose whether or not to include the masks in the CLUSTAL W output alignments. Showing both is useful for understanding how the masks work. The secondary structure information is itself very useful in judging the alignment quality and in seeing how residue conservation patterns vary with secondary structure. 
outputparam Output parameters 5 outputformat Output format (-output) Choice null null GCG GDE PHYLIPI NEXUS PIR FASTA (defined $value and $value ne $vdef) ? " -output=$value" : "" ( "" , " -output=" + str( value) )[ value is not None and value != vdef ] seqnos Output sequence numbers in the output file (for clustalw output only) (-seqnos) Boolean not defined $outputformat outputformat is None 0 (defined $value and $value != $vdef) ? " -seqnos=on" : "" ( "" , " -seqnos=on")[ value is not None and value != vdef] outorder Result order (-outorder) Choice aligned input aligned (defined $value and $value ne $vdef) ? " -outorder=$value" : "" ( "" , " -outorder=" + str(value))[ value is not None and value != vdef ] outfile Sequence alignment file name (-outfile) Filename (defined $value) ? " -outfile=$value" : "" ( "" , " -outfile=" + str( value))[ value is not None ] clustalaligfile Alignment file Alignment CLUSTAL not defined $outputformat outputformat is None (defined $outfile)? "$outfile":"*.aln" ("*.aln", str(outfile))[outfile is not None] In the conservation line output in the clustal format alignment file, three characters are used: '*' indicates positions which have a single, fully conserved residue. ':' indicates that one of the following 'strong' groups is fully conserved (STA,NEQK,NHQK,NDEQ,QHRK,MILV,MILF,HY,FYW). '.' indicates that one of the following 'weaker' groups is fully conserved (CSA,ATV,SAG,STNK,STPA,SGND,SNDEQK,NDEQHK,NEQHRK,FVLIM,HFY). These are all the positively scoring groups that occur in the Gonnet Pam250 matrix. The strong and weak groups are defined as strong score >0.5 and weak score <=0.5 respectively. aligfile Alignment file Alignment $outputformat =~ /^(NEXUS|GCG|PHYLIPI|FASTA)$/ outputformat in [ "NEXUS", "GCG", "PHYLIPI","FASTA"] (defined $outfile)? 
"$outfile":"*.fasta *.nxs *.phy *.msf" { "OUTFILE":outfile, "FASTA":"*.fasta", "NEXUS": "*.nxs", "PHYLIPI": "*.phy" , 'GCG': '*.msf' }[( "OUTFILE", outputformat)[outfile is None]] seqfile Sequences file Sequence NBRF GDE $outputformat =~ /^(GDE|PIR)$/ outputformat in [ 'GDE', 'PIR' ] (defined $outfile)? "$outfile":"*.gde *.pir" { "OUTFILE":outfile, 'GDE':'*.gde', 'PIR':'*.pir' }[( "OUTFILE", outputformat)[outfile is None]] dndfile Tree file Tree NEWICK "*.dnd" "*.dnd" Programs-5.1.1/mobyle_xml_types.txt0000644000175000001560000001654311135113762016317 0ustar bneronsispython class | xml class | description AbstractText | AceAssembly | file in ace format (cap3) AbstractText | AncestorsFile | file to specify ancestors state in phylip (mix) AbstractText | BambeTree | AbstractText | BananaOutput | AbstractText | Blast2taxonomyHtmlOutput | AbstractText | BlastHtmlOutput | AbstractText | BlastTextOutput | AbstractText | BlastXmlOutput | AbstractText | BoxshadeHtmlOutput | AbstractText | BoxshadeRtfOutput | AbstractText | BoxshadeXfigOutput | AbstractText | BtwistedOutput | AbstractText | CaiOutput | AbstractText | ChargeOutput | AbstractText | ChecktransOutput | AbstractText | ChipsOutput | AbstractText | ClippingParametersFile | AbstractText | CodcmpOutput | AbstractText | CoderetOutput | AbstractText | CodonWAnalysisFile | AbstractText | CofoldSequence | AbstractText | ComAlignAlignment | AbstractText | CompseqOutput | AbstractText | ConsensusAlphabet | AbstractText | CostsFile | AbstractText | CpgplotOutput | AbstractText | CpgreportOutput | AbstractText | CuspOutput | AbstractText | DCAlignment | AbstractText | DsspOutput | AbstractText | ELPOutput | AbstractText | EmbossDistanceMatrix | AbstractText | EmowseOutput | AbstractText | EnergyParameterFile | AbstractText | EntryFullText | AbstractText | EnzymeData | AbstractText | EpestfindOutput | AbstractText | EpsilonFile | AbstractText | Est2genomeOutput | AbstractText | FastaHtmlOutput | AbstractText | FastaTextOutput | 
AbstractText | Feature | AbstractText | FindkmOutput | AbstractText | ForceAlignFile | AbstractText | FreakOutput | AbstractText | GeeceeOutput | AbstractText | GffCustomFile | AbstractText | GruppiOutput | AbstractText | HmmDirichletPrior | AbstractText | HmmNullModelFile | AbstractText | HmmPAM | AbstractText | HmmTextProfile | AbstractText | HmomentOutput | AbstractText | IepOutput | AbstractText | InfoalignOutput | AbstractText | InvertedOutput | AbstractText | IsochoreOutput | AbstractText | LindnaMappingCommands | AbstractText | MFoldFoldingConstraints | AbstractText | MegamergerOutput | AbstractText | MfoldDetailHtmlOutput | AbstractText | MfoldHtmlOutput | AbstractText | MfoldRnamlOutput | AbstractText | MixtureFile | AbstractText | Molwt | AbstractText | MrepsXmlOutput | AbstractText | MuscleAlignment | AbstractText | MuscleHtmlOutput | AbstractText | NewcpgreportOutput | AbstractText | NewcpgseekOutput | AbstractText | OddcompOutput | AbstractText | PalindromeOutput | AbstractText | Pattern | AbstractText | Pdb | AbstractText | PepcoilOutput | AbstractText | PepinfoOutput | AbstractText | PepstatsOutput | AbstractText | PeptideMolweights | AbstractText | PhylipCategoriesRates | AbstractText | PhylipDiscreteCharMatrix | AbstractText | PhylipDistanceMatrix | AbstractText | PhylipWeight | AbstractText | PrettyseqOutput | AbstractText | Primer3Mishybridizing | AbstractText | Primer3Mispriming | AbstractText | Primer3Output | AbstractText | PrimerPairs | AbstractText | PrimoOligo | AbstractText | PrimoRegion | AbstractText | PrimoRepeats | AbstractText | ProfileOrMatrix | AbstractText | ProfitOutput | AbstractText | ProphecyOutput | AbstractText | ProsePattern | AbstractText | PrositePattern | AbstractText | PrositeProfile | AbstractText | ProteinCodes | AbstractText | PsortHtmlOutput | AbstractText | QuicktandemOutput | AbstractText | RNAStructure | AbstractText | RedataOutput | AbstractText | RemapOutput | AbstractText | RestoverOutput | AbstractText | 
RestrictionFile | AbstractText | RnaFoldSequence | AbstractText | RnadistanceAlignment | AbstractText | ScanPattern | AbstractText | SequenceOverlap | AbstractText | ShowalignOutput | AbstractText | ShowfeatOutput | AbstractText | ShoworfOutput | AbstractText | ShowseqOutput | AbstractText | SigPattern | AbstractText | SixpackOutput | AbstractText | StrideOutputFile | AbstractText | StssearchOutput | AbstractText | SupermatcherError | AbstractText | SycoOutput | AbstractText | SymbolFile | AbstractText | TRnaScanFirstPassResult | AbstractText | TacgHtmlOutput | AbstractText | TacgTextOutput | AbstractText | TandemOutput | AbstractText | TaxonomyOutput | AbstractText | TextsearchOutput | AbstractText | TfscanOutput | AbstractText | TipdateAlignment | AbstractText | ToppredHtmlOutput | AbstractText | USA_list | AbstractText | Vector | AbstractText | VectorstripOutput | AbstractText | WobbleOutput | AbstractText | WordcountOutput | AbstractText | WordfinderError | AbstractText | XpoundOutput | AbstractText | XreportOutput | Binary | AbiTraceFile | Binary | HmmBinProfile | Binary | PDF | Binary | Picture | Binary | PostScript | Text | 3DStructure | // Programs-5.1.1/phyml.xml0000644000175000001560000007417012126514513014036 0ustar bneronsis phyml 3.0 (20122408) PHYML A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood S. Guindon and O. Gascuel Guindon, S. and Gascuel, O. (2003) A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood Systematic Biology, 52(5), 696-704 http://bioweb2.pasteur.fr/docs/phyml/phyml-manual-20122408.pdf http://code.google.com/p/phyml http://code.google.com/p/phyml phylogeny:likelihood phyml alignment Sequence Alignment Alignment PHYLIP-RELAXED "-i $value" " -i "+str(value) seqtype Data type (-d) Choice nt nt aa (defined $value) ? 
" -d $value" : "" ( "" , " -d " + str(value) )[ value is not None ] inputopt Input Options datasets Number of data sets to analyse (-n) Integer 1 (defined $value and $value != $vdef) ? " -n $value" : "" ("", " -n "+str(value))[value is not None and value != vdef] bootstrap_sets Number of bootstraps sets to generate (only works with one data set to analyse) (-b) Integer $datasets == 1 datasets == 1 (defined $value) ? " -b $value" : "" ( "" , " -b " + str(value) )[ value is not None] must be an integer >= -5 value >= -5
  • b > 0: b is the number of bootstrap replicates.
  • b = 0: neither approximate likelihood ratio test nor bootstrap values are computed.
  • b = -1: approximate likelihood ratio test returning aLRT statistics.
  • b = -2: approximate likelihood ratio test returning Chi2-based parametric branch supports.
  • b = -4: (default) SH-like branch supports alone.
  • b = -5: approximate Bayes branch supports.
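The value list above can be sketched in Python. This is only an illustration of how the `-b` fragment appears to be assembled from the format expression, the `datasets == 1` precondition and the `value >= -5` control given in this entry; the names `SUPPORT_MODES`, `describe_b` and `bootstrap_fragment` are hypothetical and are not part of Mobyle or PhyML.

```python
# Illustrative sketch only: these names are hypothetical, not Mobyle or PhyML API.
# Meanings of the non-positive -b values, copied from the list above.
SUPPORT_MODES = {
    0: "no aLRT and no bootstrap values",
    -1: "aLRT statistics",
    -2: "Chi2-based parametric branch supports",
    -4: "SH-like branch supports alone (default)",
    -5: "approximate Bayes branch supports",
}


def describe_b(value):
    """Return a short description of a -b value, following the list above."""
    if value > 0:
        return "%d bootstrap replicates" % value
    return SUPPORT_MODES.get(value, "not described in the list above")


def bootstrap_fragment(value, datasets=1):
    """Build the ' -b N' command-line fragment, or '' when it does not apply."""
    if datasets != 1:
        # Precondition from the entry: only used with a single data set.
        return ""
    if value is not None and value < -5:
        # Control from the entry: must be an integer >= -5.
        raise ValueError("must be an integer >= -5")
    # Same tuple-indexing idiom as the entry's Python format expression.
    return ("", " -b " + str(value))[value is not None]
```

The tuple-indexing idiom `("", fragment)[condition]` selects the empty string when the condition is false and the fragment when it is true, which is how the Python format expressions throughout these descriptions switch options on and off.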
control_opt Control Options ntmodel Nucleotide substitution model (-m) Choice $seqtype eq "nt" seqtype == "nt" HKY85 HKY85 JC69 K80 F81 F84 TN93 GTR ($value ne $vdef) ? " -m $value" : "" ("", " -m "+str(value))[value != vdef] aamodel Amino-acid substitution model (-m) Choice $seqtype eq "aa" seqtype == "aa" LG LG WAG JTT MtREV Dayhoff DCMut RtREV CpREV VT Blosum62 MtMam MtArt HIVw HIVb ($value ne $vdef) ? " -m $value" : "" ("", " -m "+str(value))[value != vdef] character_frequencies character frequencies predef_char_freq predefined character frequencies Choice null null e m ("" , " -f " + str(value))[value is not None and value != vdef]
  • e:
    • Nucleotide sequences: (Empirical) the equilibrium base frequencies are estimated by counting the occurrence of the different bases in the alignment (default).
    • Amino-acid sequences: (Empirical) the equilibrium amino-acid frequencies are estimated by counting the occurrence of the different amino-acids in the alignment.
  • m:
    • Nucleotide sequences: (ML) the equilibrium base frequencies are estimated using maximum likelihood.
    • Amino-acid sequences: (Model) the equilibrium amino-acid frequencies are estimated using the frequencies defined by the substitution model (default).
user_character_frequencies String predef_char_freq is None and seqtype == "nt" and (a_freq is not None and c_freq is not None and g_freq is not None and t_freq is not None) ("", " -f %f,%f,%f,%f" % (a_freq, c_freq, g_freq, t_freq) )[a_freq is not None or c_freq is not None]
a_freq A frequencies Float frequencies for a, c, g ,t must be set a_freq is not None and c_freq is not None and g_freq is not None and t_freq is not None c_freq C frequencies Float frequencies for a, c, g ,t must be set a_freq is not None and c_freq is not None and g_freq is not None and t_freq is not None g_freq G frequencies Float frequencies for a, c, g ,t must be set a_freq is not None and c_freq is not None and g_freq is not None and t_freq is not None t_freq T frequencies Float frequencies for a, c, g ,t must be set a_freq is not None and c_freq is not None and g_freq is not None and t_freq is not None
predef_char_freq user_character_frequencies a_freq c_freq g_freq t_freq
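As a rough sketch of how the two `-f` variants above interact, the following Python function combines the predefined choice (`e` or `m`) with the four user frequencies. The function name `frequency_fragment` is hypothetical, and giving priority to the predefined choice is an assumption drawn from the `predef_char_freq is None` precondition; it is not taken from Mobyle's own code.

```python
# Illustrative sketch only: frequency_fragment is a hypothetical helper.
def frequency_fragment(predef_char_freq=None, seqtype="nt",
                       a_freq=None, c_freq=None, g_freq=None, t_freq=None):
    """Build the ' -f ...' fragment from either a predefined or a user choice."""
    freqs = (a_freq, c_freq, g_freq, t_freq)
    some_set = any(f is not None for f in freqs)
    all_set = all(f is not None for f in freqs)
    if some_set and not all_set:
        # Control message from the entry: all four values are needed together.
        raise ValueError("frequencies for a, c, g ,t must be set")
    if predef_char_freq is not None:
        # Predefined frequencies: ' -f e' or ' -f m'.
        return " -f " + str(predef_char_freq)
    if seqtype == "nt" and all_set:
        # User frequencies, formatted with %f as in the entry's expression.
        return " -f %f,%f,%f,%f" % freqs
    return ""
```

Note that `%f` always prints six decimal places, so a frequency of 0.25 is written as `0.250000` on the command line.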
tstvratio1 set transition/transversion ratio to get the maximum likelihood estimate (DNA sequences only) Boolean $seqtype eq "nt" seqtype == "nt" 0 ($value) ? " -t e" : "" ( "" , " -t e")[ value ] tstvratio2 User transition/transversion ratio for DNA sequences? (-t) Float $seqtype eq "nt" and not $tstvratio1 seqtype == "nt" and not tstvratio1 (defined $value) ? " -t $value" : "" ( "" , " -t "+str(value))[ value is not None ] propinvar1 set proportion of invariable sites to get the maximum likelihood estimate. (-v) Boolean 0 ($value) ? " -v e" : "" ( "" , " -v e")[ value ] propinvar2 User proportion of invariable sites? (-v) Float not $propinvar1 not propinvar1 (defined $value) ? " -v $value" : "" ( "" , " -v "+str(value))[value is not None] Value must be >= 0 and < 1 $value >= 0 and $value < 1 value >= 0 and value < 1 This option is not taken into account if "set proportion of invariable sites to get the maximum likelihood estimate" is switched on. nbsubstcat Number of relative substitution rate categories (-c) Integer 1 (defined $value and $value != $vdef) ? " -c $value" : "" ("", " -c "+str(value))[value is not None and value != vdef] gamma1 set distribution of the gamma distribution shape parameter to get the maximum likelihood estimate Boolean 0 ($value) ? " -a e" : "" ("", " -a e")[value] gamma2 distribution of the gamma distribution shape parameter. (-a) Float not $gamma1 not gamma1 (defined $value) ? " -a $value" : "" ("", " -a "+str(value))[value is not None] Must be a positive value value > 0 Can be a fixed positive value. This option is not taken into account if "set distribution of the gamma distribution shape parameter to get the maximum likelihood estimate" is switched on. move Tree topology search operation (-s) Choice NNI NNI SPR BEST ($value ne $vdef) ? " -s $value" : "" ("", " -s "+str(value))[value != vdef] PhyML offers three different methods to estimate tree topologies. The default approach is to use simultaneous NNI. 
This option corresponds to the original PhyML algorithm. The second approach relies on subtree pruning and regrafting (SPR). It generally finds better tree topologies compared to NNI but is also significantly slower. The third approach, termed BEST, simply estimates the phylogeny using both methods and returns the better of the two solutions. usertreefile Starting tree filename (-u) Tree NEWICK (defined $value) ? " -u $value" : "" ("", " -u "+str(value))[value is not None] The tree must be in Newick format. optimisation parameter optimisation Choice tlr tlr tl lr l r n ("", " -o " + str(value))[value is not None and value != vdef] This option focuses on specific parameter optimisation. rand_start sets the initial tree to random. Boolean move in ('SPR', 'BEST') 0 ("", " --rand_start ")[bool(value)] It is only valid if SPR searches are to be performed. n_rand_starts the number of initial random trees to be used. Integer move in ('SPR', 'BEST') ("", " --n_rand_starts " + str(value))[value is not None] It is only valid if SPR searches are to be performed. r_seed the seed used to initialise the random number generator (must be an integer) Integer ( "", " --r_seed " + str(value))[value is not None] print_site_lnl Print the likelihood for each site in file. Boolean 0 (""," --print_site_lnl ")[bool(value)] print_trace Print each phylogeny explored during the tree search process in file. Boolean 0 (""," --print_trace ")[bool(value)]
outfile Output file Text "*_phyml_stats.txt" "*_phyml_stats.txt" outtree Output tree Tree NEWICK "*_phyml_tree.txt" "*_phyml_tree.txt" boot_outfile Bootstrap output file Text "*_phyml_boot_stats.txt" "*_phyml_boot_stats.txt" boot_outtree Bootstrap output trees Tree NEWICK "*_phyml_boot_trees.txt" "*_phyml_boot_trees.txt" site_lnl_output The likelihood for each site. $print_site_lnl bool(print_site_lnl) Report "*_phyml_lk.txt" "*_phyml_lk.txt" trace_output trace of each phylogeny explored during the tree search process. $print_trace bool(print_trace) Report "*_phyml_trace.txt" "*_phyml_trace.txt"
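The output entries above pair each result parameter with a file mask and, for two of them, a precondition on a Boolean option. A minimal Python sketch of that matching, using the standard `fnmatch` module, might look as follows. The table `OUTPUT_MASKS`, the function `classify_outputs` and the file names in the usage note are hypothetical; Mobyle's own result collection may work differently.

```python
from fnmatch import fnmatch

# Illustrative sketch only: (parameter name, file mask, option that must be on).
# Masks and preconditions are copied from the output entries above.
OUTPUT_MASKS = [
    ("outfile", "*_phyml_stats.txt", None),
    ("outtree", "*_phyml_tree.txt", None),
    ("boot_outfile", "*_phyml_boot_stats.txt", None),
    ("boot_outtree", "*_phyml_boot_trees.txt", None),
    ("site_lnl_output", "*_phyml_lk.txt", "print_site_lnl"),
    ("trace_output", "*_phyml_trace.txt", "print_trace"),
]


def classify_outputs(filenames, print_site_lnl=False, print_trace=False):
    """Map each active output parameter to the files matching its mask."""
    options = {"print_site_lnl": print_site_lnl, "print_trace": print_trace}
    results = {}
    for name, mask, required_option in OUTPUT_MASKS:
        if required_option is not None and not options[required_option]:
            continue  # precondition not met, so this output is not collected
        results[name] = [f for f in filenames if fnmatch(f, mask)]
    return results
```

For example, with the hypothetical files `aln.phy_phyml_stats.txt` and `aln.phy_phyml_lk.txt`, the second file is only reported under `site_lnl_output` when `print_site_lnl` is true.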
Programs-5.1.1/pepcoil.xml0000644000175000001560000002301112072525233014325 0ustar bneronsis pepcoil EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net pepcoil Predicts coiled coil regions in protein sequences http://bioweb2.pasteur.fr/docs/EMBOSS/pepcoil.html http://emboss.sourceforge.net/docs/themes sequence:protein:2D_structure sequence:protein:motifs structure:2D_structure pepcoil e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_window Window size (value from 7 to 28) Integer 28 ("", " -window=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 7 is required value >= 7 Value less than or equal to 28 is required value <= 28 2 e_output Output section e_coil Report coiled coil regions Boolean 1 (" -nocoil", "")[ bool(value) ] 3 e_frame Show coil frameshifts Boolean e_coil 0 ("", " -frame")[ bool(value) ] 4 Yes if -coil is true e_other Report non coiled coil regions Boolean 0 ("", " -other")[ bool(value) ] 5 e_outfile Name of the report file Filename pepcoil.report ("" , " -outfile=" + str(value))[value is not None] 6 e_rformat_outfile Choose the report output format Choice MOTIF DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 7 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] 
e_outfile auto Turn off any prompting String " -auto -stdout" 8 Programs-5.1.1/msa.xml0000644000175000001560000002235511767572177013507 0ustar bneronsis msa 2.1 MSA Multiple sequence alignment S. F. Altschul http://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/msa.html ftp://fastlink.nih.gov/pub/msa/msa.tar.Z alignment:multiple msa seqs Sequences File Sequence FASTA " $value" " "+str(value) 2 This is a file containing the sequences to be aligned. control Control parameters 1 optimal Turns off the optimal multiple alignment (-m) Boolean 0 ($value)? " -m":"" ("" , " -m")[ value ] Turns off the optimal multiple alignment segment of the program. This allows the user to see the heuristic alignment and other data produced by the program before it attempts to produce an optimal multiple alignment. forcedres User file to force the alignment of certain residues (-f) ForceAlignmentPattern AbstractText (defined $value)? " -f $value" : "" ( "" , " -f " + str(value) )[ value is not None ] Allows the user to force the alignment of certain residues. The file referred to must have one or more lines of the following format: seqs.| "S" precedes block start | "L" precedes block length The example would force positions 22 to 31 of sequence 2 to be aligned with positions 21 to 30 of sequence 3 and positions 25 to 34 of sequence 5; it would also force position 35 of sequence 2 to be aligned with position 36 of sequence 3 and position 41 of sequence 5. Needless to say, all positions forced into alignment must be mutually consistent. 2 3 5 S 22 21 25 L 10 S 35 36 41 L 1 Cost Cost parameters 1 endgap Charges terminal gaps the same as internal gaps (-g) Boolean 0 ($value)? " -g":"" ("" , " -g")[ value ] By default, no charge is made for the existence of a terminal gap. unweight Cost of a multiple alignment (-b) Boolean 0 ($value)? " -b":"" ("" , " -b")[ value ] The cost of a multiple alignment is taken to be the unweighted sum of all the pairwise alignments. 
In the absence of this flag, the program estimates an evolutionary tree and uses it to assign weights to each pairwise alignment using either rationale-1 or rationale-2 as described in Altschul et al., J. Molec. Biol. 208 (1989). maxscore Maximum score of an optimal multiple alignment (-d) Integer (defined $value)? " -d$value" : "" ( "" , " -d" + str(value) )[ value is not None ] Specifies the maximum score of an optimal multiple alignment. The default is calculated from the scores of the optimal pairwise alignments, the weights, and the epsilons. epsilons User specified epsilons for each pairwise alignment (-e) EpsilonFile AbstractText (defined $value)? " -e $value" : "" ( "" , " -e " + str(value) )[ value is not None ] By default, the program calculates a heuristic multiple alignment and uses it to set epsilons for each pairwise alignment. Frequently the "optimal multiple alignment" will be found to have observed epsilons exceeding those supplied or calculated. When this is the case, it is advisable to rerun the program using suitably augmented epsilons. The file named here should have integers separated by spaces or newlines or both, with one integer for each pair of sequences in the order 1-2, 1-3, ... , 1-N, 2-3, ... , (N-1)-N. costs User costs file (-c) Costs AbstractText (defined $value)? " -c $value" : "" ( "" , " -c " + str(value) )[ value is not None ] Allows the user to specify the cost for a gap, as well as the cost for aligning any pair of letters or a letter with a null. The default is PAM-250 costs for protein sequences, using the one-letter code. The format of this file is an integer, followed by all possible pairs of aligned symbols followed by their cost. For example, the file might begin as follows: This would specify a cost of 0 for aligning a null symbol with another null symbol, a cost of 1 for aligning an A with a null symbol, etc., and an additional cost of 3 for the existence of a gap. 
The program assumes the costs are symmetric, so that there is no need to have a line for C A as well as for A C. All costs must be non-negative integers. 3 - - 0 - A 1 A C 2 output Output parameters 1 quiet Suppress verbose output (-o) Boolean 0 ($value)? " -o":"" ("" , " -o")[ value ] Programs-5.1.1/stssearch.xml0000644000175000001560000001045112072525233014675 0ustar bneronsis stssearch EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net stssearch Search a DNA database for matches with a set of STS primers http://bioweb2.pasteur.fr/docs/EMBOSS/stssearch.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:primers stssearch e_input Input section e_seqall seqall option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -seqall=" + str(value))[value is not None] 1 e_infile Primer pairs file PrimerPairs AbstractText ("", " -infile=" + str(value))[value is not None] 2 e_output Output section e_outfile Name of the output file (e_outfile) Filename stssearch.e_outfile ("" , " -outfile=" + str(value))[value is not None] 3 e_outfile_out outfile_out option StssearchReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 4 Programs-5.1.1/hmmsim.xml0000644000175000001560000011334511767572177014221 0ustar bneronsis hmmsim HMMSIM Collect profile HMM score distributions on random sequences hmm:simulation hmmsim hmmfile HMM file HmmProfile AbstractText " $value" " "+str(value) 30 generalOptions General options 1 aln Obtain alignment length statistics (-a) Boolean $altSco eq '--vit' altSco == '--vit' 0 ($value) ? " -a" : "" ( "" , " -a" )[ value ] Collect expected Viterbi alignment length statistics from each simulated sequence. 
This only works with Viterbi scores (the default; see --vit). Two additional fields are printed in the output table for each model: the mean length of Viterbi alignments, and the standard deviation. verbose Verbose: print scores (-v) Boolean 0 ($value) ? " -v" : "" ( "" , " -v" )[ value ] Length Length of random target sequences (-L) Integer 100 (defined $value and $value != $vdef ) ? " -L $value" : "" ( "" , " -L " + str(value) )[ value is not None and value !=vdef ] value > 0 $value > 0 value > 0 number Number of random target sequences (-N) Integer 1000 (defined $value and $value != $vdef) ? " -N $value" : "" ( "" , " -N " + str(value) )[ value is not None and value !=vdef] value > 0 $value > 0 value > 0 AdvancedOptions Advanced options 1 altAln Alternative alignment styles Choice --fs --fs --sw --ls (defined $value and $value ne $vdef) ? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef ] H3 only uses multihit local alignment (--fs mode), and this is where we believe the statistical fits. Unihit local alignment scores (Smith/Waterman; --sw mode) also obey our statistical conjectures. Glocal alignment statistics (either multihit or unihit) are still not adequately understood nor adequately fitted. fs: Collect multihit local alignment scores. This is the default. 'fs' comes from HMMER2's historical terminology for multihit local alignment as 'fragment search mode'. sw: Collect unihit local alignment scores. The H3 J state is disabled. 'sw' comes from HMMER2's historical terminology for unihit local alignment as 'Smith/Waterman search mode'. ls: Collect multihit glocal alignment scores. In glocal (global/local) alignment, the entire model must align to a subsequence of the target. The H3 local entry/exit transition probabilities are disabled. 'ls' comes from HMMER2's historical terminology for multihit local alignment as 'local search mode'. s: Collect unihit glocal alignment scores. 
Both the H3 J state and local entry/exit transition probabilities are disabled. 's' comes from HMMER2's historical terminology for unihit glocal alignment. altSco Option controlling scoring algorithm Choice --vit --vit --fwd --hyb --msv --fast (defined $value and $value ne $vdef) ? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef ] vit: Collect Viterbi maximum likelihood alignment scores. This is the default. fwd: Collect Forward log-odds likelihood scores, summed over alignment ensemble. hyb: Collect 'Hybrid' scores, as described in papers by Yu and Hwa (for instance, Bioinformatics 18:864, 2002). These involve calculating a Forward matrix and taking the maximum cell value. The number itself is statistically somewhat unmotivated, but the distribution is expected to be a well-behaved extreme value distribution (Gumbel). msv: Collect MSV (multiple ungapped segment Viterbi) scores, using H3's main acceleration heuristic. fast: For any of the above options, use H3's optimized production implementation (using SIMD vectorization). The default is to use the 'generic' implementation (slow and non-vectorized). The optimized implementations sacrifice a small amount of numerical precision. This can introduce confounding noise into statistical simulations and fits, so when one gets super-concerned about exact details, it's better to be able to factor that source of noise out. controlMasse Controlling range of fitted tail masses 1 tmin Set lower bound tail mass for fwd,island (--tmin) Float 0.02 (defined $value and $value != $vdef) ? " --tmin $value" : "" ( "" , " --tmin " + str(value) )[ value is not None and value !=vdef ] Set the lower bound on the tail mass distribution. (The default is 0.02 for the default single tail mass.) tmax Set upper bound tail mass for fwd,island (--tmax) Float 0.02 (defined $value and $value != $vdef) ? 
" --tmax $value" : "" ( "" , " --tmax " + str(value) )[ value is not None and value !=vdef ] Set the upper bound on the tail mass distribution. (The default is 0.02 for the default single tail mass.) tpoints Set number of tail probs to try (--tpoints) Integer 1 (defined $value and $value != $vdef) ? " --tpoints $value" : "" ( "" , " --tpoints " + str(value) )[ value is not None and value !=vdef ] Set the number of tail masses to sample, starting from --tmin and ending at --tmax. The default is 1, for the default 0.02 single tail mass. tlinear Use linear not log spacing of tail probs (--tlinear) Boolean 0 ($value) ? " --tlinear" : "" ( "" , " --tlinear" )[ value ] Sample a range of tail masses with uniform linear spacing. The default is to use uniform logarithmic spacing. ECalibration Options controlling h3 parameter estimation methods 1 H3 uses three short random sequence simulations to estimate the location parameters for the expected score distributions for MSV scores, Viterbi scores, and Forward scores. These options allow these simulations to be modified. EmL Length of sequences for MSV Gumbel mu fit (EmL) Integer 200 (defined $value and $value!=$vdef) ? " --EmL $value" : "" ( "" , " --EmL " + str(value) )[ value is not None and value !=vdef ] Sets the sequence length in simulation that estimates the location parameter mu for MSV E-values. Default is 200. Enter a value > 0. Enter a value > 0 $value > 0 value > 0 EmN Number of sequences for MSV Gumbel mu fit (EmN) Integer 200 (defined $value and $value!=$vdef) ? " --EmN $value" : "" ( "" , " --EmN " + str(value) )[ value is not None and value !=vdef ] Sets the number of sequences in simulation that estimates the location parameter mu for MSV E-values. Default is 200. Enter a value > 0. Enter a value > 0 $value > 0 value > 0 EvL Length of sequences for Viterbi Gumbel mu fit (EvL) Integer 200 (defined $value and $value!=$vdef) ? 
" --EvL $value" : "" ( "" , " --EvL " + str(value) )[ value is not None and value !=vdef ] Sets the sequence length in simulation that estimates the location parameter mu for Viterbi E-values. Default is 200. Enter a value > 0 Enter a value > 0 $value > 0 value > 0 EvN Number of sequences for Viterbi Gumbel mu fit (EvN) Integer 200 (defined $value and $value != $vdef) ? " --EvN $value" : "" ( "" , " --EvN " + str(value) )[ value is not None and value !=vdef ] Sets the number of sequences in simulation that estimates the location parameter mu for Viterbi E-values. Default is 200. Enter a value > 0. Enter a value > 0 $value > 0 value > 0 EfL Length of sequences for Forward exp tail tau fit (EfL) Integer 100 (defined $value and $value != $vdef) ? " --EfL $value" : "" ( "" , " --EfL " + str(value) )[ value is not None and value !=vdef ] Sets the sequence length in simulation that estimates the location parameter tau for Forward E-values. Default is 100. Enter a value > 0 Enter a value > 0 $value > 0 value > 0 EfN Number of sequences for Forward exp tail tau fit (EfN) Integer 200 (defined $value and $value != $vdef) ? " --EfN $value" : "" ( "" , " --EfN " + str(value) )[ value is not None and value !=vdef ] Sets the number of sequences in simulation that estimates the location parameter tau for Forward E-values. Default is 200. Enter a value > 0 Enter a value > 0 $value > 0 value > 0 Eft Tail mass for Forward exponential tail tau fit (Eft) Float 0.04 (defined $value and $value != $vdef) ? " --Eft $value" : "" ( "" , " --Eft " + str(value) )[ value is not None and value !=vdef ] Sets the tail mass fraction to fit in the simulation that estimates the location parameter tau for Forward E-values. Default is 0.04. Enter a value > 0 and < 1 Enter a value > 0 and < 1 $value > 0 and $value < 1 value > 0 and value < 1 debugg Debugging options 1 stall Arrest after start: for debugging MPI under gdb (--stall) Boolean 0 ($value) ? 
" --stall" : "" ( "" , " --stall" )[ value ] For debugging the MPI master/worker version: pause after start, to enable the developer to attach debuggers to the running master and worker(s) processes. Send SIGCONT signal to release the pause. (Under gdb: (gdb) signal SIGCONT) (Only available if optional MPI support was enabled at compile-time.) seed Set random number seed (--seed) Integer 0 (defined $value and $value != $vdef) ? " --seed $value" : "" ( "" , " --seed " + str(value))[ value is not None and value != vdef ] Set the random number seed. The default is 0, which makes the random number generator use an arbitrary seed, so that different runs of hmmsim will almost certainly generate a different statistical sample. For debugging, it is useful to force reproducible results, by fixing a random number seed. expert Experiments options 1 These options were used in a small variety of different exploratory experiments. bgflat Set uniform background frequencies (--bgflat) Boolean 0 ($value) ? " --bgflat" : "" ( "" , " --bgflat" )[ value ] Set the background residue distribution to a uniform distribution, both for purposes of the null model used in calculating scores, and for generating the random sequences. The default is to use a standard amino acid background frequency distribution. bgcomp Set bg frequencies to model's average composition (--bgcomp) Boolean 0 ($value) ? " --bgcomp" : "" ( "" , " --bgcomp" )[ value ] Set the background residue distribution to the mean composition of the profile. This was used in exploring some of the effects of biased composition. lengthmode Turn the H3 length model off (--x-no-lengthmode) Boolean 0 ($value) ? " --x-no-lengthmode" : "" ( "" , " --x-no-lengthmode " )[value ] Turn the H3 target sequence length model off. Set the self-transitions for N,C,J and the null model to 350/351 instead; this emulates HMMER2. Not a good idea in general. This was used to demonstrate one of the main H2 vs. H3 differences. 
nu Set nu parameter (expected HSPs) for GMSV (--nu) Float $altSco eq '--msv' altSco == '--msv' 2.0 (defined $value and $value != $vdef) ? " --nu $value" : "" ( "" , " --nu " + str(value) )[ value is not None and value !=vdef ] Set the nu parameter for the MSV algorithm -- the expected number of ungapped local alignments per target sequence. The default is 2.0, corresponding to a E->J transition probability of 0.5. This was used to test whether varying nu has significant effect on result (it doesn't seem to, within reason). This option only works if --msv is selected (it only affects MSV), and it will not work with --fast (because the optimized implementations are hardwired to assume nu=2.0). pthresh Set P-value threshold for --ffile (--pthresh) Float defined $ffile ffile is not None 0.02 (defined $value and $value != $vdef) ? " --pthresh $value" : "" ( "" , " --pthresh " + str(value) )[ value is not None and value !=vdef ] Set the filter P-value threshold to use in generating filter power files with --ffile. The default is 0.02 (which would be appropriate for testing MSV scores, since this is the default MSV filter threshold in H3's acceleration pipeline.) Other appropriate choices (matching defaults in the acceleration pipeline) would be 0.001 for Viterbi, and 1e-5 for Forward. output_options Output options 1 save Direct summary output to file, not stdout. (-o) Filename (defined $value)? " -o $value" : "" ( "" , " -o " + str(value) )[ value is not None ] 1 save_out Direct summary output to file. Report defined $save save is not None "$save" str( save ) 1 afile Output alignment lengths to file (--afile) Filename $aln and $altSco eq '--vit' aln and altSco == '--vit' (defined $value)? " --afile $value" : "" ( "" , " --afile " + str(value) )[ value is not None ] 1 When collecting Viterbi alignment statistics (the -a option), for each sampled sequence, output two fields per line to a file: the length of the optimal alignment, and the Viterbi bit score. 
Requires that the -a option is also used. afile_out Output alignment lengths Report defined $afile afile is not None "$afile" str( afile ) 1 efile Output E vs. E plots to file in xy format (--efile) Filename (defined $value)? " --efile $value" : "" ( "" , " --efile " + str(value) )[ value is not None ] 1 Output a rank versus. E-value plot in XMGRACE xy format to file. The x-axis is the rank of this sequence, from highest score to lowest; the y-axis is the E-value calculated for this sequence. E-values are calculated using H3's default procedures (i.e. the 'pmu, plambda' parameters in the output table). You expect a rough match between rank and E-value if E-values are accurately estimated. efile_out Output E vs. E plots to file in xy format Report defined $efile efile is not None "$efile" str( efile ) 1 ffile Output filter fraction: sequences passing P thresh (--ffile) Filename (defined $value)? " --ffile $value" : "" ( "" , " --ffile " + str(value) )[ value is not None ] 1 Output a 'filter power' file: for each model, a line with three fields: model name, number of sequences passing the P-value threshold, and fraction of sequences passing the P-value threshold. See --pthresh for setting the P-value threshold, which defaults to 0.02 (the default MSV filter threshold in H3). The P-values are as determined by H3's default procedures (the 'pmu,plambda' parameters in the output table). If all is well, you expect to see filter power equal to the predicted P-value setting of the threshold. ffile_out Output filter fraction: sequences passing P thresh Report defined $ffile ffile is not None "$ffile" str( ffile ) 1 pfile Output cumulative survival plots (--pfile) Filename (defined $value)? " --pfile $value" : "" ( "" , " --pfile " + str(value) )[ value is not None ] 1 Output cumulative survival plots (P(S>x)) to file in XMGRACE xy format. 
There are three plots: (1) the observed score distribution; (2) the maximum likelihood fitted distribution; (3) a maximum likelihood fit to the location parameter (mu/tau) while assuming lambda=log 2. pfile_out Output cumulative survival plots Report defined $pfile pfile is not None "$pfile" str( pfile ) 1 xfile Output bitscores as binary double vector to file (--xfile) Filename (defined $value)? " --xfile $value" : "" ( "" , " --xfile " + str(value) )[ value is not None ] 1 Output the bit scores as a binary array of double-precision floats (8 bytes per score) to file. Programs like Easel's esl-histplot can read such binary files. This is useful when generating extremely large sample sizes. xfile_out Output bitscores as binary double vector to file BitScores Binary defined $xfile xfile is not None "$xfile" str( xfile ) 1 Programs-5.1.1/psort.xml0000644000175000001560000000472111767572177014073 0ustar bneronsis psort PSORT Predicts protein subcellular localization sites from their amino acid sequence Nakai, K. and Horton, P. A program for detecting the sorting signals of proteins and predicting their subcellular localization, trends Biochem. Sci., in press, 1999. http://bioweb2.pasteur.fr/docs/psort/index.html http://psort.hgc.jp/ sequence:protein:composition psort seqfile Protein sequence file Protein Sequence FASTA " $value" " "+str(value) 2 Verbose Verbose mode (-w) Boolean 1 ($value) ? " -w" : "" ("", " -w")[ value ] 1 htmlfile Html output file PsortHtmlReport Report " >psort.html" " >psort.html" 100 "psort.html" "psort.html" Programs-5.1.1/skipseq.xml0000644000175000001560000001447612072525233014370 0ustar bneronsis skipseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net skipseq Reads and writes (returns) sequences, skipping first few http://bioweb2.pasteur.fr/docs/EMBOSS/skipseq.html http://emboss.sourceforge.net/docs/themes sequence:edit skipseq e_input Input section e_feature Use feature information Boolean 0 ("", " -feature")[ bool(value) ] 1 e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 2 e_skip Number of sequences to skip at start Integer 0 ("", " -skip=" + str(value))[value is not None and value!=vdef] 3 e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename skipseq.e_outseq ("" , " -outseq=" + str(value))[value is not None] 4 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 5 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 6 Programs-5.1.1/protdist.xml0000644000175000001560000006553611745213176014572 0ustar bneronsis protdist protdist Compute distance matrix from protein sequences http://bioweb2.pasteur.fr/docs/phylip/doc/protdist.html This program uses protein sequences to compute a distance matrix, under four different models of amino acid replacement. It can also compute a table of similarity between the amino acid sequences. The distance for each pair of species estimates the total branch length between the two species, and can be used in the distance matrix programs FITCH, KITSCH or NEIGHBOR. 
phylogeny:distance protdist String "protdist <protdist.params" "protdist <protdist.params" 0 infile Alignment File Protein Alignment PHYLIPI "ln -s $infile infile && " "ln -s " + str( infile ) + " infile && " the name of this data can't be "infile" or "outfile" $value ne "infile" and $value ne "outfile" value not in ( "infile" , "outfile" ) -10 The input file must contained aligned sequences in PHYLIP format obtained by sequence alignment programs. 5 13 Alpha AACGTGGCCACAT Beta AAGGTCGCCACAC Gamma CAGTTCGCCACAA Delta GAGATTTCCGCCT Epsilon GAGATCTCCGCCC Method Distance model (P) Choice J J "" "" H "P\\n" "P\n" D "P\\nP\\n" "P\nP\n" K "P\\nP\\nP\\n" "P\nP\nP\n" S "P\\nP\\nP\\nP\\n" "P\nP\nP\nP\n" C "P\\nP\\nP\\nP\\nP\\n" "P\nP\nP\nP\nP\n" 2 protdist.params gamma_dist Gamma distribution of rates among positions (G) Choice $Method =~ /^[JDC]$/ Method in [ "J" , "D" , "C" ] N N "" "" Y "G\\n" "G\n" G "G\\nG\\n" "G\nG\n" 2 protdist.params gamma Coefficient of variation of substitution rate among positions (must be positive) Float $gamma_dist =~ /^[YG]$/ gamma_dist in [ "Y" , "G" ] "$value\\n" str( value )+ "\n" 1500 Instead of the more widely-known coefficient alpha, PROTDIST uses the coefficient of variation (ratio of the standard deviation to the mean) of rates among amino acid positions. So if there is 20% variation in rates, the CV is is 0.20. The square of the C.V. is also the reciprocal of the better-known shape parameter, alpha, of the Gamma distribution, so in this case the shape parameter alpha = 1/(0.20*0.20) = 25. If you want to achieve a particular value of alpha, such as 10, you will want to use a CV of 1/sqrt(100) = 1/10 = 0.1. protdist.params invariant Fraction of invariant positions Float $gamma_dist eq "G"/ gamma_dist == "G" "$value\\n" str(value)+"\n" 1501 protdist.params bootstrap Bootstrap options seqboot Perform a bootstrap before analysis Boolean 0 ($value) ? 
"seqboot < seqboot.params && mv outfile seqboot.outfile && rm infile && ln -s seqboot.outfile infile && " : "" ( "" , "seqboot < seqboot.params && mv outfile seqboot.outfile && rm infile && ln -s seqboot.outfile infile && " )[ value ] -5 By selecting this option, the bootstrap will be performed on your sequence file. So you don't need to perform a separated seqboot before. Don't give an already bootstrapped file to the program, this won't work! resamp_method Resampling methods (J) Choice $seqboot seqboot bootstrap bootstrap "" "" jackknife "J\\n" "" permute_species "J\\nJ\\n" "J\nJ\n" permute_char "J\\nJ\\nJ\\n" "J\nJ\nJ\n" permute_within_species "J\\nJ\\nJ\\nJ\\n" "J\nJ\nJ\nJ\n" 1 1. The bootstrap. Bootstrapping was invented by Bradley Efron in 1979, and its use in phylogeny estimation was introduced by me (Felsenstein, 1985b). It involves creating a new data set by sampling N characters randomly with replacement, so that the resulting data set has the same size as the original, but some characters have been left out and others are duplicated. The random variation of the results from analyzing these bootstrapped data sets can be shown statistically to be typical of the variation that you would get from collecting new data sets. The method assumes that the characters evolve independently, an assumption that may not be realistic for many kinds of data. 2. Delete-half-jackknifing. This alternative to the bootstrap involves sampling a random half of the characters, and including them in the data but dropping the others. The resulting data sets are half the size of the original, and no characters are duplicated. The random variation from doing this should be very similar to that obtained from the bootstrap. The method is advocated by Wu (1986). 3. Permuting species for each characters. This method of resampling (well, OK, it may not be best to call it resampling) was introduced by Archie (1989) and Faith (1990; see also Faith and Cranston, 1991). 
It involves permuting the columns of the data matrix separately. This produces data matrices that have the same number and kinds of characters but no taxonomic structure. It is used for different purposes than the bootstrap, as it tests not the variation around an estimated tree but the hypothesis that there is no taxonomic structure in the data: if a statistic such as number of steps is significantly smaller in the actual data than it is in replicates that are permuted, then we can argue that there is some taxonomic structure in the data (though perhaps it might be just the presence of aa pair of sibling species). 4. Permuting characters order. This simply permutes the order of the characters, the same reordering being applied to all species. For many methods of tree inference this will make no difference to the outcome (unless one has rates of evolution correlated among adjacent sites). It is included as a possible step in carrying out a permutation test of homogeneity of characters (such as the Incongruence Length Difference test). 5. Permuting characters separately for each species. This is a method introduced by Steel, Lockhart, and Penny (1993) to permute data so as to destroy all phylogenetic structure, while keeping the base composition of each species the same as before. It shuffles the character order separately for each species. seqboot.params seqboot_seed Random number seed (must be odd) Integer $seqboot seqboot "$value\\n" str( value ) + "\n" Random number seed must be odd $value > 0 and ($value % 2) != 0 value > 0 and (value % 2) != 0 10000 seqboot.params replicates How many replicates (R)? Integer $seqboot seqboot 100 (defined $value and $value != $vdef) ? 
"R\\n$value\\n" : "" ( "", "R\n" +str( value )+ "\n" )[ value is not None and value != vdef ] This server allows no more than 1000 replicates $replicates <= 1000 replicates <= 1000 Bad data sets number: it must be greater than 1 $value > 1 value > 1 1 seqboot.params weight_opt Weight options weights Use weights for sites (W) Boolean 0 ($value) ? "W\\n" : "" ( "" , "W\n" )[ value ] 1 protdist.params weights_file Weights file PhylipWeight AbstractText $weights weights (defined $value) ? "ln -s $weights_file weights && " : "" ( "" , "ln -s " + str( weights_file ) + " weights && " )[ value is not None ] the name of this data can't be "infile" or "outfile" value not in ( "infile" , "outfile" ) $value ne "infile" and $value ne "outfile" -1 multiple_dataset String $seqboot and $replicates > 1 seqboot and replicates > 1 "M\\nD\\n$replicates\\n" "M\nD\n"+ str( replicates ) + "\n" 1 protdist.params bootconfirm String $seqboot seqboot "Y\\n" "Y\n" 1000 seqboot.params bootterminal_type String $seqboot seqboot "0\\n" "0\n" -1 seqboot.params output Output options printdata Print out the data at start of run (1) Boolean 0 ($value) ? "1\\n" : "" ( "" , "1\n" )[ value ] 1 protdist.params categ_options Categories model options (options available only if Categories model choosed) $Method eq "C" Method == "C" 3 code Genetic code (U) Choice U U M V F Y (defined $value and $value ne $vdef) ? "U\\n$code\\n" : "" ( "", "U\n" + str( code ) + "\n" )[ value is not None and value != vdef ] 3 protdist.params categorization Categorization of amino acids (A) Choice G G C H (defined $value and $value ne $vdef) ? 
"A\\n$categorization\\n" : "" ( "" , "A\n" +str( categorization ) +"\n" )[ value is not None and value != vdef ] 10 All have groups: (Glu Gln Asp Asn), (Lys Arg His), (Phe Tyr Trp) plus: George/Hunt/Barker: (Cys), (Met Val Leu Ileu), (Gly Ala Ser Thr Pro) Chemical: (Cys Met), (Val Leu Ileu Gly Ala Ser Thr), (Pro) Hall: (Cys), (Met Val Leu Ileu), (Gly Ala Ser Thr), (Pro) protdist.params change_prob Prob change category (1.0=easy) (E) Float 0.4570 (defined $value and $value != $vdef) ? "E\\n$value\\n" : "" ("", "E\n"+ str( value ) +"\n")[ value is not None and value != vdef ] Enter a value between 0.0 and 1.0 $change_prob > 0.0 and $change_prob < 1.0 change_prob > 0.0 and change_prob < 1.0 3 protdist.params ratio Transition/transversion ratio (T) Float 2.000 (defined $value and $value != $vdef) ? "T\\n$value\\n" : "" ("", "T\n"+str(value)+"\n")[value is not None and value != vdef] The transition/transversion ratio must be any number from 0.5 upwards. value >= 0.5 $value >= 0.5 3 If the Categories distance is selected another menu option, T, will appear allowing the user to supply the Transition/Transversion ratio that should be assumed at the underlying DNA level. The transition/transversion ratio can be any number from 0.5 upwards. protdist.params outfile Outfile PhylipDistanceMatrix AbstractText " && mv outfile protdist.outfile" " && mv outfile protdist.outfile" 40 "protdist.outfile" "protdist.outfile" seqboot_out seqboot outfile SetOfAlignment AbstractText $seqboot seqboot 40 "seqboot.outfile" "seqboot.outfile" confirm String "Y\\n" "Y\n" 1000 protdist.params terminal_type String "0\\n" "0\n" -1 protdist.params Programs-5.1.1/msaprobs.xml0000644000175000001560000002040411767601016014527 0ustar bneronsis msaprobs 0.9.4 MSAProbs is a protein multiple sequence alignment algorithm based on pair hidden Markov models and partition function posterior probabilities Yongchao Liu, Bertil Schmidt and Douglas L. Maskell Yongchao Liu, Bertil Schmidt and Douglas L. 
Maskell (Bioinformatics 2010 26(16): 1958-1964) MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities. http://sourceforge.net/projects/msaprobs/files/MSAProbs-0.9.4.tar.gz/download http://sourceforge.net/projects/msaprobs/

MSAProbs is an open-source protein multiple sequence alignment algorithm, achieving statistically the highest alignment accuracy on popular benchmarks (BALIBASE, PREFAB, SABMARK, OXBENCH) compared to ClustalW, MAFFT, MUSCLE, ProbCons and Probalign.

alignment:multiple msaprobs sequences Sequences File ( a file containing several sequences ). Protein Sequence FASTA " $sequences" " " + str( sequences ) 1000 accuracy Accuracy Options consitency passes of consistency transformation ( 0 <= REPS <= 5 default: 2 ) Integer 2 (defined $value and $value != $vdef) ? " -c $value " : "" ("" , " -c "+str(value))[ value is not None and value != vdef ] use 0 <= REPS <= 5 $value >=0 and $value<=5 value >=0 and value<=5 A probabilistic consistency transformation is used to re-estimate more accurate posterior probabilities of each sequence pair x and y by introducing another sequence z. Instead of re-computing the posterior probabilities based on three-sequence alignments, the transformation is performed based on the already computed probability matrices estimated from pairwise alignments. To avoid a biased sampling of sequences, we therefore derive a weighted probabilistic consistency transformation approach. The motivation for the weighted approach is to obtain more accurate alignments than the non-weighted one. The transformations are further performed for a fixed number of iterations to refine the probabilities. In MSAProbs, two iterations (the default value) are used. This default value offers a good trade-off between alignment accuracy and execution time. iterative_refinement passes of iterative-refinement ( use 0 <= REPS <= 1000 default: 10 ) Integer 10 (defined $value and $value != $vdef) ? " -ir $value " : "" ("" , " -ir "+str(value))[ value is not None and value != vdef ] use 0 <= REPS <= 1000 $value >=0 and $value<=100 value >=0 and value<=100 As a post-processing step, a randomized iterative alignment is employed to further improve alignment accuracy. This refinement randomly partitions S into two non-overlapping subsets, and then performs a profile–profile alignment of the two subsets. MSAProbs uses its own pseudo random number generator, based on the linear congruential method, for the random partition of S.
The iterative refinement is designed to complete after a fixed number of iterations (10 iterations, by default). output_opt Output Options annotation write annotation for multiple alignment to FILENAME Filename (defined $value) ? " -annot $value " : "" ("" , " -annot " + str(value))[ value is not None ] The score of each column of the final alignment, from the leftmost to the rightmost, will be reported in this annotation file. alignment Alignment file Protein Alignment FASTA "msaprobs.out" "msaprobs.out" annotation_file Annotation file defined $annotation annotation is not None MSAProbsAnnotation Report $annotation annotation Each line represents the score of one column of the final alignment, from the leftmost to the rightmost.
Programs-5.1.1/fitch.xml0000644000175000001560000006022411724156742014006 0ustar bneronsis fitch fitch Fitch-Margoliash and Least-Squares Distance Methods http://bioweb2.pasteur.fr/docs/phylip/doc/fitch.html This program carries out Fitch-Margoliash, Least Squares, and a number of similar methods phylogeny:distance fitch String "fitch <fitch.params" "fitch <fitch.params" 0 infile Distances matrix File PhylipDistanceMatrix AbstractText $infile ne "infile" infile != "infile" "ln -s $infile infile && " "ln -s " + str( value ) + " infile && " -5 Give a file containing a distance matrix obtained by distance matrix programs like protdist or dnadist 5 Alpha 0.000000 0.330447 0.625670 1.032032 1.354086 Beta 0.330447 0.000000 0.375578 1.096290 0.677616 Gamma 0.625670 0.375578 0.000000 0.975798 0.861634 Delta 1.032032 1.096290 0.975798 0.000000 0.226703 Epsilon 1.354086 0.677616 0.861634 0.226703 0.000000 Method Method (D) Choice FM FM "" "" ME "D\\n" "D\n" fitch.params fitch_options Fitch options negative_branch Negative branch lengths allowed (-) Boolean 0 ($value) ? "-\\n" : "" ( "" , "-\n" )[ value ] 1 fitch.params power Power (P) Float 2.0 (defined $value and $value != $vdef) ? "P\\n$value\\n" : "" ( "" , "P\n" + str(value ) + "\n" )[ value is not None and value != vdef ] 1 For the Fitch-Margoliash method, which is the default method with this program, P is 2.0. For the Cavalli-Sforza and Edwards least squares method it should be set to 0 (so that the denominator is always 1). An intermediate method is also available in which P is 1.0, and any other value of P, such as 4.0 or -2.3, can also be used. This generates a whole family of methods. Please read the documentation (man distance). fitch.params jumble_options Randomize options jumble Randomize (jumble) input order (J) Boolean not $user_tree not user_tree 0 ($value) ? 
"J\\n$jumble_seed\\n$jumble_number\\n" : "" ( "" , "J\n"+str( jumble_seed ) +"\n" + str( jumble_number ) +"\n" )[ value ] 20 fitch.params jumble_seed Random number seed (must be odd) Integer $jumble jumble "" "" Random number seed must be odd $value > 0 and ($value % 2) != 0 value > 0 and (value % 2) != 0 19 jumble_number Number of times to jumble Integer $jumble jumble 1 "" "" 19 bootstrap Bootstrap options multiple Analyze multiple data sets (M) Boolean 0 ($value) ? "M\\n$multiple_number\\n$multiple_seed\\n" : "" ( "", "M\n"+str(multiple_number)+"\n"+str(multiple_seed)+"\n")[value] 10 fitch.params multiple_number How many data sets Integer $multiple multiple "" "" There must be no more than 1000 datasets for this server $value <= 1000 value <= 1000 9 multiple_seed Random number seed (must be odd) Integer $multiple multiple "" "" Random number seed must be odd $value > 0 and ($value % 2) != 0 value > 0 and (value % 2) != 0 19 consense Compute a consensus tree Boolean $multiple and $print_treefile multiple and print_treefile 0 ($value) ? " && cp infile fitch.infile && cp fitch.outtree intree && consense <consense.params && mv outtree consense.outtree && mv outfile consense.outfile" : "" ( "" , " && cp infile fitch.infile && cp fitch.outtree intree && consense <consense.params && mv outtree consense.outtree && mv outfile consense.outfile" )[ value ] 10 consense_confirm String $consense consense "Y\\n" "Y\n" 1000 consense.params consense_terminal_type String $consense consense "T\\n" "T\n" -2 consense.params consense_outfile Consense output file Text $consense consense "consense.outfile" "consense.outfile" consense_treefile Consense tree file Tree NEWICK $consense consense "consense.outtree" "consense.outtree" user_tree_opt User tree options user_tree Use User tree (default: no, search for best tree) (U) Boolean defined $tree_file tree_file is not None 0 ($value) ? 
"U\\n" : "" ( "" , "U\n")[ value ] You cannot randomize (jumble) your dataset and give a user tree at the same time not ( $user_tree and $jumble ) not ( user_tree and jumble ) 1 To give your tree to the program, you must normally put it in the alignment file, after the sequences, preceded by a line indicating how many trees you give. Here, this will be automatically appended: just give a treefile and the number of trees in it. fitch.params tree_file User Tree file Tree NEWICK $user_tree user_tree (defined $value) ? "cat $tree_file >> intree; " : "" ("" , "cat "+str( tree_file ) + " >> intree; " )[ value is not None ] -1 use_lengths Use lengths from user trees (N) Boolean $user_tree user_tree 0 ($value) ? "N\\n" : "" ( "" , "N\n" )[ value ] 2 fitch.params output Output options print_tree Print out tree (3) Boolean 1 ($value) ? "" : "3\\n" ( "3\n" , "" )[ value ] 1 Tells the program to print a semi-graphical picture of the tree in the outfile. fitch.params print_treefile Write out trees onto tree file (4) Boolean 1 ($value) ? "" : "4\\n" ( "4\n" , "" )[ value ] 1 Tells the program to save the tree in a treefile (a standard representation of trees where the tree is specified by a nested pairs of parentheses, enclosing names and separated by commas). fitch.params printdata Print out the data at start of run (1) Boolean 0 ($value) ? "1\\n" : "" ( "" , "1\n" )[ value ] 1 fitch.params other_options Other options outgroup Outgroup species root (O) Integer 1 (defined $value and $value != $vdef) ? "O\\n$value\\n" : "" ( "" , "O\n" +str( value )+ "\n" )[ value is not None and value != vdef] Please enter a value greater than 0 $value > 0 value > 0 1 fitch.params triangular Matrix format Choice square square "" "" lower "L\\n" "L\n" upper "R\\n" "R\n" 1 fitch.params subreplicates Subreplicates (S) Boolean 0 ($value) ? 
"S\\n" : "" ( "" , "S\n" )[ value ] 1 If the S (subreplication) option is in effect, the above degrees of freedom must be modified by noting that N is not n(n-1)/2 but is the sum of the numbers of replicates of all cells in the distance matrix read in, which may be either square or triangular. A further explanation of the statistical test of the clock is given in a paper of mine (Felsenstein, 1986). fitch.params global Global rearrangements (G) Boolean 0 ($value) ? "G\\n" : "" ( "", "G\n")[ value ] 1 is the Global search option. This causes, after the last species is added to the tree, each possible group to be removed and re-added. This improves the result, since the position of every species is reconsidered. It approximately triples the run-time of the program. It is not an option in KITSCH because it is the default and is always in force there. fitch.params outfile Fitch output file Text " && mv outfile fitch.outfile" " && mv outfile fitch.outfile" "fitch.outfile" "fitch.outfile" treefile Fitch tree file Tree NEWICK $print_treefile print_treefile " && mv outtree fitch.outtree" " && mv outtree fitch.outtree" "fitch.outtree" "fitch.outtree" confirm String "Y\\n" "Y\n" 1000 fitch.params terminal_type String "0\\n" "0\n" -1 fitch.params Programs-5.1.1/rnaplfold.xml0000644000175000001560000003177111672710655014700 0ustar bneronsis rnaplfold RNAplfold Compute average pair probabilities for local base pairs in long sequences Stephan H Bernhart, Ivo L Hofacker, Peter F Stadler. S. H. Bernhart, I.L. Hofacker, and P.F. Stadler (2006) "Local Base Pairing Probabilities in Large RNAs" Bioinformatics 22: 614-615 A.F. Bompfunewerer, R. Backofen, S.H. Bernhart, J. Hertel, I.L. Hofacker, P.F. Stadler, S. Will (2007) "Variations on {RNA} Folding and Alignment: Lessons from Benasque" J. Math. Biol. RNAplfold computes local pair probabilities for base pairs with a maximal span of L. The probabilities are averaged over all windows of size L that contain the base pair. 
Output consists of a dot plot in postscript file, where the averaged pair probabilities can easily be parsed and visually inspected. sequence:nucleic:2D_structure structure:2D_structure RNAplfold seq RNA Sequence File RNA Sequence FASTA " < $value" " < " + str(value) 1000 control Control options 2 winsize Size of windows for average pair probabilities. (-W) Integer 70 (defined $value and $value != $vdef)? " -W $value" : "" ( "" , " -W " + str(value) )[ value is not None and value != vdef] span Allow only pairs (i,j) with j-i<=span. (-L) Integer (defined $value and $value <= $winsize)? " -L $value" : "" ( "" , " -L " + str(value) )[ value is not None and value <= winsize ] cutoff Report only base pairs with an average probability > cutoff. (-c) Float 0.01 (defined $value and $value != $vdef)? " -c $value" : "" ( "" , " -c " + str(value) )[ value is not None and value != vdef] width Length to compute the mean probability of unpaired base. (-u) Float (defined $value)? " -u $value" : "" ( "" , " -u " + str(value) )[ value is not None] temperature Rescale energy parameters to a temperature of temperature Celcius (-T) Integer 37 (defined $value and $value != $vdef)? " -T $value" : "" ( "" , " -T " + str(value) )[ value is not None and value != vdef] tetraloops Do not include special stabilizing energies for certain tetraloops (-4) Boolean 0 ($value)? " -4" : "" ( "" , " -4" )[ value ] dangling How to treat dangling end energies for bases adjacent to helices in free ends and multiloops (-d) Choice -d1 -d1 -d -d2 (defined $value and $value ne $vdef)? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef] How to treat 'dangling end' energies for bases adjacent to helices in free ends and multiloops: Normally only unpaired bases can participate in at most one dangling end. With -d2 this check is ignored, this is the default for partition function folding (-p). -d ignores dangling ends altogether. 
Note that by default pf and mfe folding treat dangling ends differently; use -d2 (or -d) in addition to -p to ensure that both algorithms use the same energy model. The -d2 option is available for RNAfold, RNAeval, and RNAinverse only. input Input parameters 2 noGU Do not allow GU pairs (-noGU) Boolean 0 ($value)? " -noGU" : "" ( "" , " -noGU" )[ value ] noCloseGU Do not allow GU pairs at the end of helices (-noCloseGU) Boolean 0 ($value)? " -noCloseGU" : "" ( "" , " -noCloseGU" )[ value ] nsp Non standard pairs (comma separated list) (-nsp) String (defined $value)? " -nsp $value" : "" ( "" , " -nsp " + str(value) )[ value is not None ] Allow other pairs in addition to the usual AU, GC, and GU pairs. pairs is a comma separated list of additionally allowed pairs. If the first character is a '-' then AB will imply that AB and BA are allowed pairs. e.g. RNAfold -nsp -GA will allow GA and AG pairs. Nonstandard pairs are given 0 stacking energy. parameter Energy parameter file (-P) EnergyParameterFile AbstractText (defined $value)? " -P $value" : "" ( "" , " -P " + str(value) )[ value is not None ] Read energy parameters from paramfile, instead of using the default parameter set. A sample parameterfile should accompany your distribution. See the RNAlib documentation for details on the file format. logout Output is switched from probabilities to their logarithm Boolean 0 (defined $value)? " -O " : "" ( "" , " -O " )[ value ] Toggles -u option, output is switched from probabilities to their logarithm, which are NOT exactly the mean energies needed to the respective stretch of bases!
readseq String "readseq -f=19 -a $seq > $seq.tmp && (cp $seq $seq.orig && mv $seq.tmp $seq) ; " "readseq -f=19 -a "+ str(seq) + " > "+ str(seq) +".tmp && (cp "+ str(seq) +" "+ str(seq) +".orig && mv "+ str(seq) +".tmp "+ str(seq) +") ; " -10 psfiles Postscript file PostScript Binary "*.ps" "*.ps" Programs-5.1.1/melting.xml0000644000175000001560000002563712120043671014344 0ustar bneronsis melting 4.1f MELTING enthalpy, entropy and melting temperature N. Le Novere Nicolas Le Novere (2001), MELTING, computing the melting temperature of nucleic acid duplex. Bioinformatics 17(12), 1226-1227 http://bioweb2.pasteur.fr/docs/melting/melting.pdf http://www.ebi.ac.uk/~lenov/meltinghome.html http://www.ebi.ac.uk/~lenov/SOFTWARES/ sequence:nucleic:composition melting String "melting -q -v" "melting -q -v" 0 hybridation_type Hybridisation type (-H) Choice null null dnadna dnarna rnarna " -H$value" " -H" + str(value) 1 nnfile Nearest Neighbor parameters set (-A) Choice default default all97a.nn bre86a.nn san96a.nn sug96a.nn fre86a.nn xia98a.nn sug95a.nn (defined $value and $value ne $vdef) ? " -A$value" : "" ( "" , " -A" + str(value) )[ value is not None and value != vdef]
Informs the program to use file.nn as an alternative set of nearest-neighbor parameters, rather than the default for the specified hybridisation type (option -H). melting provides some files ready-to-use:
  • all97a.nn (DNA/DNA hybridisation of Allawi and SantaLucia (1997). Biochemistry 36: 10581-10594)
  • bre86a.nn (DNA/DNA hybridisation of Breslauer et al. (1986). Proc Natl Acad Sci USA 83: 3746-3750)
  • san96a.nn (DNA/DNA hybridisation of SantaLucia et al. (1996). Biochemistry 35: 3555-3562)
  • sug96a.nn (DNA/DNA hybridisation of Sugimoto et al. (1996). Nuc Acids Res 24: 4501-4505)
  • fre86a.nn (RNA/RNA hybridisation of Freier et al. (1986). Proc Natl Acad Sci USA 83: 9373-9377)
  • xia98a.nn (RNA/RNA hybridisation of Xia et al. (1998). Biochemistry 37: 14719-14735)
  • sug95a.nn (DNA/RNA hybridisation of Sugimoto et al. (1995). Biochemistry 34: 11211-11216)
Be careful, the option -A changes the default parameter set defined by the option -H.
1
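The Python expression attached to the -A option relies on indexing a two-element tuple with a boolean: False selects the empty string, True selects the command-line fragment. The following is a minimal sketch of that idiom in isolation. The function name nn_option is hypothetical and only wraps the expression shown above; "default" is the declared default value (vdef) of the parameter.

```python
def nn_option(value, vdef="default"):
    """Return the -A fragment for melting, or "" when nothing must be added.

    Hypothetical wrapper around the expression used in the description:
    a boolean index picks element 0 (False) or element 1 (True).
    """
    return ("", " -A" + str(value))[value is not None and value != vdef]


if __name__ == "__main__":
    # The default set adds nothing to the command line.
    print(repr(nn_option("default")))
    # Any other set is appended directly after -A, without a space.
    print(repr(nn_option("all97a.nn")))
```

Both tuple elements are evaluated before the index is applied, which is harmless here because str(value) works for any value, including None.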
sequence Sequence string (-S) String " -S$value" " -S" + str(value) 1 complement_string Complementary sequence (-C) String (defined $value) ? " -C$value" : "" ( "" , " -C" + str(value) )[ value is not None ]
Enters the complementary sequence, from 3’ to 5’. This option is mandatory if there are mismatches between the two strands. If it is not used, the program will compute it as the complement of the sequence entered with the option -S.
            5' GTGAGCTCAT 3'
            3' CACTCGAGTA 5'
         
1
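When no mismatch is present, the string expected by -C can be derived from the -S sequence by complementing each base in place, without reversing, so that it reads 3' to 5'. The sketch below assumes a plain DNA alphabet (A, C, G, T); the function name complement_3to5 is hypothetical and this is not melting's own code.

```python
# Base-pairing table for DNA; assumes only A, C, G and T occur.
_DNA_PAIRS = str.maketrans("ACGT", "TGCA")


def complement_3to5(sequence):
    """Complement a 5'->3' DNA sequence base by base.

    The result is aligned under the input, so it reads 3'->5',
    which is the orientation the -C option expects.
    """
    return sequence.upper().translate(_DNA_PAIRS)


if __name__ == "__main__":
    seq = "GTGAGCTCAT"
    print("5' " + seq + " 3'")
    print("3' " + complement_3to5(seq) + " 5'")
```

Run on the example above, this reproduces the pair GTGAGCTCAT / CACTCGAGTA.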
salt_concentration Salt concentration (-N) Float " -N$value" " -N" + str(value) Must be greater than 0.0 and lower than 10.0 $value > 0.0 and $value < 10.0 value > 0.0 and value < 10.0 1 Value must be greater than 0.0 and lower than 10.0 nucacid_concentration Nucleic acid concentration in excess (-P) Float (defined $value) ? " -P$value" : "" ("", " -P" + str(value))[value is not None] Must be greater than 0.0 and lower than 0.1 $value > 0.0 and $value < 0.1 value > 0.0 and value < 0.1 1 Value must be greater than 0.0 and lower than 0.1 correction_factor Nucleic acid correction factor (-F) Float (defined $value) ? " -F$value" : "" ( "" , " -F" + str(value) )[ value is not None ] 1 salt_correction Salt correction (-K) Choice null null wet91a san96a san98a (defined $value) ? " -K$value" : "" ( "" , " -K" + str(value) )[ value is not None ] 1 approx Force approximative temperature computation (-x) Boolean 0 ($value) ? " -x" : "" ( "" , " -x" )[ value ] 1 dangling_ends Use parameters for dangling ends (dnadnade.nn) (-D)? Boolean 0 ($value) ? " -Ddnadnade.nn " : "" ( "" , " -Ddnadnade.nn " )[ value ] 1 mismatches Use parameters for mismatches (dnadnamm.nn) (-M)? Boolean 0 ($value) ? " -Mdnadnamm.nn " : "" ( "" , " -Mdnadnamm.nn " )[ value ] 1
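The range controls on -N and -P can be checked before the job is submitted. The following is a minimal sketch that applies the same two conditions as the Python control expressions above; the function name check_melting_concentrations and the "salt"/"nucacid" message prefixes are hypothetical, while the message texts are the ones given in the description.

```python
def check_melting_concentrations(salt, nucacid=None):
    """Return a list of error messages; an empty list means the values pass.

    salt    -- value for -N, must satisfy 0.0 < salt < 10.0
    nucacid -- optional value for -P, must satisfy 0.0 < nucacid < 0.1
    """
    errors = []
    if not (salt > 0.0 and salt < 10.0):
        errors.append("salt: Must be greater than 0.0 and lower than 10.0")
    # -P is optional, so it is only checked when a value was supplied.
    if nucacid is not None and not (nucacid > 0.0 and nucacid < 0.1):
        errors.append("nucacid: Must be greater than 0.0 and lower than 0.1")
    return errors


if __name__ == "__main__":
    print(check_melting_concentrations(0.1, 4e-6))
    print(check_melting_concentrations(10.0, 0.1))
```

Both bounds are strict, so the limits themselves (0.0, 10.0 and 0.1) are rejected.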
Programs-5.1.1/sirna.xml0000644000175000001560000003557612072525233014031 0ustar bneronsis sirna EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net sirna Finds siRNA duplexes in mRNA http://bioweb2.pasteur.fr/docs/EMBOSS/sirna.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:composition sirna e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_seqinsection Sequence input options e_poliii Select probes for pol iii expression vectors Boolean 0 ("", " -poliii")[ bool(value) ] 2 This option allows you to select only the 21 base probes that start with a purine and so can be expressed from Pol III expression vectors. This is the NARN(17)YNN pattern that has been suggested by Tuschl et al. e_aa Select only regions that start with aa Boolean 0 ("", " -aa")[ bool(value) ] 3 This option allows you to select only those 23 base regions that start with AA. If this option is not selected then regions that start with AA will be favoured by giving them a higher score, but regions that do not start with AA will also be reported. e_tt Select only regions that end with tt Boolean 0 ("", " -tt")[ bool(value) ] 4 This option allows you to select only those 23 base regions that end with TT. If this option is not selected then regions that end with TT will be favoured by giving them a higher score, but regions that do not end with TT will also be reported. e_polybase Allow regions with 4 repeats of a base Boolean 1 (" -nopolybase", "")[ bool(value) ] 5 If this option is FALSE then only those 23 base regions that have no repeat of 4 or more of any bases in a row will be reported. 
No regions will ever be reported that have 4 or more G's in a row. e_output Output section e_outfile Name of the report file Filename sirna.report ("" , " -outfile=" + str(value))[value is not None] 6 The output is a table of the forward and reverse parts of the 21 base siRNA duplex. Both the forward and reverse sequences are written 5' to 3', ready to be ordered. The last two bases have been replaced by 'dTdT'. The starting position of the 23 base region and the %GC content is also given. If you wish to see the complete 23 base sequence, then either look at the sequence in the other output file, or use the qualifier '-context' which will display the 23 bases of the forward sequence in this report with the first two bases in brackets. These first two bases do not form part of the siRNA probe to be ordered. e_rformat_outfile Choose the report output format Choice TABLE DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 7 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile e_outseq Name of the output sequence file (e_outseq) Filename sirna.e_outseq ("" , " -outseq=" + str(value))[value is not None] 8 This is a file of the sequences of the 23 base regions that the siRNAs are selected from. You may use it to do searches of mRNA databases (e.g. REFSEQ) to confirm that the probes are unique to the gene you wish to use it on. 
e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 9 e_outseq_out outseq_out option Sequence e_outseq e_context Show the two bases before the output 21 base probe Boolean 0 ("", " -context")[ bool(value) ] 10 The output report file gives the sequences of the 21 base siRNA regions ready to be ordered. This does not give you an indication of the 2 bases before the 21 bases. It is often interesting to see which of the suggested possible probe regions have an 'AA' in front of them (i.e. it is useful to see which of the 23 base regions start with an 'AA'). This option displays the whole 23 bases of the region with the first two bases in brackets, e.g. '(AA)' to give you some context for the probe region. YOU SHOULD NOT INCLUDE THE TWO BASES IN BRACKETS WHEN YOU PLACE AN ORDER FOR THE PROBES. auto Turn off any prompting String " -auto -stdout" 11 Programs-5.1.1/dotmatcher.xml0000644000175000001560000005674212072525233015045 0ustar bneronsis dotmatcher EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net dotmatcher Draw a threshold dotplot of two sequences http://bioweb2.pasteur.fr/docs/EMBOSS/dotmatcher.html http://emboss.sourceforge.net/docs/themes alignment:pairwise:dot_plots dotmatcher e_input Input section e_asequence asequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -asequence=" + str(value))[value is not None] 1 e_bsequence bsequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -bsequence=" + str(value))[value is not None] 2 e_matrixfile Matrix file Choice mobyle_null mobyle_null EBLOSUM30 EBLOSUM35 EBLOSUM40 EBLOSUM45 EBLOSUM50 EBLOSUM55 EBLOSUM60 EBLOSUM62 EBLOSUM62-12 EBLOSUM65 EBLOSUM70 EBLOSUM75 EBLOSUM80 EBLOSUM85 EBLOSUM90 EBLOSUMN EDNAFULL EDNAMAT EDNASIMPLE EPAM10 EPAM100 EPAM110 EPAM120 EPAM130 EPAM140 EPAM150 EPAM160 EPAM170 EPAM180 EPAM190 EPAM20 EPAM200 EPAM210 EPAM220 EPAM230 EPAM240 EPAM250 EPAM260 EPAM270 EPAM280 EPAM290 EPAM30 EPAM300 EPAM310 EPAM320 EPAM330 EPAM340 EPAM350 EPAM360 EPAM370 EPAM380 EPAM390 EPAM40 EPAM400 EPAM410 EPAM420 EPAM430 EPAM440 EPAM450 EPAM460 EPAM470 EPAM480 EPAM490 EPAM50 EPAM500 EPAM60 EPAM70 EPAM80 EPAM90 SSSUB ("", " -matrixfile=" + str(value))[value is not None and value!=vdef] 3 This is the scoring matrix file used when comparing sequences. By default it is the file 'EBLOSUM62' (for proteins) or the file 'EDNAFULL' (for nucleic sequences). These files are found in the 'data' directory of the EMBOSS installation. 
e_additional Additional section e_windowsize Window size over which to test threshold (value greater than or equal to 3) Integer 10 ("", " -windowsize=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 3 is required value >= 3 4 e_threshold Threshold (value greater than or equal to 0) Integer 23 ("", " -threshold=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 5 e_output Output section e_stretch Stretch plot Boolean 0 ("", " -stretch")[ bool(value) ] 6 Display a non-proportional graph e_graph Choose the e_graph output format Choice not e_stretch png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 7 e_goutfile Name of the output graph Filename not e_stretch dotmatcher_graph ("" , " -goutfile=" + str(value))[value is not None] 8 outgraph_png Graph file Picture Binary not e_stretch and e_graph == "png" "*.png" outgraph_gif Graph file Picture Binary not e_stretch and e_graph == "gif" "*.gif" outgraph_ps Graph file PostScript Binary not e_stretch and e_graph == "ps" or e_graph == "cps" "*.ps" outgraph_meta Graph file Picture Binary not e_stretch and e_graph == "meta" "*.meta" outgraph_data Graph file Text not e_stretch and e_graph == "data" "*.dat" e_xygraph Choose the e_xygraph output format Choice e_stretch png png gif cps ps meta data (" -xygraph=" + str(vdef), " -xygraph=" + str(value))[value is not None and value!=vdef] 9 xy_goutfile Name of the output graph Filename e_stretch dotmatcher_xygraph ("" , " -goutfile=" + str(value))[value is not None] 10 xy_outgraph_png Graph file Picture Binary e_stretch and e_xygraph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_stretch and e_xygraph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_stretch and e_xygraph == "ps" or e_xygraph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_stretch and e_xygraph == "meta" "*.meta" 
xy_outgraph_data Graph file Text e_stretch and e_xygraph == "data" "*.dat" auto Turn off any prompting String " -auto -stdout" 11 Programs-5.1.1/squizz_checker.xml0000644000175000001560000000365212105210041015715 0ustar bneronsis squizz_checker 0.99b SQUIZZ Sequence/Alignment format checker N. Joly http://bioweb2.pasteur.fr/docs/squizz/seqfmt.html http://bioweb2.pasteur.fr/docs/squizz/alifmt.html alignment:formatter sequence:formatter squizz infile Sequence/Alignment Text " $value" " "+str(value) 2 strict Disable strict format checks (-s) Boolean 0 (defined $value and $value != $vdef) ? " -s" : "" ("", " -s")[value is not None and value !=vdef] 1 Enabled by default Programs-5.1.1/pima.xml0000644000175000001560000003076012073003734013627 0ustar bneronsis pima 1.40 PIMA Pattern-Induced Multi-sequence Alignment program R. D. Smith and T. F. Smith R. D. Smith and T. F. Smith. Pattern-induced multi-sequence alignment (PIMA) algorithm employing secondary structure-dependent gap penalties for use in comparative modelling. protein Engineering, vol5, number 1, pp 35-41, 1992 ftp://ftp.ebi.ac.uk/pub/software/unix/pima/ alignment:multiple pima sequence Sequences file Sequence IG GENBANK NBRF EMBL CODATA FASTA " $value" " " + str(value) 3 Name of the input file containing the sequences to be clustered and multi-aligned. Sequences can be in any of the following formats: IG/Stanford, GenBank/GB, NBRF, EMBL, Pearson/Fasta, PIR/CODATA. The format of the output sequence files will match the format of this input file. cluster_name Cluster name String " $value" " " + str(value) 2 An arbitrary name used to label the cluster. pima_params Parameters ref_seq_name Reference sequence name String defined $sec_struc_seq_filename sec_struc_seq_filename is not None (defined $value)? " $value" : "" ("", " "+str(value))[value is not None] 4 [optional; if specified, then sec_struct_seq_filename must also be specified]. 
Locus name of one of the primary sequences for which the secondary structure is in the file sec_struc_seq_filename. sec_struc_seq_filename sec_struc_seq_filename Text defined $ref_seq_name_ ref_seq_name is not None (defined $value)? " $value" : "" ( "" , " " + str(value) )[ value is not None ] 5 [optional; if specified, then ref_seq_name must also be specified] Name of a file containing secondary structure sequences for one or more of the primary sequences in the set. The secondary structure sequences in this file must be in one of the formats listed above (see sequence_filename, above). The locus name of each sequence must be the locus name of its corresponding primary sequence with the suffix '.ss' (e.g. 1ldm.ss). An alpha-helix, 3-10 helix and beta-strand must be designated 'h', 'g', and 'e', respectively. All other characters in the secondary structure sequences will be ignored with respect to the structure-dependent gap penalty. To allow gaps to be placed between the first and the second and the last elements of these structures, the first and last 2 elements of each should be changed to another character designation. In the secondary structure sequence file pdb-dssp.ss provided with this package, these end cap elements are designated 'i', 'f', and 'd', for alpha-helices, 3-10 helices and beta-strands, respectively. pima_options Options 1 score_cutoff Cluster score cutoff (-c) Float 0.0 (defined $value and $value != $vdef)? " -c $value " : "" ( "" , " -c " + str(value) )[ value is not None and value != vdef] Use a cluster score cutoff of number. This is the lowest match score to be used to incorporate a sequence into a cluster. The default value of 0.0 will force all input sequences into 1 cluster, but the final pattern may be completely degenerate. ext_gap_cost Gap extension penalty (-d) Integer (defined $value)? " -d $value" : "" ( "" , " -d " + str(value) )[ value is not None ] Use a length dependent gap penalty of number.
This is the cost of extending a gap. The default value is dependent on the matrix file used. gap_open_cost Gap opening penalty (-i) Integer (defined $value)? " -i $value" : "" ( "" , " -i " + str(value) )[ value is not None ] Use a length independent gap penalty of number. This is the cost of opening a gap. The default value is dependent on the matrix file used. min_score Minimum local score (-l) Integer (defined $value)? " -l $value" : "" ( "" , " -l " + str(value) )[ value is not None ] Use minimum local score of number. This is the lowest score a quadrant can have before an attempt is made to join this local alignment with the local alignment at the previous step. The default value is dependent on the matrix file used. mat_file Matrix file (-m) Choice patgen.mat patgen.mat class1.mat class2.mat user (defined $value and $value ne $vdef and $value ne "user")? " -m $value" : "" ( "" , " -m " + str(value) )[ value is not None and value !=vdef and value !="user"] Use matrix file with the name file. The default matrix file is patgen.mat and is provided with this package. The matrix file class1.mat uses the original pima alphabet. The matrix file class2.mat is also provided, which is similar to the matrix file class1.mat but uses the new alphabet. user_mat_file User matrix file (-m) Text $mat_file eq "user" mat_file == "user" (defined $value)? " -m $value" : "" ( "" , " -m " + str(value) )[ value is not None ] User matrix file. not_num_ext Do not use numerical extensions on each step of the alignment. (-n) Boolean 0 ($value)? " -n" : "" ( "" , " -n" )[ value ] sec_struc_gap_cost Secondary structure gap penalty (-t) Integer (defined $value)? " -t $value " : "" ( "" , " -t " + str(value) + " " )[ value is not None ] Use a secondary structure gap penalty of number. This is the cost of a gap at a position matching a secondary structure character. 
The default value is dependent on the matrix file used and is always 10 times the value of the length independent gap penalty of the matrix file. results Output files Text "*.cluster" "*.pattern" "*.pima" "*.cluster" "*.pattern" "*.pima" Programs-5.1.1/mwfilter.xml0000644000175000001560000001226611672346320014540 0ustar bneronsis mwfilter EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net mwfilter Filter noisy data from molecular weights file http://bioweb2.pasteur.fr/docs/EMBOSS/mwfilter.html http://emboss.sourceforge.net/docs/themes sequence:protein:composition mwfilter e_input Input section e_infile Molecular weights file MolecularWeights AbstractText ("", " -infile=" + str(value))[value is not None] 1 e_datafile Molecular weight standards data file StandardMolecularWeights AbstractText ("", " -datafile=" + str(value))[value is not None ] 2 e_required Required section e_tolerance Ppm tolerance Float 50.0 ("", " -tolerance=" + str(value))[value is not None and value!=vdef] 3 e_additional Additional section e_showdel Output deleted mwts Boolean 0 ("", " -showdel")[ bool(value) ] 4 e_output Output section e_outfile Name of the output file (e_outfile) Filename outfile.mwfilter ("" , " -outfile=" + str(value))[value is not None] 5 e_outfile_out outfile_out option MolecularWeights AbstractText e_outfile auto Turn off any prompting String " -auto -stdout" 6 Programs-5.1.1/iep.xml0000644000175000001560000003334312072525233013460 0ustar bneronsis iep EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A.
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net iep Calculate the isoelectric point of proteins http://bioweb2.pasteur.fr/docs/EMBOSS/iep.html http://emboss.sourceforge.net/docs/themes sequence:protein:composition iep e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_additional Additional section e_amino Number of n-termini (value greater than or equal to 0) Integer 1 ("", " -amino=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 2 e_carboxyl Number of c-termini (value greater than or equal to 0) Integer 1 ("", " -carboxyl=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 3 e_termini Include charge at n and c terminus Boolean 1 (" -notermini", "")[ bool(value) ] 4 e_lysinemodified Number of modified lysines (value greater than or equal to 0) Integer 0 ("", " -lysinemodified=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 5 e_disulphides Number of disulphide bridges (value greater than or equal to 0) Integer 0 ("", " -disulphides=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 6 e_advanced Advanced section e_step Step value for ph (value from .01 to 1.) Float .5 ("", " -step=" + str(value))[value is not None and value!=vdef] Value greater than or equal to .01 is required value >= .01 Value less than or equal to 1. is required value <= 1. 
7 e_output Output section e_plot Plot charge vs ph Boolean 0 ("", " -plot")[ bool(value) ] 8 e_report Write results to a file Boolean 1 (" -noreport", "")[ bool(value) ] 9 e_graph Choose the e_graph output format Choice e_plot png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 10 xy_goutfile Name of the output graph Filename e_plot iep_xygraph ("" , " -goutfile=" + str(value))[value is not None] 11 xy_outgraph_png Graph file Picture Binary e_plot and e_graph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_plot and e_graph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_plot and e_graph == "ps" or e_graph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_plot and e_graph == "meta" "*.meta" xy_outgraph_data Graph file Text e_plot and e_graph == "data" "*.dat" e_outfile Name of the output file (e_outfile) Filename e_report iep.e_outfile ("" , " -outfile=" + str(value))[value is not None] 12 e_outfile_out outfile_out option IepReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 13 Programs-5.1.1/scan_for_matches.xml0000644000175000001560000001464711767572177016232 0ustar bneronsis scan_for_matches scan_for_matches Scan Nucleotide or Protein Sequences for Matching Patterns Scan_for_matches is a utility to search for patterns in DNA and protein sequences. 
http://bioweb2.pasteur.fr/docs/scan_for_matches/scan_for_matches.txt ftp://ftp.mcs.anl.gov/pub/Genomics/PatScan/ sequence:protein:pattern sequence:nucleic:pattern scan_for_matches sequence Input sequence Sequence FASTA " < $value" " < " + str(value) 100 pat_file Pattern file ScanPattern AbstractText " $value" " " + str(value) 99 Some examples of patterns: - Simple Patterns Built by Matching Ranges and Reverse Complements: p1=4...7 3...8 ~p1 (three "pattern units" with: 4...7 which "match 4 to 7 characters and call them p1", 3...8 which "match 3 to 8 characters" and ~p1 "match the reverse complement of p1" ) - Defining Pairing Rules and Allowing Mismatches, Insertions, and Deletions r1={au,ua,gc,cg,gu,ug,ga,ag} p1=2...3 0...4 p2=2...5 1...5 r1~p2 0...4 ~p1 (p1=2...3 match 2 or 3 characters (call it p1), 0...4 match 0 to 4 characters, p2=2...5 match 2 to 5 characters (call it p2), 1...5 match 1 to 5 characters, r1~p2 match the reverse complement of p2 using rule r1, allowing G-A and A-G pairs, 0...4 match 0 to 4 characters, ~p1 match the reverse complement of p1 allowing only G-C, C-G, A-T, and T-A pairs) - Mismatches and bulges p1=10...10 3...8 ~p1[1,2,1] (the third pattern unit must match 10 characters, allowing one "mismatch" (a pairing other than G-C, C-G, A-T, or T-A)) - Searching for repeats: p1=6...6 3...8 p1 (find exact 6 character repeat separated by 3 to 8 characters) p1=6...6 3...8 p1[1,0,0] (allow one mismatch) p1=3...3 p1[1,0,0] p1[1,0,0] p1[1,0,0] (match 12 characters that are the remains of a 3-character sequence occurring 4 times) p1=4...8 0...3 p2=6...8 p1 0...3 p2 (This would match things like ATCT G TCTTT ATCT TG TCTTT) - Searching for particular sequences: p1=6...8 GAGA ~p1 (match a hairpin with GAGA as the loop) RRRRYYYY (match 4 purines followed by 4 pyrimidines) TATAA[1,0,0] (match TATAA, allowing 1 mismatch) control_options Control options 2 complementary_strand Search complementary strand (-c) Boolean 0 ($value) ?
" -c" : "" ( "" , " -c" )[ value ] protein Protein sequence? (-p) Boolean 0 ($value) ? " -p" : "" ( "" , " -p" )[ value ] outfile_name Outfile name Filename hits " > $value" " > " + str(value) 101 outfile Output file Text outfile_name str(outfile_name) Programs-5.1.1/makenucseq.xml0000644000175000001560000011453112072525233015036 0ustar bneronsis makenucseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net makenucseq Create random nucleotide sequences http://bioweb2.pasteur.fr/docs/EMBOSS/makenucseq.html http://emboss.sourceforge.net/docs/themes sequence:edit makenucseq e_input Input section e_codonfile Codon usage file (optional) Choice mobyle_null mobyle_null Eacc.cut Eacica.cut Eadenovirus5.cut Eadenovirus7.cut Eagrtu.cut Eaidlav.cut Eanasp.cut Eani.cut Eani_h.cut Eanidmit.cut Earath.cut Easn.cut Eath.cut Eatu.cut Eavi.cut Eazovi.cut Ebacme.cut Ebacst.cut Ebacsu.cut Ebacsu_high.cut Ebja.cut Ebly.cut Ebme.cut Ebmo.cut Ebna.cut Ebommo.cut Ebov.cut Ebovin.cut Ebovsp.cut Ebpphx.cut Ebraja.cut Ebrana.cut Ebrare.cut Ebst.cut Ebsu.cut Ebsu_h.cut Ecac.cut Ecaeel.cut Ecal.cut Ecanal.cut Ecanfa.cut Ecaucr.cut Eccr.cut Ecel.cut Echi.cut Echick.cut Echicken.cut Echisp.cut Echk.cut Echlre.cut Echltr.cut Echmp.cut Echnt.cut Echos.cut Echzm.cut Echzmrubp.cut Ecloab.cut Ecpx.cut Ecre.cut Ecrigr.cut Ecrisp.cut Ectr.cut Ecyapa.cut Edayhoff.cut Eddi.cut Eddi_h.cut Edicdi.cut Edicdi_high.cut Edog.cut Edro.cut Edro_h.cut Edrome.cut Edrome_high.cut Edrosophila.cut Eeca.cut Eeco.cut Eeco_h.cut Eecoli.cut Eecoli_high.cut Eemeni.cut Eemeni_high.cut Eemeni_mit.cut Eerwct.cut Ef1.cut Efish.cut Efmdvpolyp.cut Ehaein.cut Ehalma.cut Ehalsa.cut Eham.cut Ehha.cut Ehin.cut Ehma.cut Ehorvu.cut Ehum.cut Ehuman.cut Ekla.cut Eklepn.cut 
Eklula.cut Ekpn.cut Elacdl.cut Ella.cut Elyces.cut Emac.cut Emacfa.cut Emaize.cut Emaize_chl.cut Emam_h.cut Emammal_high.cut Emanse.cut Emarpo_chl.cut Emedsa.cut Emetth.cut Emixlg.cut Emouse.cut Emsa.cut Emse.cut Emta.cut Emtu.cut Emus.cut Emussp.cut Emva.cut Emyctu.cut Emze.cut Emzecp.cut Encr.cut Eneigo.cut Eneu.cut Eneucr.cut Engo.cut Eoncmy.cut Eoncsp.cut Eorysa.cut Eorysa_chl.cut Epae.cut Epea.cut Epet.cut Epethy.cut Epfa.cut Ephavu.cut Ephix174.cut Ephv.cut Ephy.cut Epig.cut Eplafa.cut Epolyomaa2.cut Epombe.cut Epombecai.cut Epot.cut Eppu.cut Eprovu.cut Epse.cut Epseae.cut Epsepu.cut Epsesm.cut Epsy.cut Epvu.cut Erab.cut Erabbit.cut Erabit.cut Erabsp.cut Erat.cut Eratsp.cut Erca.cut Erhile.cut Erhime.cut Erhm.cut Erhoca.cut Erhosh.cut Eric.cut Erle.cut Erme.cut Ersp.cut Esalsa.cut Esalsp.cut Esalty.cut Esau.cut Eschma.cut Eschpo.cut Eschpo_cai.cut Eschpo_high.cut Esco.cut Eserma.cut Esgi.cut Esheep.cut Eshp.cut Eshpsp.cut Esli.cut Eslm.cut Esma.cut Esmi.cut Esmu.cut Esoltu.cut Esoy.cut Esoybn.cut Espi.cut Espiol.cut Espn.cut Espo.cut Espo_h.cut Espu.cut Esta.cut Estaau.cut Estrco.cut Estrmu.cut Estrpn.cut Estrpu.cut Esty.cut Esus.cut Esv40.cut Esyhsp.cut Esynco.cut Esyncy.cut Esynsp.cut Etbr.cut Etcr.cut Eter.cut Etetsp.cut Etetth.cut Etheth.cut Etob.cut Etobac.cut Etobac_chl.cut Etobcp.cut Etom.cut Etrb.cut Etrybr.cut Etrycr.cut Evco.cut Evibch.cut Ewheat.cut Ewht.cut Exel.cut Exenla.cut Exenopus.cut Eyeast.cut Eyeast_cai.cut Eyeast_high.cut Eyeast_mit.cut Eyeastcai.cut Eyen.cut Eyeren.cut Eyerpe.cut Eysc.cut Eysc_h.cut Eyscmt.cut Eysp.cut Ezebrafish.cut Ezma.cut ("", " -codonfile=" + str(value))[value is not None and value!=vdef] 1 Optional codon usage file. Nucleotide sequences will be created as triplets matching the frequencies in the file, with the end trimmed to be in the correct reading frame. 
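The codon-usage behaviour described above (sequences built from triplets drawn according to the frequencies in the file) can be sketched in Python. This is a rough illustration under stated assumptions, not the EMBOSS implementation: the function name and the small frequency table are hypothetical, and "trimmed to be in the correct reading frame" is interpreted here as rounding the requested length down to a whole number of codons.

```python
import random


def random_coding_sequence(codon_freqs, length, rng=None):
    """Draw codons with probability proportional to their frequency.

    codon_freqs maps codon strings to relative frequencies (hypothetical
    stand-in for a parsed .cut file). The requested length is rounded
    down to a multiple of three, which is an assumption of this sketch.
    """
    rng = rng or random.Random()
    codons = list(codon_freqs)
    weights = [codon_freqs[codon] for codon in codons]
    n_codons = length // 3  # assumption: drop any trailing partial codon
    return "".join(rng.choices(codons, weights=weights, k=n_codons))


# Illustrative frequencies only; real .cut files cover all 64 codons.
example_freqs = {"ATG": 0.5, "GCC": 0.3, "TAA": 0.2}
sequence = random_coding_sequence(example_freqs, 100, random.Random(0))
print(len(sequence), sequence[:12])
```

Passing a seeded `random.Random` makes a run reproducible, which is convenient when comparing generated test sets.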
e_required Required section e_amount Number of sequences created (value greater than or equal to 1) Integer 100 ("", " -amount=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 2 e_length Length of each sequence (value greater than or equal to 1) Integer 100 ("", " -length=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 3 e_useinsert Do you want to make an insert Boolean 0 ("", " -useinsert")[ bool(value) ] 4 e_insert Inserted string String e_useinsert ("", " -insert=" + str(value))[value is not None] 5 String that is inserted into sequence e_start Start point of inserted sequence (value greater than or equal to 1) Integer e_useinsert 1 ("", " -start=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 6 e_output Output section e_outseq Name of the output sequence file (e_outseq) DNA Filename makenucseq.e_outseq ("" , " -outseq=" + str(value))[value is not None] 7 e_osformat_outseq Choose the sequence output format DNA Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 8 e_outseq_out outseq_out option DNA Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 9 Programs-5.1.1/trnascan.xml0000644000175000001560000010540511767572177014536 0ustar bneronsis trnascan 1.23 tRNAscan-SE Detection of transfer RNA genes T. Lowe, S. Eddy Fichant, G.A. and Burks, C. (1991) Identifying potential tRNA genes in genomic DNA sequences, J. Mol. Biol., 220, 659-671. Eddy, S.R. and Durbin, R. (1994) RNA sequence analysis using covariance models, Nucl. Acids Res., 22, 2079-2088. Pavesi, A., Conterio, F., Bolchi, A., Dieci, G., Ottonello, S. (1994) Identification of new eukaryotic tRNA genes in genomic DNA databases by a multistep weight matrix analysis of transcriptional control regions, Nucl. Acids Res., 22, 1247-1256.
Lowe, T.M. and Eddy, S.R. (1997) tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence, Nucl. Acids Res., 25, 955-964. tRNAscan-SE identifies transfer RNA genes in genomic DNA or RNA sequences. It combines the specificity of the Cove probabilistic RNA prediction package (Eddy & Durbin, 1994) with the speed and sensitivity of tRNAscan 1.3 (Fichant & Burks, 1991) plus an implementation of an algorithm described by Pavesi and colleagues (1994) which searches for eukaryotic pol III tRNA promoters (our implementation referred to as EufindtRNA). tRNAscan and EufindtRNA are used as first-pass prefilters to identify "candidate" tRNA regions of the sequence. These subsequences are then passed to Cove for further analysis, and output if Cove confirms the initial tRNA prediction. http://selab.janelia.org/software.html#trnascan ftp://selab.janelia.org/pub/software/tRNAscan-SE/ sequence:nucleic:pattern tRNAscan-SE sequence Sequence File DNA Sequence FASTA " $value" " "+str(value) 2 search_options Search Mode options 1 prokaryotic Improve detection of prokaryotic tRNAs (-P) Boolean 0 ($value) ? " -P":"" ("" , " -P")[ value ] This parameter loosens the search parameters for EufindtRNA to improve detection of prokaryotic tRNAs. Use this option when scanning prokaryotic sequences or both eukaryotic and prokaryotic sequences in the same sequence file. This option also disables pseudogene checking automatically since criteria for pseudogene checking were developed for eukaryotic pseudogenes. Use of this mode with prokaryotic sequences will also improve bounds prediction of the 3' end (the terminal CCA triplet). archeal Select archaeal-specific covariance model (-A) Boolean 0 ($value) ? " -A" : "" ( "" , " -A" )[ value ] This option selects an archaeal-specific covariance model for tRNA analysis, as well as slightly loosening the EufindtRNA search cutoffs.
organellar Bypasses the fast first-pass scanners that are poor at detecting organellar tRNAs (-O) Boolean 0 ($value) ? " -O":"" ("" , " -O")[ value ] This parameter bypasses the fast first-pass scanners that are poor at detecting organellar tRNAs and runs Cove analysis only. Since true organellar tRNAs have been found to have Cove scores between 15 and 20 bits, the search cutoff is lowered from 20 to 15 bits. Also, pseudogene checking is disabled since it is only applicable to eukaryotic cytoplasmic tRNA pseudogenes. Since Cove-only mode is used, searches will be very slow (see -C option below) relative to the default mode. general General covariance model trained on all three phylogenetic domains (-G) Boolean 0 ($value) ? " -G" : "" ( "" , " -G" )[ value ] This option selects the general tRNA covariance model that was trained on tRNAs from all three phylogenetic domains (archaea, bacteria, & eukarya). This mode can be used when analyzing a mixed collection of sequences from more than one phylogenetic domain, with only a slight loss of sensitivity and selectivity. The original publication describing this program and tRNAscan-SE version 1.0 used this general tRNA model exclusively. If you wish to compare scores to those found in the paper or scans using v1.0, use this option. Use of this option is compatible with all other search mode options described in this section. cove_only Analyze sequences using Cove only (-C) Boolean 0 ($value) ? " -C":"" ("" , " -C")[ value ] Directs tRNAscan-SE to analyze sequences using Cove analysis only. This option allows a slightly more sensitive search than the default tRNAscan + EufindtRNA -> Cove mode, but is much slower (by approx. 250 to 3,000 fold). Output format and other program defaults are otherwise identical to the normal analysis. breakdown Show both primary and secondary structure components of the covariance model bit score (-H) Boolean 0 ($value) ? 
" -H" : "" ( "" , " -H" )[ value ] Since tRNA pseudogenes often have one very low component (good secondary structure but poor primary sequence similarity to the tRNA model, or vice versa), this information may be useful in deciding whether a low-scoring tRNA is likely to be a pseudogene. The heuristic pseudogene detection filter uses this information to flag possible pseudogenes -- use this option to see why a hit is marked as a possible pseudogene. The user may wish to examine score breakdowns from known tRNAs in the organism of interest to get a frame of reference. disable_checking Disable pseudogene checking (-D) Boolean 0 ($value) ? " -D":"" ("" , " -D")[ value ] This will slightly speed up the program and may be necessary for non-eukaryotic sequences that are flagged as possible pseudogenes but are known to be functional tRNAs. special_options Special options 1 trnascan_only Use tRNAscan only to analyze sequences (-T) Boolean 0 ($value) ? " -T":"" ("" , " -T")[ value ] Directs tRNAscan-SE to use only tRNAscan to analyze sequences. This mode will default to using 'strict' parameters with tRNAscan analysis (similar to tRNAscan version 1.3 operation). This mode of operation is faster (3-5 times faster than default mode analysis), but will result in approximately 0.2 to 0.6 false positive tRNAs per Mbp, decreased sensitivity, and less reliable prediction of anticodons, tRNA isotype, and introns. eufindtrna_only Use EufindtRNA only to search for tRNAs (-E) Boolean 0 ($value) ? " -E":"" ("" , " -E")[ value ] Since Cove is not being used as a secondary filter to remove false positives, this run mode defaults to 'Normal' parameters which more closely approximate the sensitivity and selectivity of the original algorithm described by Pavesi and colleagues (see the option -e for a description of the various run modes). trnascan_mode Strict or relaxed tRNAscan mode (-t) Choice S S R (defined $value and $value ne $vdef) ? 
" -t $value":"" ("" , " -t "+ str(value))[ value is not None and value != vdef] Relaxed parameters may give very slightly increased search sensitivity, but increase search time by 20-40 fold. eufindtrna_mode EufindtRNA mode (-e) Choice S S R N (defined $value) ? " -e $value":"" ("" , " -e "+ str(value))[ value is not None ] Explicitly set EufindtRNA params, where <mode>= R, N, or S (relaxed, normal, or strict). The 'relaxed' mode is used for EufindtRNA when using tRNAscan-SE in default mode. With relaxed parameters, tRNAs that lack pol III poly-T terminators are not penalized, increasing search sensitivity, but decreasing selectivity. When Cove analysis is being used as a secondary filter for false positives (as in tRNAscan-SE's default mode), overall selectivity is not decreased. Using 'normal' parameters with EufindtRNA does incorporate a log odds score for the distance between the B box and the first poly-T terminator, but does not disqualify tRNAs that do not have a terminator signal within 60 nucleotides. This mode is used by default when Cove analysis is not being used as a secondary false positive filter. Using 'strict' parameters with EufindtRNA also incorporates a log odds score for the distance between the B box and the first poly-T terminator, but _rejects_ tRNAs that do not have such a signal within 60 nucleotides of the end of the B box. This mode most closely approximates the originally published search algorithm (3); sensitivity is reduced relative to using 'relaxed' and 'normal' modes, but selectivity is increased which is important if no secondary filter, such as Cove analysis, is being used to remove false positives. This mode will miss most prokaryotic tRNAs since the poly-T terminator signal is a feature specific to eukaryotic tRNA genes (always use 'relaxed' mode for scanning prokaryotic sequences for tRNAs). save_first_pass Save first pass results (-r) Boolean 0 ($value) ? 
" -r\\#":"" ("" , " -r#")[ value ] Save tabular, formatted output results from tRNAscan and/or EufindtRNA first pass scans. The format is similar to the final tabular output format, except no Cove score is available at this point in the search (if EufindtRNA has detected the tRNA, the negative log likelihood score is given). Also, the sequence ID number and source sequence length appear in the columns where intron bounds are shown in final output. This option may be useful for examining false positive tRNAs predicted by first-pass scans that have been filtered out by Cove analysis. previous_first_pass_result Use a previous first pass result tabular file (-u) TrnaScanFirstPassResult AbstractText $matching or $start matching and start (defined $value) ? " -u $value":"" ("" , " -u "+ str(value))[ value is not None ] This option allows the user to re-generate results from regions identified to have tRNAs by a previous tRNAscan-SE run. Either a regular tabular result file, or output saved with the -r option may be used as the specified <file>. This option is particularly useful for generating either secondary structure output (-f option) or ACeDB output (-a option) without having to re-scan entire sequences. Alternatively, if the -r option is used to generate the previous results file, tRNAscan-SE will pick up at the stage of Cove-confirmation of tRNAs and output final tRNA predictions as with a normal run. false_positives Save false positives (-F) Boolean 0 ($value) ? " -F\\#":"" ("" , " -F#")[ value ] Save first-pass candidate tRNAs that were then found to be false positives by Cove analysis. This option saves candidate tRNAs found by tRNAscan and/or EufindtRNA that were then rejected by Cove analysis as being false positives. tRNAs are saved in the FASTA sequence format. specify_options Specify Alternate Cutoffs / Data Files options 1 cutoff Cove cutoff score for reporting tRNAs (-X) Integer 20 (defined $value and $value != $vdef) ? 
" -X $value":"" ("" , " -X "+ str(value))[ value is not None and value != vdef] This option allows the user to specify a different Cove score threshold for reporting tRNAs. It is not recommended that novice users change this cutoff, as a lower cutoff score will increase the number of pseudogenes and other false positives found by tRNAscan-SE (especially when used with the 'Cove only' scan mode). Conversely, a higher cutoff than 20.0 bits will likely cause true tRNAs to be missed by tRNAscan (numerous 'real' tRNAs have been found just above the 20.0 cutoff). Knowledgeable users may wish to experiment with this parameter to find very unusual tRNAs or pseudogenes beyond the normal range of detection with the preceding caveats in mind. Length Max length of tRNA intron+variable region (-L) Integer 116 (defined $value and $value !=$vdef) ? " -L $value":"" ("" , " -L "+ str(value))[ value is not None and value !=vdef] Set max length of tRNA intron+variable region (default=116bp). The default maximum tRNA length for tRNAscan-SE is 192 bp, but this limit can be increased with this option to allow searches with no practical limit on tRNA length. In the first phase of tRNAscan-SE, EufindtRNA searches for A and B boxes of <length> maximum distance apart, and passes only the 5' and 3' tRNA ends to covariance model analysis for confirmation (removing the bulk of long intervening sequences). tRNAs containing group I and II introns have been detected by setting this parameter to over 800 bp. Caution: group I or II introns in tRNAs tend to occur in positions other than the canonical position of protein-spliced introns, so tRNAscan-SE mispredicts the intron bounds and anticodon sequence for these cases. tRNA bound predictions, however, have been found to be reliable in these same tRNAs. add_to_both_ends Number of nucleotides to add to both ends during first-pass (-z) Integer 7 (defined $value and $value != $vdef) ? 
" -z $value" : "" ( "" , " -z "+ str(value) )[ value is not None and value != vdef] By default, tRNAscan-SE adds 7 nucleotides to both ends of tRNA predictions when first-pass tRNA predictions are passed to covariance model (CM) analysis. CM analysis generally trims these bounds back down, but on occasion, allows prediction of an otherwise truncated first-pass tRNA prediction. genetic Genetic code (-g) Choice not $trnascan_only and not $eufindtrna_only not trnascan_only and not eufindtrna_only Standard Standard gcode.cilnuc gcode.echdmito gcode.invmito gcode.othmito gcode.vertmito gcode.ystmito (defined $value and $value ne $vdef) ? " -g $value":"" ("" , " -g "+ str(value))[ value is not None and value != vdef] This option does not have any effect when using the -T or -E options -- you must be running in default or Cove only analysis mode. covariante Specify an alternate covariance model (-c) Text (defined $value) ? " -c $value":"" ("" , " -c " + str(value))[ value is not None] misc_options Misc options 1 matching Search only sequences with names matching this string (-n) String (defined $value) ? " -n $value":"" ("" , " -n "+ str(value))[ value is not None] Search only sequences with names matching this string. Only those sequences with names (first non-white space word after '>' symbol on FASTA name/description line) matching this string are analyzed for tRNAs. start Start search at first sequence with name matching this string (-s) String (defined $value) ? " -s $value":"" ("" , " -s "+ str(value))[ value is not None ] Start search at first sequence with name matching <EXPR> string and continue to end of input sequence file(s). This may be useful for re-starting crashed/aborted runs at the point where the previous run stopped. (If same names for output file(s) are used, program will ask if files should be over-written or appended to -- choose append and run will successfully be restarted where it left off). 
output_options Output options 1 secondary_structure Save secondary structure results file (-f) Boolean 0 ($value) ? " -f\\#":"" ("" , " -f#")[ value ] Save final results and Cove tRNA secondary structure predictions. This output format makes visual inspection of individual tRNA predictions easier since the tRNA sequence is displayed along with the predicted tRNA base pairings. acedb Output final results in ACeDB format instead of the default tabular format (-a) Boolean 0 ($value) ? " -a":"" ("" , " -a")[ value ] statistics Save statistics summary for run (-m) Boolean 0 ($value) ? " -m\\#":"" ("" , " -m#")[ value ] This option directs tRNAscan-SE to write a brief summary to a file which contains the run options selected as well as statistics on the number of tRNAs detected at each phase of the search, search speed, and other bits of information. See Manual documentation for explanation of each statistic. progress Display program progress (-d) Boolean 0 ($value) ? " -d":"" ("" , " -d")[ value ] Messages indicating which phase of the tRNA search are printed to standard output. If final results are also being sent to standard output, some of these messages will be suppressed so as to not interrupt display of the results. log Save log of program progress (-l) Boolean 0 ($value) ? " -l\\#":"" ("" , " -l#")[ value ] quiet Quiet mode (-q) Boolean 0 ($value) ? " -q":"" ("" , " -q")[ value ] The credits & run option selections normally printed to standard error at the beginning of each run are suppressed. brief Use brief output format (-b) Boolean 0 ($value) ? " -b":"" ("" , " -b")[ value ] This eliminates column headers that appear by default when writing results in tabular output format. Useful if results are to be parsed or piped to another program. trna_codon Output a tRNA's corresponding codon in place of its anticodon (-N) Boolean 0 ($value) ? " -N":"" ("" , " -N")[ value ] label Use prefix for all default output file names (-p) Filename (defined $value) ? 
" -p $value":"" ("" , " -p "+str(value))[ value is not None ] scanners Displays which of the first-pass scanners detected the tRNA being output (-y) Boolean 0 ($value) ? " -y":"" ("" , " -y")[ value ] 1 'Ts', 'Eu', or 'Bo' will appear in the last column of Tabular output, indicating that either tRNAscan 1.4, EufindtRNA, or both scanners detected the tRNA, respectively. results Results files Text "*.stats" "*.log" "*.ss" "*.fpos" "*.stats" "*.log" "*.ss" "*.fpos" first_pass_scan_results First pass scan result TrnaScanFirstPassResult AbstractText "*.fpass.out" "*.fpass.out" Programs-5.1.1/treealign.xml0000644000175000001560000001462211767572177014677 0ustar bneronsis treealign treealign Phylogenetic alignment of homologous sequences J. Hein Hein, J.: Unified approach to alignment and phylogenies. Meth. Enzymol. 183:626-645 (1990). Hein, J.: A new method that simultaneously aligns and reconstruct ancestral sequences for any number of homologous sequences, when the phylogeny is given. Mol. Biol. Evol. 6:649-668 (1989). Hein, J.: A tree reconstruction method that is economical in the number of pairwise comparisons used. Mol. Biol. Evol. 6:669-684 (1989). alignment:pairwise phylogeny:parsimony treealign fileseq Sequences File Sequence NBRF "$value\\n" str(value)+ "\n" 30 The sequences should be homologous and there should be a history to be found. If you give it a set of COMPLETELY unrelated sequences, it is possible that it will not be able to align them, since it cannot allocate enough memory. The sequences should not vary in length because they have been sequenced unequally much. Length differences should be due to evolution. Thus it should not be used to look for local homologies. 
par.dat >P1;alpha GDGKMTADKLNFPGNS* >P1;beta GDGKNTRDKINFPGNS* >P1;gamma GDGKNTADKINFPGNS* seqtype Sequence type Choice null null 1 0 "$value" str(value) 11 par.dat nuseq Number of sequences Integer " $value" " "+str(value) 12 par.dat gap_open Gap open penalty Integer " $value" " "+str(value) Enter a non-negative value $value >= 0 value >= 0 13 par.dat gap_ext Gap extension penalty Integer " $value\\n" " " + str(value) + "\n" 14 par.dat other_options Other options ancesterout Present ancestral sequences Boolean 0 ($value) ? "1" : "0" ( "0" , "1" )[ value ] 21 par.dat filetree Output tree file Text "$fileseq.tree\\n" str(fileseq)+ ".tree\n" 40 par.dat "*.tree" "*.tree" fileali Output alignment file Text "$fileseq.ali\\n" str(fileseq)+".ali\n" 50 par.dat "*.ali" "*.ali" Programs-5.1.1/maskfeat.xml0000644000175000001560000002031612072525233014472 0ustar bneronsis maskfeat EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net maskfeat Write a sequence with masked features http://bioweb2.pasteur.fr/docs/EMBOSS/maskfeat.html http://emboss.sourceforge.net/docs/themes sequence:edit:feature_table maskfeat e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_additional Additional section e_type Type of feature to mask String ("", " -type=" + str(value))[value is not None] 2 By default any feature in the feature table with a type starting 'repeat' is masked. You can set this to be any feature type you wish to mask. 
See http://www.ebi.ac.uk/embl/WebFeat/ for a list of the EMBL feature types and see Appendix A of the Swissprot user manual in http://www.expasy.org/sprot/userman.html for a list of the Swissprot feature types. The type may be wildcarded by using '*'. If you wish to mask more than one type, separate their names with spaces or commas, eg: *UTR repeat* e_tolower Change masked region to lower-case Boolean 0 ("", " -tolower")[ bool(value) ] 3 The region can be 'masked' by converting the sequence characters to lower-case, some non-EMBOSS programs e.g. fasta can interpret this as a masked region. The sequence is unchanged apart from the case change. You might like to ensure that the whole sequence is in upper-case before masking the specified regions to lower-case by using the '-supper' flag. e_maskchar Character to mask with String not e_tolower ("", " -maskchar=" + str(value))[value is not None] 4 Character to use when masking. Default is 'X' for protein sequences, 'N' for nucleic sequences. If the mask character is set to be the SPACE character or a null character, then the sequence is 'masked' by changing it to lower-case, just as with the '-lowercase' flag. e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename maskfeat.e_outseq ("" , " -outseq=" + str(value))[value is not None] 5 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 6 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 7 Programs-5.1.1/equicktandem.xml0000644000175000001560000002157312072525233015357 0ustar bneronsis equicktandem EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net equicktandem Finds tandem repeats in nucleotide sequences http://bioweb2.pasteur.fr/docs/EMBOSS/equicktandem.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:repeats equicktandem e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_maxrepeat Maximum repeat size Integer 600 ("", " -maxrepeat=" + str(value))[value is not None and value!=vdef] 2 e_threshold Threshold score Integer 20 ("", " -threshold=" + str(value))[value is not None and value!=vdef] 3 e_output Output section e_outfile Name of the report file Filename report.qtan ("" , " -outfile=" + str(value))[value is not None] 4 e_rformat_outfile Choose the report output format Choice TABLE DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 5 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile e_origfile Name of the output file (e_origfile) Filename outfile.oldqtan ("" , " -origfile=" + str(value))[value is not None] 6 e_origfile_out origfile_out option QuicktandemReport Report e_origfile auto Turn off any prompting String " -auto -stdout" 7 Programs-5.1.1/mspcrunch.xml0000644000175000001560000005410411767572177014726 0ustar bneronsis mspcrunch 2.5 MSPcrunch A BLAST post-processing filter Sonnhammer, Durbin http://bioweb2.pasteur.fr/docs/mspcrunch/MSPcrunch2.pdf http://sonnhammer.sbc.su.se/MSPcrunch.html http://sonnhammer.sbc.su.se/download/software/MSPcrunch+Blixem/ database:search:display 
mspcrunch String "MSPcrunch" "MSPcrunch" 0 input_options Input Options 1 blast_output BLAST output File BlastTextReport Report " $value" " "+str(value) 2 force_blastp Force Blastp mode (default Blastx) (-p) Boolean 0 ($value) ? " -p" : "" ( "" , " -p" )[ value ] force_blastn Force Blastn mode (default Blastx) (-n) Boolean 0 ($value) ? " -n" : "" ( "" , " -n" )[ value ] analyse_options Control options 1 gapped Make gapped alignment of ungapped-MSP contigs (-G) Boolean 0 ($value) ? " -G" : "" ( "" , " -G" )[ value ] cov_limit Set coverage limit (-l) Integer 10 (defined $value and $value != $vdef) ? " -l $value" : "" ( "" , " -l "+str(value) )[ value is not None and value !=vdef] 0 = No coverage rejection old_cutoff Use old step cutoffs for adjacency instead of the new continuous system. (-O) Boolean 0 ($value) ? " -O" : "" ( "" , " -O" )[ value ] dont_reject Don't reject any MSPs (-w) Boolean 0 ($value) ? " -w" : "" ( "" , " -w" )[ value ] report_rejected Report only rejected MSPs (-r) Boolean 0 ($value) ? " -r" : "" ( "" , " -r" )[ value ] threshold_id Reject all matches with less than this % identity (-I) Float (defined $value) ? " -I $value" : "" ( "" , " -I " + str(value) )[ value is not None ] threshold_length Reject all matches with length less than this value (-L) Float (defined $value) ? " -L $value" : "" ( "" , " -L " + str(value) )[ value is not None ] expect Reject all matches with E-value higher than this value (-e) Float (defined $value) ? " -e $value" : "" ( "" , " -e " + str(value))[ value is not None ] query Read in query seq (for rereading .seqbl files) (-Q) Sequence FASTA (defined $value)? " -Q $query" : "" ( "" , " -Q " + str(value) )[ value is not None ] whole_contig Coverage limitation requires whole contig to be covered (always for Blastp) (-a) Boolean 0 ($value) ? " -a" : "" ( "" , " -a" )[ value ] hits_to_self Accept hits to self (-s) Boolean 0 ($value) ? 
" -s" : "" ( "" , " -s" )[ value ] no_hits_to_earlier Ignore hits to earlier seqnames (for All-vs-All) (-A) Boolean 0 ($value) ? " -A" : "" ( "" , " -A" )[ value ] stats_without_X Recalculate percentage identity, ignoring X residues. (-j) Boolean 0 ($value) ? " -j" : "" ( "" , " -j" )[ value ] stats_without_end Recalculate percentage identity, ignoring mismatches at ends. (-J) Boolean 0 ($value) ? " -J" : "" ( "" , " -J" )[ value ] silent_mutations Do Statistics of Silent mutations (only cDNA!) (-S) Boolean 0 ($value) ? " -S" : "" ( "" , " -S" )[ value ] matrix_stats Print statistics on used matrices (-E) Boolean 0 ($value) ? " -E" : "" ( "" , " -E" )[ value ] all_expected Print all Expected scores (default only when positive) (-X) Boolean 0 ($value) ? " -X" : "" ( "" , " -X" )[ value ] line_length Line length of Wrapped alignment (-W) Integer (defined $value) ? " -W $value" : "" ( "" , " -W " +str(value) )[ value is not None ] output_options Output Options 1 outfile Result file Report "mspcrunch.out" "mspcrunch.out" big_pict Big Picture output (-P) Boolean 0 ($value) ? " -P" : "" ( "" , " -P" )[ value ] matches_one_line For Big Picture output, force all matches to the same subject on one line (-F) Boolean $big_pict big_pict 0 ($value) ? " -F" : "" ( "" , " -F" )[ value ] sfs Produce SFS output (-H) Boolean 0 ($value) ? " -H" : "" ( "" , " -H" )[ value ] seqbl Produce seqbl output for Blixem (-q) Boolean 0 ($value) ? " -q" : "" ( "" , " -q" )[ value ] wublast_numbered Indicate query insertions with numbers (For seqbl output from Wublast) (-N) Boolean $seqbl seqbl 0 ($value) ? " -N" : "" ( "" , " -N" )[ value ] ace Produce .ace output (for ACEDB 4) (-4) Boolean 0 ($value) ? " -4" : "" ( "" , " -4" )[ value ] dont_mirror Don't mirror (i.e. print the subject object) in ACE4 format (-M) Boolean $ace ace 0 ($value) ? " -M" : "" ( "" , " -M" )[ value ] exblx Produce exblx output (for easy parsing) (-x) Boolean 0 ($value) ? 
" -x" : "" ( "" , " -x" )[ value ] exbldb Produce exbldb output (as exblx with query names) (-d) Boolean 0 ($value) ? " -d" : "" ( "" , " -d" )[ value ] fasta Produce fasta output (unaligned, for mult.alignm.) (-2) Boolean 0 ($value) ? " -2" : "" ( "" , " -2" )[ value ] three_frame Print 3 frame translation (blastn only) (-3) Boolean 0 ($value) ? " -3" : "" ( "" , " -3" )[ value ] footer Print footer with parameters and stats (-f) Boolean 0 ($value) ? " -f" : "" ( "" , " -f" )[ value ] percentage_id Print percentage identity (seqbl output only) (-i) Boolean $seqbl seqbl 0 ($value) ? " -i" : "" ( "" , " -i" )[ value ] stats Output coverage stats? Boolean 0 "" "" stats_file Coverage stats outputfile Text $stats stats ($value) ? " -o mspcrunch.stats" : "" ( "" , " -o mspcrunch.stats" )[ value is not None ] "mspcrunch.stats" "mspcrunch.stats" domainer Produce output for Domainer (trim overlaps) (-D) Boolean 0 ($value) ? " -D" : "" ( "" , " -D" )[ value ] Programs-5.1.1/hmmbuild.xml0000644000175000001560000010136011767572177014522 0ustar bneronsis hmmbuild HMMBUILD Build a profile HMM from an input multiple alignment hmm:building hmmbuild alignfile Aligned sequences File Alignment STOCKHOLM " $value" " "+str(value) 30 alphabet Forcing an alphabet in input alignment Choice null null --amino --dna --rna (defined $value and $value ne $vdef)? " $value" : "" ("", " " + str(value) )[ value is not None and value != vdef] The alphabet type (amino, DNA, or RNA) is autodetected by default, by looking at the composition of the msafile. Autodetection is normally quite reliable, but occasionally alphabet type may be ambiguous and autodetection can fail (for instance, on tiny toy alignments of just a few residues). To avoid this, or to increase robustness in automated analysis pipelines, you may specify the alphabet type of msafile with these options. Protein: Specify that all sequences in seqfile are proteins. 
By default, alphabet type is autodetected from looking at the residue composition. DNA: Specify that all sequences in seqfile are DNAs. RNA: Specify that all sequences in seqfile are RNAs. 2 hmm_textfile String " $alignfile.hmm" " " + str(alignfile) + ".hmm" 20 output_options Output options 1 hmmname Name the HMM (-n) String (defined $value) ? " -n $value" : "" ( "" , " -n " + str(value) )[ value is not None ] 1 Name the new profile. The default is to use the name of the alignment (if one is present in the msafile) or, failing that, the name of the hmmfile. If msafile contains more than one alignment, -n doesn't work, and every alignment must have a name annotated in the msafile (as in Stockholm #=GF ID annotation). re_save Re_save annotated, possibly modified MSA to 'file', in Stockholm format. (-O) Filename (defined $value)? " -O $value" : "" ( "" , " -O " + str(value) )[ value is not None ] 1 After each model is constructed, resave annotated, possibly modified source alignments to a file in Stockholm format. The alignments are annotated with a reference annotation line indicating which columns were assigned as consensus, and sequences are annotated with what relative sequence weights were assigned. Some residues of the alignment may have been shifted to accommodate restrictions of the Plan7 profile architecture, which disallows transitions between insert and delete states. AlternativeConstruction Alternative model construction strategies 1 These options control how consensus columns are defined in an alignment. fast Quickly and heuristically determine the architecture of the model (fast) Boolean 0 ($value) ? " --fast" : "" ( "" , " --fast" )[ value ] 1 Define consensus columns as those that have a fraction >= symfrac of residues as opposed to gaps. (See the --symfrac option.) This is the default. symfrac Sets sym fraction controlling for the --fast model construction algorithm, (symfrac) Float $fast fast 0.5 (defined $value and $value != $vdef) ? 
" --symfrac $value" : "" ( "" , " --symfrac " + str(value) )[ value is not None and value !=vdef ] Enter a value >= 0 and <= 1. Define the residue fraction threshold necessary to define a consensus column when using the --fast option. The default is 0.5. The symbol fraction in each column is calculated after taking relative sequence weighting into account, and ignoring gap characters corresponding to ends of sequence fragments (as opposed to internal insertions/deletions). Setting this to 0.0 means that every alignment column will be assigned as consensus, which may be useful in some cases. Setting it to 1.0 means that only columns that have no gap characters at all will be assigned as consensus. Enter a value >= 0 and <= 1 0 <= $value <= 1 0 <= value <= 1 fragthresh Tag sequence as a fragment, (fragthresh) Float 0.5 (defined $value and $value != $vdef) ? " --fragthresh $value" : "" ( "" , " --fragthresh " + str(value) )[ value is not None and value !=vdef ] Enter a value >= 0 and <= 1. We only want to count terminal gaps as deletions if the aligned sequence is known to be full-length, not if it is a fragment (for instance, because only part of it was sequenced). HMMER uses a simple rule to infer fragments: if the sequence length L is less than a fraction x times the mean sequence length of all the sequences in the alignment, then the sequence is handled as a fragment. The default is 0.5. Enter a value >= 0 and <= 1 0 <= $value <= 1 0 <= value <= 1 advanced Advanced options 1 relativeWeight Alternative relative sequence weighting strategies Choice wpb wpb wgsc wblosum wnone infoWgiven ($value ne $vdef and $value ne 'infoWgiven') ? " --$value" : "" ( "" , ' --'+ str(value) )[ value != vdef and value != 'infoWgiven' ] HMMER uses an ad hoc sequence weighting algorithm to downweight closely related sequences and upweight distantly related ones. This has the effect of making models less biased by uneven phylogenetic representation. 
For example, two identical sequences would typically each receive half the weight that one sequence would. These options control which algorithm gets used. wpb: Use the Henikoff position-based sequence weighting scheme [Henikoff and Henikoff, J. Mol. Biol. 243:574, 1994]. This is the default. wgsc: Use the Gerstein/Sonnhammer/Chothia weighting algorithm [Gerstein et al, J. Mol. Biol. 235:1067, 1994]. wblosum: Use the same clustering scheme that was used to weight data in calculating BLOSUM substitution matrices [Henikoff and Henikoff, Proc. Natl. Acad. Sci 89:10915, 1992]. Sequences are single-linkage clustered at an identity threshold (default 0.62; see --wid) and within each cluster of c sequences, each sequence gets relative weight 1/c. wnone: No relative weights. All sequences are assigned uniform weight. wgiven Personal weights in file MSAFile AbstractText $relativeWeight eq 'infoWgiven' relativeWeight == 'infoWgiven' (defined $value) ? " --wgiven $value" : "" ("", " --wgiven " + str( value ))[value is not None] wid Set identity cutoff for BLOSUM filtering algorithm option (wid) Float $relativeWeight eq 'wblosum' and not $eset relativeWeight == 'wblosum' and not eset 0.62 (defined $value and $value != $vdef) ? " --wid $value" : "" ("", " --wid " + str(value))[value is not None and value != vdef] Sets the identity threshold used by single-linkage clustering when using --wblosum. Invalid with any other weighting scheme. Default is 0.62. Enter a value >= 0 and <= 1 Enter a value >= 0 and <= 1 0 <= $value and $value <= 1 0 <= value and value <= 1 effectiveWeight Alternate effective sequence weighting strategies Choice not $eset not eset --eent --eent --eclust --enone turnOff ($value ne $vdef and $value ne 'turnOff') ? " $value" : "" ( "" , " " +str(value) )[ value != vdef and value != 'turnOff'] After relative weights are determined, they are normalized to sum to a total effective sequence number, eff nseq. 
This number may be the actual number of sequences in the alignment, but it is almost always smaller than that. The default entropy weighting method (--eent) reduces the effective sequence number to reduce the information content (relative entropy, or average expected score on true homologs) per consensus position. The target relative entropy is controlled by a two-parameter function, where the two parameters are settable with --ere and --esigma. --eent: Adjust effective sequence number to achieve a specific relative entropy per position (see --ere). This is the default. --eclust: Set effective sequence number to the number of single-linkage clusters at a specific identity threshold (see --eid). This option is not recommended; it's for experiments evaluating how much better --eent is. --enone: Turn off effective sequence number determination and just use the actual number of sequences. One reason you might want to do this is to try to maximize the relative entropy/position of your model, which may be useful for short models. eset Set personal effective sequence weighting for all models to value (eset) Float $effectiveWeight eq 'turnOff' effectiveWeight == 'turnOff' (defined $value) ? " --eset $value" : "" ( "" , " --eset " + str(value) )[ value is not None ] Explicitly set the effective sequence number for all models to value. ere For personal adjustment of effective sequence weighting: set minimum relative entropy/position to value (ere) Float $effectiveWeight eq "--eent" effectiveWeight == "--eent" (defined $value) ? " --ere $value" : "" ( "" , " --ere " + str(value) )[ value is not None ] Set the minimum relative entropy/position target to value. Requires --eent. Default depends on the sequence alphabet; for protein sequences, it is 0.59 bits/position. esigma For personal adjustment of effective sequence weighting: set sigma parameter to value (esigma) Float $effectiveWeight eq "--eent" effectiveWeight == "--eent" 45.0 (defined $value and $value!=$vdef) ? 
" --esigma $value" : "" ( "" , " --esigma " + str(value) )[ value is not None and value !=vdef ] Sets the minimum relative entropy contributed by an entire model alignment, over its whole length. This has the effect of making short models have higher relative entropy per position than --ere alone would give. The default is 45.0 bits. eid For single linkage clustering: set fractional identity cutoff to value (eid) Float $effectiveWeight eq "--eclust" and not $eset effectiveWeight == "--eclust" and not eset 0.62 (defined $value and $value!=$vdef) ? " --eid $value" : "" ( "" , " --eid " + str(value) )[ value is not None and value !=vdef ] Enter a value >= 0 and <= 1. Sets the fractional pairwise identity cutoff used by single linkage clustering with the --eclust option. The default is 0.62. Enter a value >= 0 and <= 1 0 <= $value <= 1 0 <= value <= 1 ECalibration Control of E-value calibration 1 The location parameters for the expected score distributions for MSV filter scores, Viterbi filter scores, and Forward scores require three short random sequence simulations. EmL Length of sequences for MSV Gumbel mu fit (EmL) Integer 200 (defined $value and $value!=$vdef) ? " --EmL $value" : "" ( "" , " --EmL " + str(value) )[ value is not None and value !=vdef ] Enter a value > 0. Sets the sequence length in simulation that estimates the location parameter mu for MSV filter E-values. Default is 200. Enter a value > 0 $value > 0 value > 0 EmN Number of sequences for MSV Gumbel mu fit (EmN) Integer 200 (defined $value and $value!=$vdef) ? " --EmN $value" : "" ( "" , " --EmN " + str(value) )[ value is not None and value !=vdef ] Enter a value > 0. Sets the number of sequences in simulation that estimates the location parameter mu for MSV filter E-values. Default is 200. Enter a value > 0. $value > 0 value > 0 EvL Length of sequences for Viterbi Gumbel mu fit (EvL) Integer 200 (defined $value and $value!=$vdef) ? 
" --EvL $value" : "" ( "" , " --EvL " + str(value) )[ value is not None and value !=vdef ] Enter a value > 0. Sets the sequence length in simulation that estimates the location parameter mu for Viterbi filter E-values. Default is 200. Enter a value > 0 $value > 0 value > 0 EvN Number of sequences for Viterbi Gumbel mu fit (EvN) Integer 200 (defined $value and $value!=$vdef) ? " --EvN $value" : "" ( "" , " --EvN " + str(value) )[ value is not None and value !=vdef ] Enter a value > 0. Sets the number of sequences in simulation that estimates the location parameter mu for Viterbi filter E-values. Default is 200. Enter a value > 0. $value > 0 value > 0 EfL Length of sequences for Forward exp tail tau fit (EfL) Integer 100 (defined $value and $value!=$vdef) ? " --EfL $value" : "" ( "" , " --EfL " + str(value) )[ value is not None and value !=vdef ] Enter a value > 0. Sets the sequence length in simulation that estimates the location parameter tau for Forward E-values. Default is 100. Enter a value > 0 $value > 0 value > 0 EfN Number of sequences for Forward exp tail tau fit (EfN) Integer 200 (defined $value and $value!=$vdef) ? " --EfN $value" : "" ( "" , " --EfN " + str(value) )[ value is not None and value !=vdef ] Enter a value > 0. Sets the number of sequences in simulation that estimates the location parameter tau for Forward E-values. Default is 200. Enter a value > 0 $value > 0 value > 0 Eft Tail mass for Forward exponential tail tau fit (Eft) Float 0.04 (defined $value and $value!=$vdef) ? " --Eft $value" : "" ( "" , " --Eft " + str(value) )[ value is not None and value !=vdef ] Enter a value > 0 and < 1. Sets the tail mass fraction to fit in the simulation that estimates the location parameter tau for Forward evalues. Default is 0.04. Enter a value > 0 and < 1 $value > 0 and $value < 1 value > 0 and value < 1 other Other options 1 seed Set random number seed (seed) Integer 42 (defined $value and $value != $vdef) ? 
" --seed $value" : "" ( "" , " --seed " + str(value))[ value is not None and value != vdef ] Seed the random number generator with the value, an integer >= 0. If the value is nonzero, any stochastic simulations will be reproducible; the same command will give the same results. If the number is 0, the random number generator is seeded arbitrarily, and stochastic simulations will vary from run to run of the same command. The default seed is 42. laplace Use a Laplace +1 prior Boolean 0 ($value) ? " --laplace" : "" ( "" , " --laplace " )[ value ] hmmfile_res Hmm profile HmmProfile AbstractText HMMER3 *.hmm "*.hmm" re_save_file Alignment file Alignment STOCKHOLM $re_save re_save Programs-5.1.1/bigorf_extract.xml0000644000175000001560000000416011752456727015717 0ustar bneronsis bigorf_extract 1.0 bigorf_extract extract sequences with the largest ORF from a sequence translated with EMBOSS transseq and checktrans E. Deveaud sequence:protein:composition bigorf_extract.py protein_sequences Protein sequences to filter Protein Sequence FASTA This is where you should enter the list of candidate ORFs for each gene 2 " " + value characters_to_strip Integer Integer Number of characters that should be stripped. Default value is 4 because transseq adds 2 characters (_[frame index]) and checktrans another 2 (_[candidate index]) 1 " -s %d" % value 4 protein_sequence_out Sequence Protein Sequence FASTA "bigorf_extract.out" "bigorf_extract.out" Programs-5.1.1/kitsch.xml0000644000175000001560000005521511724156742014202 0ustar bneronsis kitsch kitsch Fitch-Margoliash and Least Squares Methods with Evolutionary Clock http://bioweb2.pasteur.fr/docs/phylip/doc/kitsch.html This program carries out the Fitch-Margoliash and Least Squares methods, plus a variety of others of the same family, with the assumption that all tip species are contemporaneous, and that there is an evolutionary clock (in effect, a molecular clock). 
This means that branches of the tree cannot be of arbitrary length, but are constrained so that the total length from the root of the tree to any species is the same. phylogeny:distance kitsch String "kitsch <kitsch.params" "kitsch <kitsch.params" 0 infile Distance matrix File PhylipDistanceMatrix AbstractText $infile ne "infile" infile != "infile" "ln -s $infile infile && " "ln -s " + str(infile) + " infile && " -5 Give a file containing a distance matrix obtained by distance matrix programs like protdist or dnadist 5 Alpha 0.000000 0.330447 0.625670 1.032032 1.354086 Beta 0.330447 0.000000 0.375578 1.096290 0.677616 Gamma 0.625670 0.375578 0.000000 0.975798 0.861634 Delta 1.032032 1.096290 0.975798 0.000000 0.226703 Epsilon 1.354086 0.677616 0.861634 0.226703 0.000000 fitch_options Fitch options Method Program method (D) Choice F F "" "" M "D\\n" "D\n" 1 kitsch.params negative_branch Negative branch lengths allowed (-) Boolean 0 ($value) ? "-\\n" : "" ( "" , "-\n")[ value ] 1 kitsch.params power Power (P) Float 2.0 (defined $value and $value != $vdef) ? "P\\n$value\\n" : "" ( "" , "P\n" +str( value ) +"\n")[ value is not None and vdef != value ] 1 For the Fitch-Margoliash method, which is the default method with this program, P is 2.0. For the Cavalli-Sforza and Edwards least squares method it should be set to 0 (so that the denominator is always 1). An intermediate method is also available in which P is 1.0, and any other value of P, such as 4.0 or -2.3, can also be used. This generates a whole family of methods. Please read the documentation (man distance). kitsch.params jumble_options Randomize options jumble Randomize (jumble) input order (J) Boolean not $user_tree not user_tree 0 ($value) ? 
"J\\n$jumble_seed\\n$jumble_number\\n" : "" ( "" , "J\n" + str( jumble_seed ) + "\n"+str( jumble_number )+ "\n")[ value ] 20 kitsch.params jumble_seed Random number seed (must be odd) Integer $jumble jumble "" "" Random number seed must be odd $value >= 0 and ($value % 2) != 0 value >= 0 and (value % 2) != 0 19 jumble_number Number of times to jumble Integer $jumble jumble 1 "" "" 19 bootstrap Bootstrap options multiple Analyze multiple data sets (M) Boolean 0 ($value) ? "M\\n$numtiple_number\\n$multiple_seed\\n" : "" ( "" , "M\n"+str(multiple_number)+"\n"+str(multiple_seed)+"\n")[ value ] 10 kitsch.params multiple_number How many data sets Integer $multiple multiple "" "" There must be no more than 1000 datasets for this server $value <= 1000 value <= 1000 9 multiple_seed Random number seed (must be odd) Integer $multiple multiple "" "" Random number seed must be odd $value >= 0 and ($value % 2) != 0 value >= 0 and (value % 2) != 0 19 consense Compute a consensus tree Boolean $multiple and $print_treefile multiple and print_treefile 0 ($value) ? " && cp infile kitsch.infile && cp kitsch.outtree intree && consense <consense.params && mv outtree consense.outtree && mv outfile consense.outfile" : "" ("" , " && cp infile kitsch.infile && cp kitsch.outtree intree && consense <consense.params && mv outtree consense.outtree && mv outfile consense.outfile" )[ value ] 10 consense_confirm String $consense consense "Y\\n" "Y\n" 1000 consense.params consense_terminal_type String $consense consense "T\\n" "T\n" -2 consense.params consense_outfile Consense output file Text $consense consense "consense.outfile" "consense.outfile" consense_treefile Consense tree file Tree NEWICK $consense consense "consense.outtree" "consense.outtree" user_tree_opt User tree options user_tree Use User tree (default: No, search for best tree) (U) Boolean 0 ($value) ? 
"U\\n" : "" ( "" , "U\n" )[ value ] You cannot randomize (jumble) your dataset and give a user tree at the same time not ( $user_tree and $jumble ) not ( user_tree and jumble ) 1 The U (User Tree) option requires a bifurcating tree, unlike FITCH, which requires an unrooted tree with a trifurcation at its base. If a tree with a trifurcation at the base is by mistake fed into the U option of KITSCH then some of its species (the entire rightmost furc, in fact) will be ignored and too small a tree read in. This should result in an error message and the program should stop. It is important to understand the difference between the User Tree formats for KITSCH and FITCH. You may want to use RETREE to convert a user tree that is suitable for FITCH into one suitable for KITSCH or vice versa. kitsch.params tree_file User Tree file Tree NEWICK $user_tree user_tree (defined $value) ? "cat $tree_file >> intree && " : "" ( "" , "cat "+ str( value ) +" >> intree && " ) [ value is not None ] -1 Note that the User Trees (used by option U) must be rooted trees (with a bifurcation at their base). If you take a user tree from FITCH and try to evaluate it in KITSCH, it must first be rooted. This can be done using RETREE output Output options print_tree Print out tree (3) Boolean 1 ($value) ? "" : "3\\n" ( "3\n" ,"" )[ value ] 1 Tells the program to print a semi-graphical picture of the tree in the outfile. kitsch.params print_treefile Write out trees onto tree file (4) Boolean 1 ($value) ? "" : "4\\n" ( "4\n" , "" )[ value ] 1 Tells the program to save the tree in a treefile (a standard representation of trees where the tree is specified by a nested pairs of parentheses, enclosing names and separated by commas). kitsch.params printdata Print out the data at start of run (1) Boolean 0 ($value)? 
"1\\n" : "" ("", "1\n" )[ value ] 1 kitsch.params other_options Other options triangular Matrix format Choice square square "" "" lower "L\\n" "L\n" upper "R\\n" "R\n" 1 kitsch.params subreplicates Subreplicates (S) Boolean 0 ($value) ? "S\\n" : "" ( "" , "S\n" )[ value ] 1 If the S (subreplication) option is in effect, the above degrees of freedom must be modified by noting that N is not n(n-1)/2 but is the sum of the numbers of replicates of all cells in the distance matrix read in, which may be either square or triangular. A further explanation of the statistical test of the clock is given in a paper of mine (Felsenstein, 1986). kitsch.params outfile Kitsch output file Text " && mv outfile kitsch.outfile" " && mv outfile kitsch.outfile" "kitsch.outfile" "kitsch.outfile" treefile Kitch tree file Tree NEWICK $print_treefile print_treefile " && mv outtree kitsch.outtree" " && mv outtree kitsch.outtree" "kitsch.outtree" "kitsch.outtree" confirm String "Y\\n" "Y\n" 1000 kitsch.params terminal_type String "0\\n" "0\n" -1 kitsch.params Programs-5.1.1/octanol.xml0000644000175000001560000002372012072525233014340 0ustar bneronsis octanol EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net octanol Draw a White-Wimley protein hydropathy plot http://bioweb2.pasteur.fr/docs/EMBOSS/octanol.html http://emboss.sourceforge.net/docs/themes sequence:protein:composition octanol e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_datafile White-wimley data file WhiteWimleyDatafile AbstractText ("", " -datafile=" + str(value))[value is not None ] 2 e_additional Additional section e_width Window size (value from 1 to 200) Integer 19 ("", " -width=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 Value less than or equal to 200 is required value <= 200 3 e_output Output section e_graph Choose the e_graph output format Choice png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 4 xy_goutfile Name of the output graph Filename octanol_xygraph ("" , " -goutfile=" + str(value))[value is not None] 5 xy_outgraph_png Graph file Picture Binary e_graph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_graph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_graph == "ps" or e_graph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_graph == "meta" "*.meta" xy_outgraph_data Graph file Text e_graph == "data" "*.dat" e_plotoctanol Display the octanol plot Boolean 0 ("", " -plotoctanol")[ bool(value) ] 6 e_plotinterface Display the interface plot Boolean 0 ("", " -plotinterface")[ bool(value) ] 7 e_plotdifference Display the difference plot Boolean 1 (" -noplotdifference", "")[ bool(value) ] 8 auto Turn off any prompting String " -auto -stdout" 9 Programs-5.1.1/supermatcher.xml0000644000175000001560000005430712072525233015410 0ustar bneronsis supermatcher EMBOSS 6.3.1 EMBOSS European Molecular 
Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net supermatcher Calculate approximate local pair-wise alignments of larger sequences http://bioweb2.pasteur.fr/docs/EMBOSS/supermatcher.html http://emboss.sourceforge.net/docs/themes alignment:pairwise:local supermatcher e_input Input section e_asequence asequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -asequence=" + str(value))[value is not None] 1 e_bsequence bsequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 2,n ("", " -bsequence=" + str(value))[value is not None] 2 e_datafile Matrix file Choice mobyle_null mobyle_null EBLOSUM30 EBLOSUM35 EBLOSUM40 EBLOSUM45 EBLOSUM50 EBLOSUM55 EBLOSUM60 EBLOSUM62 EBLOSUM62-12 EBLOSUM65 EBLOSUM70 EBLOSUM75 EBLOSUM80 EBLOSUM85 EBLOSUM90 EBLOSUMN EDNAFULL EDNAMAT EDNASIMPLE EPAM10 EPAM100 EPAM110 EPAM120 EPAM130 EPAM140 EPAM150 EPAM160 EPAM170 EPAM180 EPAM190 EPAM20 EPAM200 EPAM210 EPAM220 EPAM230 EPAM240 EPAM250 EPAM260 EPAM270 EPAM280 EPAM290 EPAM30 EPAM300 EPAM310 EPAM320 EPAM330 EPAM340 EPAM350 EPAM360 EPAM370 EPAM380 EPAM390 EPAM40 EPAM400 EPAM410 EPAM420 EPAM430 EPAM440 EPAM450 EPAM460 EPAM470 EPAM480 EPAM490 EPAM50 EPAM500 EPAM60 EPAM70 EPAM80 EPAM90 SSSUB ("", " -datafile=" + str(value))[value is not None and value!=vdef] 3 This is the scoring matrix file used when comparing sequences. By default it is the file 'EBLOSUM62' (for proteins) or the file 'EDNAFULL' (for nucleic sequences). These files are found in the 'data' directory of the EMBOSS installation. 
e_minscore Minimum alignment score (value greater than or equal to 0) Float 0 ("", " -minscore=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 4 Minimum alignment score to report an alignment. e_required Required section e_gapopen Gap opening penalty (value from 0.0 to 100.0) Float ("", " -gapopen=" + str(value))[value is not None] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 100.0 is required value <= 100.0 5 10.0 for any sequence type e_gapextend Gap extension penalty (value from 0.0 to 10.0) Float ("", " -gapextend=" + str(value))[value is not None] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 10.0 is required value <= 10.0 6 0.5 for any sequence type e_additional Additional section e_width Alignment width (value greater than or equal to 1) Integer 16 ("", " -width=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 7 e_wordlen Word length for initial matching (value greater than or equal to 3) Integer 6 ("", " -wordlen=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 3 is required value >= 3 8 e_output Output section e_outfile Name of the output alignment file Filename supermatcher.align ("" , " -outfile=" + str(value))[value is not None] 9 e_aformat_outfile Choose the alignment output format Choice SIMPLE FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH ("", " -aformat=" + str(value))[value is not None and value!=vdef] 10 e_outfile_out outfile_out option Alignment e_aformat_outfile in ['FASTA', 'MSF'] e_outfile e_outfile_out2 outfile_out2 option Text e_aformat_outfile in ['PAIR', 'MARKX0', 'MARKX1', 'MARKX2', 'MARKX3', 'MARKX10', 'SRS', 'SRSPAIR', 'SCORE', 'UNKNOWN', 'MULTIPLE', 'SIMPLE', 'MATCH'] e_outfile e_errorfile errorfile option Filename supermatcher.e_errorfile ("" , " 
-errorfile=" + str(value))[value is not None] 11 Error file to be written to e_errorfile_out errorfile_out option SupermatcherError AbstractText e_errorfile auto Turn off any prompting String " -auto -stdout" 12 Programs-5.1.1/blast2genoclass.xml0000644000175000001560000005121011767572177016005 0ustar bneronsis blast2genoclass 1.0 blast2genoclass One-line description of Blast program filtering C. Maufrais database:search:filter blast2genoclass infile Blast output file BlastTextReport Report " -i $value" " -i " + str(value) 20 blastfilter Filter the one-line description of Blast program with: Choice Null Null M F ($value ne vdef) ? " -$value" : "" ("", " -" + str(value))[value != vdef] nbofhit Number of hsp to consider (-x) Integer 10 (defined $value) ? " -x $value" : "" ("", " -x " + str(value) )[value is not None and value != vdef] 0: all hsp genomic_name Filter the one-line description of Blast program with user name (-p) String (defined $value) ? " -p $value" : "" ("", " -p " + str(value).replace(' ','_') )[value is not None] Choose only one of the one-line description filter option: User name or not. (defined $blastfilter and (not defined $genomic_name)) or (defined $genomic_name and (not defined $blastfilter)) (blastfilter is not None and (genomic_name is None)) or (genomic_name is not None and (blastfilter is None)) taxonomic_name Filter the hit of Blast with Taxonomic hierarchy name (-n) String (defined $value) ? " -n $value" : "" ("", " -n " + str(value).replace(' ','_') )[value is not None] output Output option verboseall Report detailed results matching "Description filter option" for all blast (-v) Boolean 0 ($value)? 
" -v" : "" ("" , " -v") [value] In "res4individualBlast.txt" file, for all input blast, are details: For all matching "Description filter option": - query name, (query letter): percentage of matching description and for all corresponding hits: - Database sequence's species, accession number and letters - Hsp description verbose Detailed report for database sequence(s) matching "Description filter option" (-V) Boolean 0 ($value)? " -V" : "" ("" , " -V") [value] In "res4allBlast.txt" file, for the best database sequence(s), are details: For database sequence matching "Description filter option": - Database sequence's species, accession number, letters and description - Number of query matching this sequence. - Query name, (letters) and for all corresponding hsp: - Hsp description option Hsp(s) selection (-m) Choice 1 0 1 2 3 ($value ne vdef) ? " -m $value" : "" ("", " -m " + str(value))[value != vdef] align Produce alignment: database sequence matching "Description filter option" vs queries (-a) Choice 0 0 1 2 3 4 5 6 ($value ne vdef)? " -a $value" : "" ("" , " -a " +str(value)) [value != vdef] For 1,2,3 hsps alignments (Sbjt and Query) are re-aligned on the reference sequence extract from database. For 4,5,6 part of queries corresponding to hsps are re-aligned on the reference sequence extract from database. picture Produce graphical alignment summary images: database sequence matching "Description filter option" vs queries (-g) Boolean 0 ($value)? " -g" : "" ("" , " -g") [value] blastout Blast output file(s) sort/split by specific taxonomic hierarchy (-b) Boolean 0 ($value)? " -b" : "" ("" , " -b") [value] hspSeq Extract Hsp(s) fragment from Query sequence(s) (-Q) Boolean 0 ($value)? " -Q" : "" ("" , " -Q") [value] queryout Query name write in file(s) sort/split by specific taxonomic hierarchy (-q) Boolean 0 ($value)? " -q" : "" ("" , " -q") [value] besthitseq Report database sequence(s) matching option in fasta file (-s) Boolean 0 ($value)? 
" -s" : "" ("" , " -s") [value] fastaExtract Extraction of fasta sequences. Boolean 0 Query name write in file must be checked and query sequences must be done. $fastaExtract == 1 and $queryout == 1 and defined $query_seq (fastaExtract and (queryout and query_seq is not None)) or (not fastaExtract) Extract fasta sequence, matching specified taxonomic filter, from file containing query sequences witch are used to made blast. query_seq Query sequences witch are used to made blast. Sequence FASTA 1,n query_seq_run1 Query sequences witch are used to made blast. Sequence FASTA 1,n defined $hspSeq and defined $query_seq hspSeq and query_seq (defined $value)? " -f $query_seq": "" (""," -f "+ str(query_seq)) [query_seq is not None] query_seq_run2 Query sequences witch are used to made blast. Sequence FASTA 1,n defined $fastaExtract and defined $queryout and defined $query_seq fastaExtract and queryout and query_seq (defined $value)? " && extractfasta -i $query_seq *.qry": "" (""," && extractfasta -i "+ str(query_seq) + " *.qry") [query_seq is not None] 100 outfile Output file Blast2taxoclassReport Report "blast2genoclass.out" "blast2genoclass.out" pictureout Graphical output Picture Binary defined $picture picture "*.png" "*.png" alignout Alignment GenoClasAln Report defined $align align "*.aln" "*.aln" verboseoutall Verbose output file for all blast VerboseReport Report defined $verboseall verboseall "res4individualBlast.txt" "res4individualBlast.txt" verboseout Verbose output file for database sequence(s) VerboseReport Report defined $verbose verbose "res4allBlast.txt" "res4allBlast.txt" blastoutfile Blast output file(s) BlastTextReport Report defined $blastout blastout "*.blast" "*.blast" queryoutfile Query name file QueryNameReport Report defined $queryout queryout "*.qry" "*.qry" besthitseqfile Database sequence(s) fasta file Sequence FASTA defined $besthitseq besthitseq "*.dbfasta" "*.dbfasta" fastafile Fasta file Sequence FASTA defined $fastaExtract or defined 
$hspSeq fastaExtract or hspSeq "*.fasta" "*.fasta" Programs-5.1.1/fetchSequences.xml0000644000175000001560000000542111767572177015667 0ustar bneronsis fetchSequences 1.0 fetch sequences Retrieve sequences in databases from list of identifier (USA list) EMBOSS http://emboss.sourceforge.net/ http://bioweb2.pasteur.fr/docs/EMBOSS/seqret.html http://emboss.sourceforge.net/docs/themes database:search:sequence seqret input Input section USAList list of sequences identifier in USA format GenesId AbstractText USAList 1 " @" + str(value) 2 list of identifiers in USA format: databank:Acc ( one item per line ) sp:Q74K65 sp:Q2W4W1 sp:P63394 sp:P63393 sp:P18767 sp:Q042G7 output Output section sequence_out the sequences Sequence (1,n) "fetchSequences.out" auto Turn off any prompting String " -auto -stdout" 10 Programs-5.1.1/cpgplot.xml0000644000175000001560000004276212072525233014360 0ustar bneronsis cpgplot EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net cpgplot Identify and plot CpG islands in nucleotide sequence(s) http://bioweb2.pasteur.fr/docs/EMBOSS/cpgplot.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:cpg_islands cpgplot e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_window Window size (value greater than or equal to 1) Integer 100 ("", " -window=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 2 The percentage CG content and the Observed frequency of CG is calculated within a window whose size is set by this parameter. 
The window is moved down the sequence and these statistics are calculated at each position that the window is moved to. e_minlen Minimum length of an island (value greater than or equal to 1) Integer 200 ("", " -minlen=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 3 This sets the minimum length that a CpG island has to be before it is reported. e_minoe Minimum observed/expected (value from 0. to 10.) Float 0.6 ("", " -minoe=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0. is required value >= 0. Value less than or equal to 10. is required value <= 10. 4 This sets the minimum average observed to expected ratio of C plus G to CpG in a set of 10 windows that is required before a CpG island is reported. e_minpc Minimum percentage (value from 0. to 100.) Float 50. ("", " -minpc=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0. is required value >= 0. Value less than or equal to 100. is required value <= 100. 5 This sets the minimum average percentage of G plus C in a set of 10 windows that is required before a CpG island is reported. e_output Output section e_outfile Name of the output file (e_outfile) Filename cpgplot.e_outfile ("" , " -outfile=" + str(value))[value is not None] 6 This sets the name of the file holding the report of the input sequence name, CpG island parameters and the output details of any CpG islands that are found. 
e_outfile_out outfile_out option CpgplotReport Report e_outfile e_plot Plot cpg island score Boolean 1 (" -noplot", "")[ bool(value) ] 7 e_graph Choose the e_graph output format Choice e_plot png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 8 xy_goutfile Name of the output graph Filename e_plot cpgplot_xygraph ("" , " -goutfile=" + str(value))[value is not None] 9 xy_outgraph_png Graph file Picture Binary e_plot and e_graph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_plot and e_graph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_plot and e_graph == "ps" or e_graph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_plot and e_graph == "meta" "*.meta" xy_outgraph_data Graph file Text e_plot and e_graph == "data" "*.dat" e_obsexp Show observed/expected threshold line Boolean 1 (" -noobsexp", "")[ bool(value) ] 10 If this is set to true then the graph of the observed to expected ratio of C plus G to CpG within a window is displayed. e_cg Show cpg rich regions Boolean 1 (" -nocg", "")[ bool(value) ] 11 If this is set to true then the graph of the regions which have been determined to be CpG islands is displayed. e_pc Show percentage line Boolean 1 (" -nopc", "")[ bool(value) ] 12 If this is set to true then the graph of the percentage C plus G within a window is displayed. 
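The cpgplot help text above describes two per-window statistics: the percentage of C plus G and the observed/expected CpG ratio. The following is a minimal Python sketch of those statistics, not cpgplot's own implementation. It assumes the expected CpG count in a window is (number of C × number of G) / window length, and it does not model the averaging over 10 windows mentioned for -minoe and -minpc. The function name `cpg_window_stats` is hypothetical.

```python
def cpg_window_stats(seq, window=100):
    """Return a list of (start, percent_gc, obs_exp) tuples, one per window start.

    Assumption: expected CpG count = (#C * #G) / window length.
    """
    seq = seq.upper()
    stats = []
    for start in range(0, len(seq) - window + 1):
        win = seq[start:start + window]
        c = win.count("C")
        g = win.count("G")
        cpg = win.count("CG")            # observed CpG dinucleotides
        percent_gc = 100.0 * (c + g) / window
        expected = c * g / window        # assumed expectation formula
        obs_exp = cpg / expected if expected > 0 else 0.0
        stats.append((start, percent_gc, obs_exp))
    return stats
```

With the defaults listed above, a region would be a candidate island when these values stay above -minpc (50.) and -minoe (0.6) over at least -minlen (200) bases.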
e_outfeat Name of the output feature file (e_outfeat) DNA Filename cpgplot.e_outfeat ("" , " -outfeat=" + str(value))[value is not None] 13 File for output features e_offormat_outfeat Choose the feature output format DNA Choice GFF GFF EMBL SWISSPROT NBRF CODATA ("", " -offormat=" + str(value))[value is not None and value!=vdef] 14 e_outfeat_out outfeat_out option DNA Feature AbstractText e_outfeat auto Turn off any prompting String " -auto -stdout" 15 Programs-5.1.1/einverted.xml0000644000175000001560000002303112072525233014661 0ustar bneronsis einverted EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net einverted Finds inverted repeats in nucleotide sequences http://bioweb2.pasteur.fr/docs/EMBOSS/einverted.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:repeats sequence:nucleic:2D_structure structure:2D_structure einverted e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_gap Gap penalty Integer 12 ("", " -gap=" + str(value))[value is not None and value!=vdef] 2 e_threshold Minimum score threshold Integer 50 ("", " -threshold=" + str(value))[value is not None and value!=vdef] 3 e_match Match score Integer 3 ("", " -match=" + str(value))[value is not None and value!=vdef] 4 e_mismatch Mismatch score Integer -4 ("", " -mismatch=" + str(value))[value is not None and value!=vdef] 5 e_additional Additional section e_maxrepeat Maximum extent of repeats Integer 2000 ("", " -maxrepeat=" + str(value))[value is not None and value!=vdef] 6 Maximum separation between the start of repeat and the end of the inverted repeat (the default is 2000 bases). 
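The Python expressions in these descriptors, such as `("", " -maxrepeat=" + str(value))[value is not None and value!=vdef]`, build command-line fragments by indexing a two-element tuple with a boolean: `False` selects element 0 (the empty string) and `True` selects element 1 (the option text). The sketch below restates that idiom as a function. The name `build_option` is hypothetical and is not part of the descriptor framework.

```python
def build_option(flag, value, vdef=None):
    """Return ' -flag=value' when value is set and differs from the default, else ''."""
    # A bool is an int in Python, so False indexes element 0 and True element 1.
    # Both tuple elements are evaluated before indexing; str(None) is harmless here.
    return ("", " -" + flag + "=" + str(value))[value is not None and value != vdef]


# Default value: nothing is added to the command line.
print(repr(build_option("maxrepeat", 2000, vdef=2000)))
# Non-default value: the option is emitted.
print(repr(build_option("maxrepeat", 500, vdef=2000)))
```

A conditional expression (`x if cond else ""`) is the more common modern spelling. The tuple form evaluates both branches eagerly, which is safe only because neither branch has side effects.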
e_output Output section e_outfile Name of the output file (e_outfile) Filename outfile.inv ("" , " -outfile=" + str(value))[value is not None] 7 e_outfile_out outfile_out option InvertedReport Report e_outfile e_outseq Name of the output sequence file (e_outseq) Filename einverted.e_outseq ("" , " -outseq=" + str(value))[value is not None] 8 The sequence of the inverted repeat regions without gap characters. e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 9 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 10 Programs-5.1.1/seqretsetall.xml0000644000175000001560000001347312072525233015415 0ustar bneronsis seqretsetall EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net seqretsetall Reads and writes (returns) many sets of sequences http://bioweb2.pasteur.fr/docs/EMBOSS/seqretsetall.html http://emboss.sourceforge.net/docs/themes sequence:edit seqretsetall e_input Input section e_feature Use feature information Boolean 0 ("", " -feature")[ bool(value) ] 1 e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 2,n ("", " -sequence=" + str(value))[value is not None] 2 e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename seqretsetall.e_outseq ("" , " -outseq=" + str(value))[value is not None] 3 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 4 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 5 Programs-5.1.1/pepwindowall.xml0000644000175000001560000002272411672346320015414 0ustar bneronsis pepwindowall EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net pepwindowall Draw Kyte-Doolittle hydropathy plot for a protein alignment http://bioweb2.pasteur.fr/docs/EMBOSS/pepwindowall.html http://emboss.sourceforge.net/docs/themes sequence:protein:composition pepwindowall e_input Input section e_sequences sequences option Protein Alignment FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH 1,n ("", " -sequences=" + str(value))[value is not None] 1 File containing a sequence alignment e_datafile Aaindex entry data file Protein AaindexData AbstractText ("", " -datafile=" + str(value))[value is not None ] 2 e_additional Additional section e_length Window size (value from 1 to 200) Integer 19 ("", " -length=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 Value less than or equal to 200 is required value <= 200 3 e_normalize Normalize data values Boolean 0 ("", " -normalize")[ bool(value) ] 4 e_output Output section e_graph Choose the e_graph output format Choice png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 5 xy_goutfile Name of the output graph Filename pepwindowall_xygraph ("" , " -goutfile=" + str(value))[value is not None] 6 xy_outgraph_png Graph file Picture Binary e_graph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_graph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_graph == "ps" or e_graph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_graph == "meta" "*.meta" xy_outgraph_data Graph file Text e_graph == "data" "*.dat" auto Turn off any prompting String " -auto -stdout" 7 Programs-5.1.1/rnafold.xml0000644000175000001560000004747011672710655014347 0ustar bneronsis rnafold RNAfold Calculate secondary structures of RNAs Hofacker, Fontana, Bonhoeffer, Stadler I.L. Hofacker, W. Fontana, P.F. Stadler, S. 
Bonhoeffer, M. Tacker, P. Schuster (1994) Fast Folding and Comparison of RNA Secondary Structures. Monatshefte f. Chemie 125: 167-188 A. Walter, D Turner, J Kim, M Lyttle, P Muller, D Mathews, M Zuker Coaxial stacking of helices enhances binding of oligoribonucleotides. PNAS, 91, pp 9218-9222, 1994 M. Zuker, P. Stiegler (1981) Optimal computer folding of large RNA sequences using thermodynamic and auxiliary information, Nucl Acid Res 9: 133-148 J.S. McCaskill (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structures, Biopolymers 29: 1105-1119 D.H. Turner, N. Sugimoto and S.M. Freier (1988) RNA structure prediction, Ann Rev Biophys Biophys Chem 17: 167-192 D. Adams (1979) The hitchhiker's guide to the galaxy, Pan Books, London http://www.tbi.univie.ac.at/RNA/RNAfold.html RNAfold reads RNA sequences, calculates their minimum free energy (mfe) structure and prints the mfe structure in bracket notation and its free energy. It also produces PostScript files with plots of the resulting secondary structure graph and a "dot plot" of the base pairing matrix. sequence:nucleic:2D_structure structure:2D_structure RNAfold seqin RNA/DNA Sequence File RNA DNA SequenceWithStructureConstraint AbstractText FASTA " <$value" " <"+str(value) The string for the structure constraint must have the same length as the sequence. Note: No constraint should be applied during structure predictions. Structure constraint: | : paired with another base < : base i is paired with a base j>i x : base must not pair > : base i is paired with a base j<i . : no constraint at all matching brackets ( ): base i pairs base j 1000 control Control options 2 partition Calculate the partition function and base pairing probability matrix (-p) Boolean not $pf not pf 0 ($value)? " -p" : "" ( "" , " -p" )[ value ] Calculate the partition function and base pairing probability matrix in addition to the mfe structure. Default is calculation of mfe structure only.
Prints a coarse representation of the pair probabilities in the form of a pseudo bracket notation, the ensemble free energy, the frequency of the mfe structure, and the structural diversity. Note that unless you also specify -d2 or -d0, the partition function and mfe calculations will use a slightly different energy model. See the discussion of dangling end options below. pf Calculate the partition function but not the pair probabilities (-p0) Boolean not $partition not partition 0 (defined $value)? " -p0" : "" ( "" , " -p0" )[ value ] Calculate the partition function but not the pair probabilities, saving about 50% in runtime. Prints the ensemble free energy -kT ln(Z). p2 In addition to pair probabilities compute stack probabilities (-p2) Boolean 0 (defined $value)? " -p2" : "" ( "" , " -p2 " )[ value ] In addition to pair probabilities compute stack probabilities, i.e. the probability that a pair (i,j) and the immediately interior pair (i+1,j-1) are formed simultaneously. A second postscript dot plot called "name_dp2.ps", or "dot2.ps" (if the sequence does not have a name), is produced that contains pair probabilities in the upper right half and stack probabilities in the lower left. mea Calculate an MEA (maximum expected accuracy) structure Float defined $partition partition 1 (defined $value and $value != $vdef)? " -MEA $value" : "" ( "" , " -MEA " + str(value) )[ value is not None and value != vdef] Calculate an MEA (maximum expected accuracy) structure, where the expected accuracy is computed from the pair probabilities: each base pair (i,j) gets a score 2*gamma*p_ij and the score of an unpaired base is given by the probability of not forming a pair. The parameter gamma tunes the importance of correctly predicted pairs versus unpaired bases. Thus, for small values of gamma the MEA structure will contain only pairs with very high probability. The default value is gamma=1. Using -MEA implies -p for computing the pair probabilities.
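The MEA scoring rule described above can be illustrated with a minimal Python sketch. It only evaluates the score of one given structure; it does not search for the best structure, and it is not RNAfold's implementation. The function name `mea_score`, the 0-based indexing and the dictionary layout for the pair probabilities are assumptions made for this example.

```python
def mea_score(pairs, pair_probs, length, gamma=1.0):
    """Expected-accuracy score of one structure (illustrative only).

    pairs      -- set of (i, j) tuples with i < j (0-based): the base pairs
    pair_probs -- dict mapping (i, j) with i < j to the pair probability p_ij
    length     -- sequence length
    gamma      -- weight of paired versus unpaired positions
    """
    # Probability that base k forms no pair: 1 minus the sum of all p_ij
    # in which k takes part.
    unpaired_prob = [1.0] * length
    for (i, j), p in pair_probs.items():
        unpaired_prob[i] -= p
        unpaired_prob[j] -= p

    paired_positions = {k for pair in pairs for k in pair}
    # Each base pair (i, j) contributes 2 * gamma * p_ij ...
    score = sum(2.0 * gamma * pair_probs.get(pair, 0.0) for pair in pairs)
    # ... and each unpaired base contributes its probability of being unpaired.
    score += sum(unpaired_prob[k] for k in range(length)
                 if k not in paired_positions)
    return score


# Hypothetical pair probabilities for a 4-base sequence.
probs = {(0, 3): 0.9, (1, 2): 0.2}
with_pair = mea_score({(0, 3)}, probs, 4, gamma=1.0)
no_pairs = mea_score(set(), probs, 4, gamma=1.0)
```

With these made-up numbers, the structure containing the pair (0,3) scores higher than the open chain at gamma=1, while at gamma=0.1 the open chain scores slightly higher, which matches the statement that small gamma values keep only very probable pairs.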
temperature Rescale energy parameters to a temperature of temp C. (-T) Integer 37 (defined $value and $value != $vdef)? " -T $value" : "" ( "" , " -T " + str(value) )[ value is not None and value != vdef] tetraloops Do not include special stabilizing energies for certain tetraloops (-4) Boolean 0 ($value)? " -4" : "" ( "" , " -4" )[ value ] dangling How to treat dangling end energies for bases adjacent to helices in free ends and multiloops (-d) Choice -d1 -d1 -d -d2 (defined $value and $value ne $vdef)? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef] How to treat 'dangling end' energies for bases adjacent to helices in free ends and multiloops: Normally only unpaired bases can participate in at most one dangling end. With -d2 this check is ignored; this is the default for partition function folding (-p). -d ignores dangling ends altogether. Note that by default pf and mfe folding treat dangling ends differently; use -d2 (or -d) in addition to -p to ensure that both algorithms use the same energy model. scale Use scale*mfe as an estimate for the free energy (-S) Integer (defined $value)? " -S $value" : "" ( "" , " -S " + str(value) )[ value is not None ] In the calculation of the pf use scale*mfe as an estimate for the ensemble free energy (used to avoid overflows). The default is 1.07; useful values are 1.0 to 1.2. Occasionally needed for long sequences. You can also recompile the program to use double precision. input Input parameters 2 constraints Calculate structures subject to constraints (-C) Boolean 0 ($value)?
" -C" : "" ( "" , " -C" )[ value ] The program first reads the sequence, then a string containing constraints on the structure encoded with the symbols: | (the corresponding base has to be paired) x (the base is unpaired) < (base i is paired with a base j>i) > (base i is paired with a base j<i) matching brackets ( ) (base i pairs base j) Pf folding ignores constraints of type '|' '<' and '>', but disallows all pairs conflicting with a constraint of type 'x' or '( )'. This is usually sufficient to enforce the constraint. noLP Avoid lonely pairs (helices of length 1) (-noLP) Boolean 0 ($value)? " -noLP" : "" ( "" , " -noLP" )[ value ] Produce structures without lonely pairs (helices of length 1). For partition function folding this only disallows pairs that can only occur isolated. Other pairs may still occasionally occur as helices of length 1. noGU Do not allow GU pairs (-noGU) Boolean 0 ($value)? " -noGU" : "" ( "" , " -noGU" )[ value ] noCloseGU Do not allow GU pairs at the end of helices (-noCloseGU) Boolean 0 ($value)? " -noCloseGU" : "" ( "" , " -noCloseGU" )[ value ] nsp Non standard pairs (comma separated list) (-nsp) String (defined $value)? " -nsp $value" : "" ( "" , " -nsp " + str(value) )[ value is not None ] Allow other pairs in addition to the usual AU, GC and GU pairs. Pairs is a comma separated list of additionally allowed pairs. If the first character is a '-' then AB will imply that AB and BA are allowed pairs. e.g. RNAfold -nsp -GA will allow GA and AG pairs. Nonstandard pairs are given 0 stacking energy. parameter Energy parameter file (-P) EnergyParameter AbstractText (defined $value)? " -P $value" : "" ( "" , " -P " + str(value) )[ value is not None ] Read energy parameters from paramfile, instead of using the default parameter set.
output_options Output options outfile_name Result file SequenceWithStructureConstraint AbstractText "$seqin.tmp" str(seqin)+'.tmp' outfile Result file RnafoldOutput AbstractText "rnafold.out" "rnafold.out" readseq String not $constraints/ not constraints "mv $seqin $seqin.ori && readseq -f=19 -a $seqin.ori > $seqin && " "mv %s %s.ori && readseq -f=19 -a %s.ori > %s && " %( seqin , seqin , seqin , seqin ) -10 psfiles Postscript file PostScript Binary "*.ps" "*.ps" Programs-5.1.1/rnaup.xml0000644000175000001560000005026411643535773014046 0ustar bneronsis rnaup 1.8.4 RNAup Calculate the thermodynamics of RNA-RNA interactions Ivo L Hofacker, Peter F Stadler, Stephan Bernhart I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker, P. Schuster (1994) Fast Folding and Comparison of RNA Secondary Structures. Monatshefte f. Chemie 125: 167-188 S.H.Bernhart, Ch. Flamm, P.F. Stadler, I.L. Hofacker Partition Function and Base Pairing Probabilities of RNA Heterodimers Algorithms Mol. Biol. (2006) D.H. Mathews, J. Sabina, M. Zuker and H. Turner "Expanded Sequence Dependence of Thermodynamic Parameters Provides Robust Prediction of RNA Secondary Structure" JMB, 288, pp 911-940, 1999 http://bioweb2.pasteur.fr/gensoft/sequence/nucleic/2D_structure.html#ViennaRNa RNAup calculates the thermodynamics of RNA-RNA interactions by decomposing the binding into two stages. (1) First the probability that a potential binding site remains unpaired (equivalent to the free energy needed to open the site) is computed. (2) Then this accessibility is combined with the interaction energy to obtain the total binding energy. All calculations are done by computing partition functions over all possible conformations. sequence:nucleic:2D_structure RNAup seq RNA Sequences File RNA RNASequence AbstractText " < $value" " < " + str(value) 1000 Each line of file corresponds to one sequence, except for lines starting with ">" which contain the name of the next sequence.
To compute the hybrid structure of two molecules, the two sequences must be concatenated using the & character as separator. ACGAUCAGAGAUCAGAGCAUACGACAGCAG&ACGAAAAAAAGAGCAUACGACAGCAG freeEnergie Options for calculation of free energies 2 partition_free Specifies the length of the unstructured region (-u) String 4 ($value)? " -u $value" : "" ( "" , " -u "+ str(value) )[ value is not None and value != vdef ] Specifies the length (len) of the unstructured region in the output. The default value is 4. The probability of being unpaired is plotted on the right border of the unpaired region. You can specify up to 20 different length values: use "-" to specify a range of continuous values (e.g. -u 4-8) or specify a list of comma separated values (e.g. -u 4,8,15). shime Get the different contributions to the probability of being unpaired (-c) String ($value)? " -c $value" : "" ( "" , " -c " +str(value) )[ value is not None ] By default only the full probability of being unpaired is plotted. The -c option reports the different contributions (c) to the probability of being unpaired: The full probability of being unpaired ("S") is the sum of the probability of being unpaired in the exterior loop ("E"), within a hairpin loop ("H"), within an interior loop ("I") and within a multiloop ("M"). Any combination of these letters may be given. interaction Options for calculation of interaction 2 Length Determines the maximal length of the region of interaction (-w) Integer 25 (defined $value and $value != $vdef)? " -w $value" : "" ( "" , " -w " + str(value) )[ value is not None and value != vdef] Determines the maximal length of the region of interaction; the default is 25. unpaired Include the probability of unpaired regions in both RNAs (-b) Boolean 0 (defined $value and $value != $vdef)? " -b " : "" ( "" , " -b " )[ value ] Include the probability of unpaired regions in both (b) RNAs.
By default only the probability of being unpaired in the longer RNA (target) is used. side5 Extend the region of interaction in the target by len residues to the 5' side Integer (defined $value and $value != $vdef)? " -5 $value" : "" ( "" , " -5 " + str(value) )[ value is not None ] These options extend the region of interaction in the target by len residues to the 5' and 3' side, respectively. The underlying assumption is that it is favorable for an interaction if not only the direct region of contact is unpaired but also a few residues 5' and 3' of this region. side3 Extend the region of interaction in the target by len residues to the 3' side Integer (defined $value and $value != $vdef)? " -3 $value" : "" ( "" , " -3 " + str(value) )[ value is not None ] These options extend the region of interaction in the target by len residues to the 5' and 3' side, respectively. The underlying assumption is that it is favorable for an interaction if not only the direct region of contact is unpaired but also a few residues 5' and 3' of this region. interaction Interaction mode Choice null null -Xp -Xf (defined $value and $value != $vdef)? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef] Xp: Pairwise (p) interaction is calculated: The first sequence interacts with the 2nd, the third with the 4th etc. If -Xp is selected two interacting sequences may be given in a single line separated by "&" or each sequence may be given on an extra line. Xf: The interaction of each sequence with the first one is calculated (e.g. interaction of one mRNA with many small RNAs). Each sequence has to be given on an extra line. target Use the first sequence in the input file as the target Boolean 0 (defined $value and $value != $vdef)? " -target " : "" ( "" , " -target " )[ value ] Use the first sequence in the input file as the target. No length check is done general General option 2 temperature Rescale energy parameters to a temperature of temp C. 
(-T) Integer 37 (defined $value and $value != $vdef)? " -T $value" : "" ( "" , " -T " + str(value) )[ value is not None and value != vdef] tetraloops Do not include special stabilizing energies for certain tetraloops (-4) Boolean 0 ($value)? " -4" : "" ( "" , " -4" )[ value ] dangling How to treat dangling end energies for bases adjacent to helices in free ends and multiloops (-d) Choice -d1 -d1 -d -d2 (defined $value and $value ne $vdef)? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef] How to treat 'dangling end' energies for bases adjacent to helices in free ends and multiloops: Normally only unpaired bases can participate in at most one dangling end. With -d2 this check is ignored; this is the default for partition function folding (-p). -d ignores dangling ends altogether. Note that by default pf and mfe folding treat dangling ends differently; use -d2 (or -d) in addition to -p to ensure that both algorithms use the same energy model. The -d2 option is available for RNAfold, RNAeval, and RNAinverse only. scale Use scale*mfe as an estimate for the free energy (-S) Float (defined $value)? " -S $value" : "" ( "" , " -S " + str(value) )[ value is not None ] In the calculation of the pf use scale*mfe as an estimate for the ensemble free energy (used to avoid overflows). The default is 1.07; useful values are 1.0 to 1.2. Occasionally needed for long sequences. You can also recompile the program to use double precision (see the README file). noLP Avoid lonely pairs (helices of length 1) (-noLP) Boolean 0 ($value)? " -noLP" : "" ( "" , " -noLP" )[ value ] Produce structures without lonely pairs (helices of length 1). For partition function folding this only disallows pairs that can only occur isolated. Other pairs may still occasionally occur as helices of length 1. noGU Do not allow GU pairs (-noGU) Boolean 0 ($value)?
" -noGU" : "" ( "" , " -noGU" )[ value ] noCloseGU Do not allow GU pairs at the end of helices (-noCloseGU) Boolean 0 ($value)? " -noCloseGU" : "" ( "" , " -noCloseGU" )[ value ] nsp Non standard pairs (comma separated list) (-nsp) String (defined $value)? " -nsp $value" : "" ( "" , " -nsp " + str(value) )[ value is not None ] Allow other pairs in addition to the usual AU, GC and GU pairs. Pairs is a comma separated list of additionally allowed pairs. If the first character is a '-' then AB will imply that AB and BA are allowed pairs. e.g. RNAfold -nsp -GA will allow GA and AG pairs. Nonstandard pairs are given 0 stacking energy. parameter Energy parameter file (-P) EnergyParameterFile AbstractText (defined $value)? " -P $value" : "" ( "" , " -P " + str(value) )[ value is not None ] Read energy parameters from paramfile, instead of using the default parameter set. A sample parameter file should accompany your distribution. See the RNAlib documentation for details on the file format. outfiles RNAup output RNAupOut AbstractText "RNA*.out" "RNA*.out" Programs-5.1.1/rnasubopt.xml0000644000175000001560000003263711767572177014750 0ustar bneronsis rnasubopt RNAsubopt Calculate suboptimal secondary structures of RNAs Wuchty, Hofacker, Fontana S. Wuchty, W. Fontana, I. L. Hofacker and P. Schuster Complete Suboptimal Folding of RNA and the Stability of Secondary Structures, Biopolymers, 49, 145-165 (1999) RNAsubopt reads RNA sequences from file and (in the default -e mode) calculates all suboptimal secondary structures within a user defined energy range above the minimum free energy (mfe). It prints the suboptimal structures in bracket notation followed by the energy in kcal/mol to stdout. Be careful, the number of structures returned grows exponentially with both sequence length and energy range. Alternatively, when used with the -p option, RNAsubopt produces Boltzmann weighted samples of secondary structures.
sequence:nucleic:2D_structure structure:2D_structure rnasubopt String rnasubopt "RNAsubopt" "RNAsubopt" seq RNA Sequences File RNA Sequence FASTA " < $value" " < " + str(value) 1000 control Control options 2 mfe Calculate suboptimal structures within this range kcal/mol of the mfe (-e) Integer 1 (defined $value and $value != $vdef)? " -e $value" : "" ("", " -e " + str(value))[value is not None and value != vdef] temperature Rescale energy parameters to a temperature of temperature Celsius (-T) Integer 37 (defined $value and $value != $vdef)? " -T $value" : "" ( "" , " -T " + str(value) )[ value is not None and value != vdef] tetraloops Do not include special stabilizing energies for certain tetraloops (-4) Boolean 0 ($value)? " -4" : "" ( "" , " -4" )[ value ] dangling How to treat dangling end energies for bases adjacent to helices in free ends and multiloops (-d) Choice -d2 -d1 -d0 -d2 -d3 (defined $value and $value ne $vdef)? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef] Change treatment of dangling ends, as in RNAfold and RNAeval. The default is -d2 (as in partition function folding). If -d1 or -d3 are specified the structures are generated as with -d2 but energies are re-evaluated before printing. logML Recalculate energies of structures using a logarithmic energy function for multi-loops (-logML) Boolean 0 ($value)? " -logML" : "" ("", " -logML")[ value ] This option does not affect structure generation, only the energies that are printed out. Since logML lowers energies somewhat, some structures may be missing. ep Only print structures with energy within this prange of the mfe (-ep) Integer (defined $value)? " -ep $value" : "" ("", " -ep " + str(value))[ value is not None] Only print structures with energy within prange of the mfe. Useful in conjunction with -logML, -d1 or -d3: while the -e option specifies the range before energies are re-evaluated, -ep specifies the maximum energy after re-evaluation.
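The Python command expressions in the parameters above, such as ("", " -e " + str(value))[value is not None and value != vdef], index a two-element tuple with a boolean: False selects the empty string and True selects the flag. The following sketch wraps that idiom in two helper functions to show how a command line would come out. The helper names `option` and `switch` and the sample values are assumptions made for this example; they are not part of the program description.

```python
def option(flag, value, vdef=None):
    """Return " <flag> <value>" unless the value is unset or equals the default.

    Mirrors ("", " -e " + str(value))[value is not None and value != vdef].
    """
    return ("", " " + flag + " " + str(value))[value is not None and value != vdef]


def switch(flag, value):
    """Return " <flag>" for a true Boolean parameter, else the empty string.

    Mirrors ("", " -logML")[value]; bool() is added so that an unset
    (None) value selects index 0 instead of raising a TypeError.
    """
    return ("", " " + flag)[bool(value)]


# Hypothetical parameter values: -e differs from its default (1),
# -T is left at its default (37), -logML is on, -4 is off.
command = (
    "RNAsubopt"
    + option("-e", 3, vdef=1)
    + option("-T", 37, vdef=37)
    + switch("-logML", True)
    + switch("-4", False)
)
print(command)  # prints "RNAsubopt -e 3 -logML"
```

Only parameters that differ from their defaults end up on the command line, which keeps the generated command short and leaves default handling to the program itself.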
sort Sort the structures by energy (-s) Boolean 0 ($value)? " -s" : "" ("", " -s")[value] Since the sort is done in memory, this becomes impractical when the number of structures produced runs into the millions. input Input parameters 2 noLP Avoid lonely pairs (helices of length 1) (-noLP) Boolean 0 ($value)? " -noLP" : "" ("", " -noLP")[value] Only produce structures without lonely pairs (helices of length 1). This reduces the number of structures drastically and should therefore be used for longer sequences and larger energy ranges. noGU Do not allow GU pairs (-noGU) Boolean 0 ($value)? " -noGU" : "" ("", " -noGU")[value] noCloseGU Do not allow GU pairs at the end of helices (-noCloseGU) Boolean 0 ($value)? " -noCloseGU" : "" ("", " -noCloseGU")[value] nsp Non standard pairs (comma separated list) (-nsp) String (defined $value)? " -nsp $value" : "" ( "" , " -nsp " + str(value) )[ value is not None ] Allow other pairs in addition to the usual AU, GC and GU pairs. Pairs is a comma separated list of additionally allowed pairs. If the first character is a '-' then AB will imply that AB and BA are allowed pairs. e.g. RNAfold -nsp -GA will allow GA and AG pairs. Nonstandard pairs are given 0 stacking energy. parameter Parameter file (-P) EnergyParameterFile AbstractText (defined $value)? " -P $value" : "" ( "" , " -P " + str(value) )[ value is not None ] Read energy parameters from paramfile, instead of using the default parameter set. A sample parameter file should accompany your distribution. See the RNAlib documentation for details on the file format. z Calculate z score Boolean 0 (defined $value)?
" -z" : "" ( "" , " -z" )[ value ] readseq String "readseq -f=19 -a $seq > $seq.tmp && (cp $seq $seq.orig && mv $seq.tmp $seq) ; " "readseq -f=19 -a "+ str(seq) + " > "+ str(seq) +".tmp && (cp "+ str(seq) +" "+ str(seq) +".orig && mv "+ str(seq) +".tmp "+ str(seq) +") ; " -10 psfiles Postscript file PostScript Binary "*.ps" "*.ps" Programs-5.1.1/gblocks.xml0000644000175000001560000004374511767572177014361 0ustar bneronsis gblocks Version 0.91b Gblocks Selection of conserved blocks from multiple sequence alignment Jose Castresana Castresana, J. (2000). Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular Biology and Evolution 17, 540-552 Talavera, G., and Castresana, J. (2007). Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Systematic Biology 56, 564-577 http://bioweb2.pasteur.fr/docs/gblocks/ http://molevol.cmima.csic.es/castresana/Gblocks.html http://molevol.cmima.csic.es/castresana/Gblocks.html alignment:multiple:information gblocks input Input 2 infile Alignment Protein DNA Alignment FASTA " $value -g" " "+str(value)+" -g" FASTA-formatted alignments are accepted by gblocks. There is no limit for the number of sequences or positions in the alignment as long as there is enough memory available for the program. input_type Type of sequence (-t) Choice auto auto Protein DNA (defined $value and $value ne $vdef) ? " -t=$value" : "" ("", " -t="+str(value))[value is not None and value!=vdef] Automatic: For FASTA-formatted alignments, gblocks assigns the type Protein automatically. In other cases, specify the type. sel_options Options for selection 3 MNOSFACP Minimum Number Of Sequences For A Conserved Position(-b1) Integer (defined $value) ? " -b1=$value" : "" ("" , " -b1="+str(value))[value is not None] Any integer BIGGER than half the number of sequences and SMALLER THAN OR EQUAL TO the total number of sequences.
If you put an integer not in this interval, gblocks will run the program with the default value; check the warning in the output. By default, = 50% of the number of sequences +1. MNOSFOFP Minimum Number Of Sequences For A Flank Position(-b2) Integer (defined $value) ? " -b2=$value" : "" ("" , " -b2="+str(value))[value is not None] Any integer EQUAL TO OR BIGGER than Minimum Number Of Sequences For A Conserved Position. If you put an integer not in this interval, gblocks will run the program with the default value; check the warning in the output. By default, = 85% of the number of sequences. MLOAB Minimum Length Of A Block(-b4) Integer 10 (defined $value and $value != $vdef) ? " -b4=$value" : "" ("" , " -b4="+str(value))[value is not None and value!=vdef] Any integer equal or bigger than 2 (= 10 by default). gap Allowed Gap Positions (-b5) Choice n n h a (defined $value and $value ne $vdef) ? " -b5=$value" : "" ("", " -b5="+str(value))[value is not None and value!=vdef] None (default value): No gap positions are allowed in the final alignment. With Half: Positions with a gap in less than 50% of the sequences are selected in the final alignment (if they're in an appropriate block). All: Positions with gaps are not treated differently from other positions. MNOCNP Maximum Number Of Contiguous Nonconserved Positions(-b3) Integer 8 (defined $value and $value != $vdef) ? " -b3=$value" : "" ("" , " -b3="+str(value))[value is not None and value!=vdef] Any segment of contiguous non-conserved positions longer than this value is rejected (=8 by default). saving_options Saving options 4 sb Selected Blocks (-s) Boolean 1 ($value) ? "" : " -s=n" (" -s=n" , "")[value] Saving or Not the alignment file with the selected blocks. Res_Param Results And Parameters File(-p) Choice y y t s n (defined $value and $value ne $vdef) ?
" -p=$value" : "" ("" , " -p="+str(value))[value is not None and value!=vdef] Saving an HTML file (Yes), saving a text file (Text), saving a short text file (Short Text) or not saving any of them (No). With the first two options the original file is shown with the selected blocks underlined and, in the HTML file, with colored conserved positions. The parameters used and the flank positions of the selected blocks are written in these files. PerLine Characters Per Line In Results And Parameters File (>50)(-v) Integer 60 (defined $value and $value != $vdef) ? " -v=$value" : "" ("" , " -v="+str(value))[value is not None and value!=vdef] Number of characters per line in the alignment shown in the Results And Parameters File. Any integer bigger than 50 is accepted (60 by default). nc NonConserved Blocks (-n) Boolean 0 ($value) ? " -n=y" : "" ("" , " -n=y")[value] Saving or Not the alignment file with the blocks NOT selected (i.e., the complement of the selected blocks). ua Ungapped Alignment (-u) Boolean 0 ($value) ? " -u=y" : "" ("" , " -u=y")[value] Saving or Not the alignment file where only gap positions (i.e. positions with at least one gap) have been removed. mask Mask File With The Selected Blocks (-k) Boolean 0 ($value) ? " -k=y" : "" ("" , " -k=y")[value] Saving or Not a file that can be read by the program SeqPup. In this file, conserved positions as defined by Gblocks are shadowed and selected blocks underlined. ps Postscript File With The Selected Blocks (-d) Boolean 0 ($value) ? " -d=y" : "" ("" , " -d=y")[value] Saving or Not a Postscript file that shows schematically the selected blocks. You need a postscript viewer or editor to view this file.
Res_Param_HTML Results and Params (HTML format) Text $Res_Param ne 't' and $Res_Param ne 's' and $Res_Param ne 'n' Res_Param != 't' and Res_Param != 's' and Res_Param != 'n' "*-gb.htm" "*-gb.htm" Res_Param_t Results and Params (Text format) Text $Res_Param eq 't' Res_Param == 't' "*-gb.txt" Res_Param_st Results and Params (ShortText format) Text $Res_Param eq 's' Res_Param == 's' "*-gb.txts" "*-gb.txts" alignment_result Alignment file with selected blocks Alignment FASTA $sb sb "*-gb" "*-gb" nc_file NonConserved Blocks File Alignment FASTA $nc nc "*-gbComp" "*-gbComp" ua_file Ungapped Alignment File Alignment FASTA $ua ua "*--" "*--" mask_file Mask File With The Selected Blocks Text $mask mask "*-gbMask" "*-gbMask" ps_file Postscript File Text $ps ps "*-gbPS" "*-gbPS" Programs-5.1.1/COPYING.LIB0000644000175000001560000006350410775415502013630 0ustar bneronsis GNU LESSER GENERAL PUBLIC LICENSE Version 2.1, February 1999 Copyright (C) 1991, 1999 Free Software Foundation, Inc. 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. [This is the first released version of the Lesser GPL. It also counts as the successor of the GNU Library Public License, version 2, hence the version number 2.1.] Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public Licenses are intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This license, the Lesser General Public License, applies to some specially designated software packages--typically libraries--of the Free Software Foundation and other authors who decide to use it. 
You can use it too, but we suggest you first think carefully about whether this license or the ordinary General Public License is the better strategy to use in any particular case, based on the explanations below. When we speak of free software, we are referring to freedom of use, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish); that you receive source code or can get it if you want it; that you can change the software and use pieces of it in new free programs; and that you are informed that you can do these things. To protect your rights, we need to make restrictions that forbid distributors to deny you these rights or to ask you to surrender these rights. These restrictions translate to certain responsibilities for you if you distribute copies of the library or if you modify it. For example, if you distribute copies of the library, whether gratis or for a fee, you must give the recipients all the rights that we gave you. You must make sure that they, too, receive or can get the source code. If you link other code with the library, you must provide complete object files to the recipients, so that they can relink them with the library after making changes to the library and recompiling it. And you must show them these terms so they know their rights. We protect your rights with a two-step method: (1) we copyright the library, and (2) we offer you this license, which gives you legal permission to copy, distribute and/or modify the library. To protect each distributor, we want to make it very clear that there is no warranty for the free library. Also, if the library is modified by someone else and passed on, the recipients should know that what they have is not the original version, so that the original author's reputation will not be affected by problems that might be introduced by others. 
Finally, software patents pose a constant threat to the existence of any free program. We wish to make sure that a company cannot effectively restrict the users of a free program by obtaining a restrictive license from a patent holder. Therefore, we insist that any patent license obtained for a version of the library must be consistent with the full freedom of use specified in this license. Most GNU software, including some libraries, is covered by the ordinary GNU General Public License. This license, the GNU Lesser General Public License, applies to certain designated libraries, and is quite different from the ordinary General Public License. We use this license for certain libraries in order to permit linking those libraries into non-free programs. When a program is linked with a library, whether statically or using a shared library, the combination of the two is legally speaking a combined work, a derivative of the original library. The ordinary General Public License therefore permits such linking only if the entire combination fits its criteria of freedom. The Lesser General Public License permits more lax criteria for linking other code with the library. We call this license the "Lesser" General Public License because it does Less to protect the user's freedom than the ordinary General Public License. It also provides other free software developers Less of an advantage over competing non-free programs. These disadvantages are the reason we use the ordinary General Public License for many libraries. However, the Lesser license provides advantages in certain special circumstances. For example, on rare occasions, there may be a special need to encourage the widest possible use of a certain library, so that it becomes a de-facto standard. To achieve this, non-free programs must be allowed to use the library. A more frequent case is that a free library does the same job as widely used non-free libraries. 
In this case, there is little to gain by limiting the free library to free software only, so we use the Lesser General Public License. In other cases, permission to use a particular library in non-free programs enables a greater number of people to use a large body of free software. For example, permission to use the GNU C Library in non-free programs enables many more people to use the whole GNU operating system, as well as its variant, the GNU/Linux operating system. Although the Lesser General Public License is Less protective of the users' freedom, it does ensure that the user of a program that is linked with the Library has the freedom and the wherewithal to run that program using a modified version of the Library. The precise terms and conditions for copying, distribution and modification follow. Pay close attention to the difference between a "work based on the library" and a "work that uses the library". The former contains code derived from the library, whereas the latter must be combined with the library in order to run. GNU LESSER GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License Agreement applies to any software library or other program which contains a notice placed by the copyright holder or other authorized party saying it may be distributed under the terms of this Lesser General Public License (also called "this License"). Each licensee is addressed as "you". A "library" means a collection of software functions and/or data prepared so as to be conveniently linked with application programs (which use some of those functions and data) to form executables. The "Library", below, refers to any such software library or work which has been distributed under these terms. 
A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".) "Source code" for a work means the preferred form of the work for making modifications to it. For a library, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the library. Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). Whether that is true depends on what the Library does and what the program that uses the Library does. 1. You may copy and distribute verbatim copies of the Library's complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Library or any portion of it, thus forming a work based on the Library, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) The modified work must itself be a software library. 
b) You must cause the files modified to carry prominent notices stating that you changed the files and the date of any change. c) You must cause the whole of the work to be licensed at no charge to all third parties under the terms of this License. d) If a facility in the modified Library refers to a function or a table of data to be supplied by an application program that uses the facility, other than as an argument passed when the facility is invoked, then you must make a good faith effort to ensure that, in the event an application does not supply such function or table, the facility still operates, and performs whatever part of its purpose remains meaningful. (For example, a function in a library to compute square roots has a purpose that is entirely well-defined independent of the application. Therefore, Subsection 2d requires that any application-supplied function or table used by this function must be optional: if the application does not supply it, the square root function must still compute square roots.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Library, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Library, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Library. 
In addition, mere aggregation of another work not based on the Library with the Library (or with a work based on the Library) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may opt to apply the terms of the ordinary GNU General Public License instead of this License to a given copy of the Library. To do this, you must alter all the notices that refer to this License, so that they refer to the ordinary GNU General Public License, version 2, instead of to this License. (If a newer version than version 2 of the ordinary GNU General Public License has appeared, then you can specify that version instead if you wish.) Do not make any other change in these notices. Once this change is made in a given copy, it is irreversible for that copy, so the ordinary GNU General Public License applies to all subsequent copies and derivative works made from that copy. This option is useful when you wish to copy part of the code of the Library into a program that is not a library. 4. You may copy and distribute the Library (or a portion or derivative of it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange. If distribution of object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place satisfies the requirement to distribute the source code, even though third parties are not compelled to copy the source along with the object code. 5. A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". 
Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License. However, linking a "work that uses the Library" with the Library creates an executable that is a derivative of the Library (because it contains portions of the Library), rather than a "work that uses the library". The executable is therefore covered by this License. Section 6 states terms for distribution of such executables. When a "work that uses the Library" uses material from a header file that is part of the Library, the object code for the work may be a derivative work of the Library even though the source code is not. Whether this is true is especially significant if the work can be linked without the Library, or if the work is itself a library. The threshold for this to be true is not precisely defined by law. If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.) Otherwise, if the work is a derivative of the Library, you may distribute the object code for the work under the terms of Section 6. Any executables containing that work also fall under Section 6, whether or not they are linked directly with the Library itself. 6. As an exception to the Sections above, you may also combine or link a "work that uses the Library" with the Library to produce a work containing portions of the Library, and distribute that work under terms of your choice, provided that the terms permit modification of the work for the customer's own use and reverse engineering for debugging such modifications. 
You must give prominent notice with each copy of the work that the Library is used in it and that the Library and its use are covered by this License. You must supply a copy of this License. If the work during execution displays copyright notices, you must include the copyright notice for the Library among them, as well as a reference directing the user to the copy of this License. Also, you must do one of these things: a) Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable "work that uses the Library", as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. (It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.) b) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (1) uses at run time a copy of the library already present on the user's computer system, rather than copying library functions into the executable, and (2) will operate properly with a modified version of the library, if the user installs one, as long as the modified version is interface-compatible with the version that the work was made with. c) Accompany the work with a written offer, valid for at least three years, to give the same user the materials specified in Subsection 6a, above, for a charge no more than the cost of performing this distribution. d) If distribution of the work is made by offering access to copy from a designated place, offer equivalent access to copy the above specified materials from the same place. 
e) Verify that the user has already received a copy of these materials or that you have already sent this user a copy. For an executable, the required form of the "work that uses the Library" must include any data and utility programs needed for reproducing the executable from it. However, as a special exception, the materials to be distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. It may happen that this requirement contradicts the license restrictions of other proprietary libraries that do not normally accompany the operating system. Such a contradiction means you cannot use both them and the Library together in an executable that you distribute. 7. You may place library facilities that are a work based on the Library side-by-side in a single library together with other library facilities not covered by this License, and distribute such a combined library, provided that the separate distribution of the work based on the Library and of the other library facilities is otherwise permitted, and provided that you do these two things: a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities. This must be distributed under the terms of the Sections above. b) Give prominent notice with the combined library of the fact that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work. 8. You may not copy, modify, sublicense, link with, or distribute the Library except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, link with, or distribute the Library is void, and will automatically terminate your rights under this License. 
However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 9. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Library or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Library (or any work based on the Library), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Library or works based on it. 10. Each time you redistribute the Library (or any work based on the Library), the recipient automatically receives a license from the original licensor to copy, distribute, link with or modify the Library subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties with this License. 11. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Library at all. For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library. 
If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply, and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 12. If the distribution and/or use of the Library is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Library under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 13. The Free Software Foundation may publish revised and/or new versions of the Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Library specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. 
If the Library does not specify a license version number, you may choose any version ever published by the Free Software Foundation. 14. If you wish to incorporate parts of the Library into other free programs whose distribution conditions are incompatible with these, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 
END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Libraries If you develop a new library, and you want it to be of the greatest possible use to the public, we recommend making it free software that everyone can redistribute and change. You can do so by permitting redistribution under these terms (or, alternatively, under the terms of the ordinary General Public License). To apply these terms, attach the following notices to the library. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Also add information on how to contact you by electronic and paper mail. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the library, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the library `Frob' (a library for tweaking knobs) written by James Random Hacker. , 1 April 1990 Ty Coon, President of Vice That's all there is to it! 
Programs-5.1.1/sigcleave.xml0000644000175000001560000002147212072525233014645 0ustar bneronsis sigcleave EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net sigcleave Reports on signal cleavage sites in a protein sequence http://bioweb2.pasteur.fr/docs/EMBOSS/sigcleave.html http://emboss.sourceforge.net/docs/themes sequence:protein:motifs sigcleave e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_minweight Minimum weight (value from 0. to 100.) Float 3.5 ("", " -minweight=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0. is required value >= 0. Value less than or equal to 100. is required value <= 100. 
2 Minimum scoring weight value for the predicted cleavage site e_additional Additional section e_prokaryote Use prokaryotic cleavage data Boolean 0 ("", " -prokaryote")[ bool(value) ] 3 Specifies the sequence is prokaryotic and changes the default scoring data file name e_output Output section e_outfile Name of the report file Filename report.sig ("" , " -outfile=" + str(value))[value is not None] 4 e_rformat_outfile Choose the report output format Choice MOTIF DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 5 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 6 Programs-5.1.1/biosed.xml0000644000175000001560000001766412072525233014160 0ustar bneronsis biosed EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net biosed Replace or delete sequence sections http://bioweb2.pasteur.fr/docs/EMBOSS/biosed.html http://emboss.sourceforge.net/docs/themes sequence:edit biosed e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_targetregion Sequence section to match String N ("", " -targetregion=" + str(value))[value is not None and value!=vdef] 2 e_delete Delete the target sequence sections Boolean 0 ("", " -delete")[ bool(value) ] 3 e_replace Replacement sequence section String not e_delete A ("", " -replace=" + str(value))[value is not None and value!=vdef] 4 e_additional Additional section e_position Sequence position to match (value greater than or equal to 0) Integer 0 ("", " -position=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 5 e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename biosed.e_outseq ("" , " -outseq=" + str(value))[value is not None] 6 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 7 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 8 Programs-5.1.1/boxshade.xml0000644000175000001560000014572311643326777014525 0ustar bneronsis boxshade 3.31 BOXSHADE printouts from multiple-aligned protein or DNA sequences Hofmann, Baron http://www.ch.embnet.org/software/BOX_form.html ftp://www.isrec.isb-sib.ch/pub/software/unix/boxshade/ alignment:multiple:display boxshade String "boxshade <boxshade.params" "boxshade <boxshade.params" 0 alignment Alignment File Alignment CLUSTAL "$value\\n" str(value) + "\n" 1 boxshade.params input_format String 
"2\\n" "2\n" 2 boxshade.params output_params Output parameters output_format Output format Choice e 1 2 3 4 6 7 8 9 a c d e "$value\\n" str(value) + "\n" 3 boxshade.params print_name Should sequence name be printed Boolean 1 ($value) ? "y\\n" : "n\\n" ( "n\n" , "y\n" )[ value ] 13 boxshade.params ruler Display ruler line Boolean 0 ($value) ? "y\\n" : "n\\n" ( "n\n" , "y\n" )[ value ] 6 boxshade.params print_position Should position numbers be printed? Boolean not $ruler not ruler 0 ($value) ? "y\\n" : "n\\n" ( "n\n" , "y\n" )[ value ] 14 boxshade.params sequence_characters How many sequence characters per line Integer 60 "$value\\n" str( value ) + "\n" Maximum value is 254 $value <= 254 value <= 254 12 boxshade.params lines How many lines between two sequence blocks Integer 1 "$value\\n" str(value) + "\n" Enter a value > 0 $value > 0 value > 0 17 boxshade.params character_size Character size in Points (except for HTML and ASCII output formats) Integer $output_format !~ /^[89be]$/ output_format not in ["8","9","b","e"] 10 "$value\\n" str(value) + "\n" 28 boxshade.params save_shading Save Shading/Text Choice $output_format eq "d" output_format == "d" T S T "$value\\n" str(value) + "\n" 29 boxshade.params rotate Rotate plot Boolean $output_format eq "1" or $output_format eq "3" or $output_format eq "d" output_format == "1" or output_format == "3" or output_format == "d" 0 ($value) ? "y\\n" : "n\\n" ( "n\n" , "y\n" )[ value ] 31 boxshade.params sequence_params Sequence properties label_similar Special label for similar residues Boolean 1 ($value) ? "y\\n" : "n\\n" ( "n\n" , "y\n" )[ value ] 18 boxshade.params label_identical Special label for identical residues in all sequences Boolean 0 ($value) ? "y\\n" : "n\\n" ( "n\n" , "y\n" )[ value ] 19 boxshade.params consensus Display consensus line Boolean 0 ($value) ? 
"y\\n .*\\n" : "n\\n" ( "n\n" , "y\n .*\n" )[ value ] 5 boxshade.params threshold Identity threshold Float $output_format ne "b" output_format != "b" 0.50 "$value\\n" str(value) + "\n" The fraction must be between 0 and 1 0 <= $value <= 1 0 <= value <= 1 10 The threshold is the fraction of residues that must be identical or similar for shading to occur. boxshade.params letters Letters foreground and background colors different_background Background for different residues Choice $output_format ne "b" output_format != "b" W B W 1 2 3 4 R G L Y M C "$value\\n" str(value) + "\n" 20 boxshade.params different_foreground Foreground for different residues (lowercase choices mean lowercase letters in the sequence) Choice B B b W w 1 2 3 4 5 6 7 8 R r G g L l Y y M m C c "$value\\n" str(value) + "\n" 21 boxshade.params identical_background Background for identical residues Choice $output_format ne "b" output_format != "b" B B W 1 2 3 4 R G L Y M C "$value\\n" str(value) + "\n" 22 boxshade.params identical_foreground Foreground for identical residues (lowercase choices mean lowercase letters in the sequence) Choice W B b W w 1 2 3 4 5 6 7 8 R r G g L l Y y M m C c "$value\\n" str(value) + "\n" 23 boxshade.params similar_background Background for similar residues Choice $label_similar and $output_format ne "b" label_similar and output_format != "b" 1 B W 1 2 3 4 R G L Y M C "$value\\n" str(value) + "\n" 24 boxshade.params similar_foreground Foreground for similar residues (lowercase choices mean lowercase letters in the sequence) Choice $label_similar label_similar B B b W w 1 2 3 4 5 6 7 8 R r G g L l Y y M m C c "$value\\n" str(value) + "\n" 25 boxshade.params conserved_background Background for conserved residues (if special label for identical residues) Choice label_identical and output_format ne "b" label_identical and output_format != "b" 1 B W 1 2 3 4 R G L Y M C "$value\\n" str(value) + "\n" 26 boxshade.params conserved_foreground Foreground for conserved residues 
(lowercase choices mean lowercase letters in the sequence) Choice $label_identical label_identical B B b W w 1 2 3 4 5 6 7 8 R r G g L l Y y M m C c "$value\\n" str(value) + "\n" 27 boxshade.params single_comparison Comparison to a single sequence single Similarity to a single sequence Boolean $output_format ne "b" output_format != "b" 0 ($value) ? "y\\n" : "n\\n" ( "n\n" , "y\n" )[ value ] 4 boxshade.params seq_no Which sequence (give its number) Integer $single single 1 "$value\\n" str(value) + "\n" Give a sequence NUMBER $seq_no >= 0 seq_no >= 0 40 boxshade.params hide Hide this sequence Boolean $single single 0 ($value) ? "y\\n" : "n\\n" ( "n\n" , "y\n" )[ value ] 41 boxshade.params show_normal Show this sequence in all-normal rendition Boolean $single single 0 ($value) ? "y\\n" : "n\\n" ( "n\n" , "y\n" )[ value ] 42 boxshade.params matrix Create identity / similarity matrix Boolean 0 ($value) ? "y\\n" : "n\\n" ( "n\n" , "y\n" )[ value ] 34 boxshade.params outfileName Filename $output_format ne "1" and $output_format ne "e" and $output_format ne "2" and $output_format ne "d" and $output_format ne "c" and $output_format ne "4" output_format != "1" and output_format != "e" and output_format != "2" and output_format != "d" and output_format != "c" and output_format != "4" "boxshade.result\\n" "boxshade.result\n" 32 boxshade.params outFile Output file Text $output_format ne "1" and $output_format ne "e" and $output_format ne "2" and $output_format ne "d" and $output_format ne "c" and $output_format ne "4" output_format != "1" and output_format != "e" and output_format != "2" and output_format != "d" and output_format != "c" and output_format != "4" "boxshade.result" "boxshade.result" psFileName Filename $output_format eq "1" or $output_format eq "2" output_format == "1" or output_format == "2" "boxshade.ps\\n" "boxshade.ps\n" 33 boxshade.params psFile Postscript output file PostScript Binary $output_format eq "1" or $output_format eq "2" output_format == "1" or 
output_format == "2" "boxshade.ps" "boxshade.ps" htmlFileName Filename $output_format eq "e" output_format == "e" "boxshade.html\\n" "boxshade.html\n" 33 boxshade.params htmlFile HTML output file BoxshadeHtmlReport Report $output_format eq "e" output_format == "e" "boxshade.html" "boxshade.html" rtfFileName Filename $output_format eq "4" output_format == "4" "boxshade.rtf\\n" "boxshade.rtf\n" 33 boxshade.params rtfFile Rich text format output file BoxshadeRtfReport Report $output_format eq "4" output_format == "4" "boxshade.rtf" "boxshade.rtf" figFileName Filename $output_format eq "c" output_format == "c" "boxshade.fig\\n" "boxshade.fig\n" 33 boxshade.params figFile Xfig output file BoxshadeXfigReport Report $output_format eq "c" output_format == "c" "boxshade.fig" "boxshade.fig" pictFileName Filename $output_format eq "d" output_format == "d" "boxshade.pict\\n" "boxshade.pict\n" 33 boxshade.params pictFile Picture in pict format Picture Binary $output_format eq "d" output_format == "d" "boxshade.pict" "boxshade.pict" matrixFileName Filename $matrix matrix "boxshade.matrix\\n" "boxshade.matrix\n" 35 boxshade.params matrixFile Output matrix Text $matrix matrix "boxshade.matrix" "boxshade.matrix" Programs-5.1.1/unroot.xml0000644000175000001560000000573611724156742014246 0ustar bneronsis unroot unroot Unroot a tree http://bioweb2.pasteur.fr/docs/phylip/doc/retree.html Unroot uses RETREE, a tree editor. It reads in a tree. phylogeny:tree_analyser unroot String "retree < retree.params" "retree < retree.params" 0 treefile Tree File Tree NEWICK "ln -s $treefile intree && " "ln -s "+str( treefile ) +" intree && " -10 The program hangs when provided a tree with [...] added to branch lengths.
(A,(B,(H,(D,(J,(((G,E),(F,I)),C)))))); (A,(B,(D,((J,H),(((G,E),(F,I)),C))))); (A,(B,(D,(H,(J,(((G,E),(F,I)),C)))))); outtree Tree output file Tree NEWICK " && mv outtree unroot.outtree" " && mv outtree unroot.outtree" 40 "unroot.outtree" "unroot.outtree" commands String "0\\nY\\nW\\nU\\nQ\\n" "0\nY\nW\nU\nQ\n" 1000 retree.params Programs-5.1.1/hmmsearch.xml0000644000175000001560000011004611767572177014671 0ustar bneronsis hmmsearch HMMSEARCH Search a sequence database with a profile HMM hmmsearch reads an HMM from hmmfile and searches seqfile for significantly similar sequence matches. hmmsearch may take minutes or even hours to run, depending on the size of the sequence database. The output consists of four sections: a ranked list of the best scoring sequences, a ranked list of the best scoring domains, alignments for all the best scoring domains, and a histogram of the scores. A sequence score may be higher than a domain score for the same sequence if there is more than one domain in the sequence; the sequence score takes into account all the domains. All sequences scoring above the -E and -T cutoffs are shown in the first list, then every domain found in this list is shown in the second list of domain hits. If desired, E-value and bit score thresholds may also be applied to the domain list using the --domE and --domT options.
hmm:database:search database:search:hmm hmmsearch hmmfile HMM file HmmProfile AbstractText " $value" " "+str(value) 2 public_seq_DB Choose one public protein sequence database Choice not $public_seq_DB not perso_seq_DB or ( perso_seq_DB and public_seq_DB ) null null " $value" " "+str(value) Can not handle both public AND personal protein sequence database at the same time not defined $perso_seq_DB not perso_seq_DB 3 perso_seq_DB OR paste a personal protein sequence database Sequence FASTA not $perso_seq_DB not public_seq_DB or ( perso_seq_DB and public_seq_DB ) " $value" " "+str(value) Can not handle both public AND personal protein sequence database at the same time not defined $perso_seq_DB not public_seq_DB 3 thresholds_report Options controlling reporting thresholds 1 'Reporting' thresholds control which hits are reported in output files. Sequence hits and domain hits are ranked by statistical significance (E-value) and output is generated in two sections called 'per-target' and 'per-domain' output. In per-target output, by default, all sequence hits with an E-value <= 10 are reported. In the per-domain output, for each target that has passed per-target reporting thresholds, all domains satisfying per-domain reporting thresholds are reported. By default, these are domains with conditional E-values of <= 10. The following options allow you to change the default E-value reporting thresholds, or to use bit score thresholds instead. E_value_cutoff E_value cutoff (-E) Float not defined $Bit_cutoff and $model_specific ne '--cut_ga' and $model_specific ne '--cut_nc' Bit_cutoff is None and model_specific != '--cut_ga' and model_specific != '--cut_nc' 10.0 (defined $value and $value != $vdef) ? " -E $value" : "" ( "" , " -E " + str(value) )[ value is not None and value != vdef] 1 In the per-target output, report target profiles with an E-value of <= value. 
The default is 10.0, meaning that on average, about 10 false positives will be reported per query, so you can see the top of the 'noise' and decide for yourself if it's really noise. Bit_cutoff Bit score cutoff (-T) Float $E_value_cutoff == 10.0 and $model_specific ne '--cut_ga' and $model_specific ne '--cut_nc' E_value_cutoff == 10.0 and model_specific != '--cut_ga' and model_specific != '--cut_nc' (defined $value)? " -T $value" : "" ( "" , " -T " + str(value) )[ value is not None ] 1 Instead of thresholding per-profile output on E-value, report target profiles with a bit score of >= value. domE E-value cutoff for the per-domain ranked hit list (--domE) Float not defined $domT and $model_specific ne '--cut_ga' and $model_specific ne '--cut_nc' domT is None and model_specific != '--cut_ga' and model_specific != '--cut_nc' 10.0 (defined $value and $value != $vdef) ? " --domE $value" : "" ( "" , " --domE " + str(value) )[ value is not None and value !=vdef ] In the per-domain output, for target profiles that have already satisfied the per-profile reporting threshold, report individual domains with a conditional E-value of <= value. The default is 10.0. A 'conditional' E-value means the expected number of additional false positive domains in the smaller search space of those comparisons that already satisfied the per-profile reporting threshold (and thus must have at least one homologous domain already). domT Bit score cutoff for the per-domain ranked hit list (--domT) Float $domE == 10.0 and $model_specific ne '--cut_ga' and $model_specific ne '--cut_nc' domE == 10.0 and model_specific != '--cut_ga' and model_specific != '--cut_nc' (defined $value) ? " --domT $value" : "" ( "" , " --domT " + str(value) )[ value is not None ] Instead of thresholding per-domain output on E-value, report domains with a bit score of >= value. thresholds_output Options controlling inclusion (significance) thresholds.
1 'Inclusion' thresholds are stricter than reporting thresholds. Inclusion thresholds control which hits are considered to be reliable enough to be included in an output alignment or a subsequent search round. In hmmscan, which does not have any alignment output nor any iterative search steps, inclusion thresholds have little effect. They only affect what domains get marked as significant ('!') or questionable ('?') in domain output. incE Include sequences lower than this E-value threshold in output alignment (--incE) Float not defined $incT and $model_specific ne '--cut_ga' incT is None and model_specific != '--cut_ga' 0.01 (defined $value and $value != $vdef) ? " --incE $value" : "" ( "" , " --incE " + str(value) )[ value is not None and value != vdef] Use an E-value of <= value as the per-target inclusion threshold. The default is 0.01, meaning that on average, about 1 false positive would be expected in every 100 searches with different query sequences. incdomE Include domains lower than this E-value threshold in output alignment (--incdomE) Float defined $incdomT and not defined $model_specific incdomT is not None and model_specific is None 0.01 (defined $value and $value != $vdef) ? " --incdomE $value" : "" ( "" , " --incdomE " + str(value) )[ value is not None and value != vdef] Use a conditional E-value of <= value as the per-domain inclusion threshold, in targets that have already satisfied the overall per-target inclusion threshold. The default is 0.01. incT Include sequences above this score threshold in output alignment (--incT) Float $incE == 0.01 and $model_specific ne '--cut_ga' incE == 0.01 and model_specific != '--cut_ga' (defined $value) ? " --incT $value" : "" ( "" , " --incT " + str(value) )[ value is not None ] Instead of using E-values for setting the inclusion threshold, use a bit score of >= the value as the per-target inclusion threshold.
It would be unusual to use bit score thresholds with hmmscan, because you don't expect a single score threshold to work for different profiles; different profiles have slightly different expected score distributions. incdomT Include domains above this score threshold in output alignment (--incdomT) Float $incdomE == 0.01 and not defined $model_specific incdomE == 0.01 and model_specific is None (defined $value) ? " --incdomT $value" : "" ( "" , " --incdomT " + str(value) )[ value is not None ] Instead of using E-values, use a bit score of >= value as the per-domain inclusion threshold. As with --incT above, it would be unusual to use a single bit score threshold in hmmscan. model_specific Options controlling model-specific thresholding Choice not defined $Bit_cutoff and not $E_value_cutoff == 10.0 and not defined $incdomT and $incdomE == 0.01 not Bit_cutoff and E_value_cutoff == 10.0 and incdomT is None and incdomE == 0.01 null null --cut_ga --cut_nc --cut_tc (defined $value and $value ne $vdef) ? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef] Curated profile databases may define specific bit score thresholds for each profile, superseding any thresholding based on statistical significance alone. To use these options, the profile must contain the appropriate (GA, TC, and/or NC) optional score threshold annotation; this is picked up by hmmbuild from Stockholm format alignment files. Each thresholding option has two scores: the per-sequence threshold x1 value and the per-domain threshold x2 value. These act as if -T x1 --incT x1 --domT x2 --incdomT x2 has been applied specifically using each model's curated thresholds. cut_ga: Use the GA (gathering) bit scores in the model to set per-sequence (GA1) and per-domain (GA2) reporting and inclusion thresholds.
GA thresholds are generally considered to be the reliable curated thresholds defining family membership; for example, in Pfam, these thresholds define what gets included in Pfam Full alignments based on searches with Pfam Seed models. cut_nc: Use the NC (noise cutoff) bit score thresholds in the model to set per-sequence (NC1) and per-domain (NC2) reporting and inclusion thresholds. NC thresholds are generally considered to be the score of the highest-scoring known false positive. cut_tc: Use the TC (trusted cutoff) bit score thresholds in the model to set per-sequence (TC1) and per-domain (TC2) reporting and inclusion thresholds. TC thresholds are generally considered to be the score of the lowest-scoring known true positive that is above all known false positives. acceleration Options controlling acceleration heuristics 1 HMMER3 searches are accelerated in a three-step filter pipeline: the MSV filter, the Viterbi filter, and the Forward filter. The first filter is the fastest and most approximate; the last is the full Forward scoring algorithm. There is also a 'bias filter' step between MSV and Viterbi. Targets that pass all the steps in the acceleration pipeline are then subjected to 'postprocessing' -- domain identification and scoring using the Forward/Backward algorithm. Changing filter thresholds only removes or includes targets from consideration; changing filter thresholds does not alter bit scores, E-values, or alignments, all of which are determined solely in 'postprocessing'. max Turn all heuristic filters off (less speed, more power) (--max) Boolean 0 ($value) ? " --max" : "" ( "" , " --max " )[ value ] Turn off all filters, including the bias filter, and run full Forward/Backward postprocessing on every target. This increases sensitivity somewhat, at a large cost in speed. F1 Stage 1 (MSV) threshold Float not $max not max 0.02 (defined $value and $value != $vdef ) ?
" --F1 $value" : "" ( "" , " --F1 " + str(value) )[ value is not None and value != vdef] Set the P-value threshold for the MSV filter step. The default is 0.02, meaning that roughly 2% of the highest scoring nonhomologous targets are expected to pass the filter. F2 Stage 1 (Vit) threshold Float not $max not max 0.001 ( defined $value and $value != $vdef ) ? " --F2 $value" : "" ( "" , " --F2 " + str(value) )[ value is not None and value != vdef] Set the P-value threshold for the Viterbi filter step. The default is 0.001. F3 Stage 1 (Fwd) threshold Float not $max not max 0.00001 (defined $value and $value != $vdef ) ? " --F3 $value" : "" ( "" , " --F3 " + str(value) )[ value is not None and value != vdef] Set the P-value threshold for the Forward filter step. The default is 1e-5. nobias Turn off composition bias filter (--nobias) Boolean not $max not max 0 ($value) ? " --nobias" : "" ( "" , " --nobias " )[ value ] Turn off the bias filter. This increases sensitivity somewhat, but can come at a high cost in speed, especially if the query has biased residue composition (such as a repetitive sequence region, or if it is a membrane protein with large regions of hydrophobicity). Without the bias filter, too many sequences may pass the filter with biased queries, leading to slower than expected performance as the computationally intensive Forward/Backward algorithms shoulder an abnormally heavy load. expert Other expert options 1 nonull2 Turn off biased composition score corrections (--nonull2) Boolean 0 ($value) ? " --nonull2" : "" ( "" , " --nonull2 " )[ value ] Turn off the 'null2' score corrections for biased composition. E_value_calculation Control of E_value calculation (-Z) Integer (defined $value) ? " -Z $value" : "" ( "" , " -Z " + str(value) )[ value is not None ] 1 Assert that the total number of targets in your searches is the value, for the purposes of per-sequence E-value calculations, rather than the actual number of targets seen. 
domZ Set Z score of significant sequences, for domain E-value calculation (--domZ) Float (defined $value) ? " --domZ $value" : "" ( "" , " --domZ " + str(value) )[ value is not None ] Assert that the total number of targets in your searches is the value, for the purposes of per-domain conditional E-value calculations, rather than the number of targets that passed the reporting thresholds. seed Set RNG seed number (--seed) Integer 42 (defined $value and $value != $vdef) ? " --seed $value " : "" ( "" , " --seed " + str(value) )[ value is not None and value !=vdef ] Set the random number seed to value. Some steps in postprocessing require Monte Carlo simulation. The default is to use a fixed seed (42), so that results are exactly reproducible. Any other positive integer will give different (but also reproducible) results. A choice of 0 uses a 'randomly chosen' seed. Enter a value >= 0 0 <= $value 0 <= value output_options Options directing output 1 textw Set max width of ASCII text output lines (--textw) Integer 120 (defined $value and $value != $vdef) ? " --textw $value " : "" ( "" , " --textw " + str(value) )[ value is not None and value !=vdef ] Set the main output's line length limit to value characters per line. The default is 120. notextw Unlimit ASCII text output line width (--notextw) Boolean $textw == 120 textw == 120 0 ($value) ? " --notextw " : "" ( "" , " --notextw " )[ value ] Unlimit the length of each line in the main output. The default is a limit of 120 characters per line, which helps in displaying the output cleanly on terminals and in editors, but can truncate target profile description lines. acc Prefer accessions over names in output (--acc) Boolean 0 ($value) ? " --acc " : "" ( "" , " --acc " )[ value ] Use accessions instead of names in the main output, where available for profiles and/or sequences. outfile_name Name of the sequence(s) file (-o) Filename (defined $value) ?
" -o $value" : "" ( " " , " -o " + str(value) )[ value is not None ] 1 output_file_name Output file Text $outfile_name outfile_name $outfile_name str(outfile_name) noali Don't output alignments, so output is smaller (--noali) Boolean 0 ( $value ) ? " --noali " : "" ( "" , " --noali " )[ value ] Omit the alignment section from the main output. This can greatly reduce the output volume. alnfile_name File name of the multiple alignment of all hits (-A) Filename (defined $value) ? " -A $value" : "" ( "" , " -A " + str(value) )[ value is not None ] 1 Save a multiple alignment of all significant hits (those satisfying inclusion thresholds) to the file. output_align_name Output align file Alignment STOCKHOLM $alnfile_name alnfile_name "$alnfile_name" str(alnfile_name) perseqfile_name File name of parseable table of per-sequence hits (--tblout) Filename (defined $value) ? " --tblout $value" : "" ( "" , " --tblout " + str(value) )[ value is not None ] 1 Save a simple tabular (space-delimited) file summarizing the 'per-target' output, with one data line per homologous target model found output_perseqfile_name Output parseable table of per-sequence hits Text $perseqfile_name perseqfile_name "$perseqfile_name" str(perseqfile_name) perdomfile_name File name of parseable table of per-domain hits (--domtblout) Filename (defined $value) ? " --domtblout $value" : "" ( "" , " --domtblout " + str(value) )[ value is not None ] 1 Save a simple tabular (space-delimited) file summarizing the 'per-domain' output, with one data line per homologous domain detected in a query sequence for each homologous model. 
output_perdomfile_name Output parseable table of per-domain hits Text $perdomfile_name perdomfile_name "$perdomfile_name" str(perdomfile_name) Programs-5.1.1/tacg.xml0000644000175000001560000012040311767572177013636 0ustar bneronsis tacg 4.1.0 TACG Restriction Enzyme analysis Mangalam http://bioweb2.pasteur.fr/docs/tacg/tacg4.0.main.html http://tacg.sourceforge.net/ http://sourceforge.net/projects/tacg/files/ sequence:nucleic:pattern tacg sequence DNA Sequence DNA Sequence FASTA " < $value " " < " + str( value ) + " " 100 input_options Input options 2 beginning Beginning of a subsequence in your sequence (-b) Integer (defined $value)? " -b $value" : "" ( "" , " -b " + str( value ) )[ value is not None ] Select the beginning of a subsequence from a larger sequence file. The smallest sequence that tacg can handle is 4 bases, 10 for the ladders map (-l). This allows analysis of primers and linkers. end End of a subsequence in your sequence (-e) Integer (defined $value)? " -e $value" : "" ( "" , " -e " + str(value) )[ value is not None ] topology Form (or topology) of DNA (-f) Choice 1 0 1 (defined $value and $value ne $vdef)? " -f $value" : "" ( "" , " -f " + str(value) )[ value is not None and value != vdef] degeneracy Degeneracy flag - controls input and analysis of degenerate sequence input (-D) Choice 1 0 1 2 3 4 (defined $value and $value ne $vdef)? " -D $value" : "" ( "" , " -D " + str(value) )[ value is not None and value != vdef] The pattern matching is adaptive; given a small window of nondegenerate sequence, the algorithm will match very fast; if degenerate sequence is detected, it will switch to a slower, iterative approach. This results in speed that is proportional to degeneracy for most cases. codon Codon Usage table to use for translation (-C) Choice 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 (defined $value and $value ne $vdef)? 
" -C $value" : "" ( "" , " -C " + str(value) )[ value is not None and value != vdef] output_options Output options 2 order_by_cut Order the output by number of cuts/fragments (-c) Boolean $print_fragments != 0 or $sites or $ladder_map or $gel_map print_fragments != 0 or sites or ladder_map or gel_map 0 ($value)? " -c" : "" ( "" , " -c" )[ value ] width Output width (-w) Integer 60 (defined $value and $value != $vdef)? " -w $value" : "" ( "" , " -w " + str(value) )[ value is not None and value != vdef] You must enter an integer between 60 and 210. $value >= 60 and $value <= 210 value >= 60 and value <= 210 The number is truncated to a # exactly divisible by 15 (-w 100 will be interpreted as -w 90) and actual printed output will be about 20 characters wider. Also applies to output of the ladder and gel maps, so if you are trying to get more accuracy and your output device can display small fonts, you may want to use this flag to widen the output. graphic Histogram output (-G) Choice defined $binsize binsize is not None X X Y L (defined $value)? " -G$binsize,$value" : "" ( "" , " -G" + str(binsize) +"," + str(value) )[ value is not None ] binsize Step size in histogram Integer $graphic graphic "" "" N bases for which hits should be pooled (integer) idonly Controls the output for sequences (-i) Choice -1 -1 0 1 2 (defined $value and $value ne $vdef)? " -i $value" : "" ( "" , " -i " + str(value) )[ value is not None and value != vdef] Controls the output for sequences (in a collection) that have no hits for the options selected. html Generates HTML tags (-H) Boolean 0 ($value)? " -H" : "" ( "" , " -H")[value] ps Generates a postscript plasmid map (--ps) Boolean 0 ($value)? " --ps" : "" ( "" , " --ps")[value] selection_options Enzymes Selection options 2 max_cut Maximum number of cuts allowed in sequence (-M) Integer (defined $value)? 
" -M $value" : "" ( "" , " -M " + str(value) )[ value is not None ] min_cut Minimum number of cuts in sequence for the enzyme to be selected (-m) Integer (defined $value)? " -m $value" : "" ( "" , " -m " + str(value) )[ value is not None ] magnitude Select enzymes by magnitude of recognition site (-n) Choice 3 3 4 5 6 7 8 (defined $value and $value ne $vdef)? " -n $value" : "" ( "" , " -n " + str(value) )[ value is not None and value != vdef] 2 The 'magnitude' of the recognition sequence depends on the number of defined bases that make up the site. Degenerate bases can also contribute: acgt each count '1' magnitude point yrwsmk each count '1/2' magnitude point bdhu each count '1/4' magnitude point n doesn't count at all Those enzymes sequences' patterns that 'sum' to the equivalent of at least the given magnitude pass the test The values are upwardly inclusive (5=5,6,7,8 6=6,7,8 ...) tgca=4, tgyrca=5, tgcnnngca=6 overhang Select enzymes by overhang generated (-o) Choice 1 5 3 0 1 (defined $value and $value ne $vdef)? " -o $value" : "" ( "" , " -o " + str(value) )[ value is not None and value != vdef] analyses Analyses 2 summary Summary of site information (-s) Boolean 1 ($value)? " -s" : "" ( "" , " -s" )[ value ] Prints the summary of site information, describing how many times each pattern matches the sequence. Those that match zero times are shown first. In Ver >2, only those that match at least once are shown in the second part (the 0 matchers are not reiterated) print_fragments Print/sort table of fragments (-F) Choice 0 0 1 2 3 (defined $value and $value ne $vdef)? " -F $value" : "" ( "" , " -F " + str(value) )[ value is not None and value != vdef] sites Table of actual cut sites (-S) Boolean 0 ($value)? " -S 1" : "" ( "" , " -S 1" )[ value ] ladder_map Ladder map of selected enzymes (-l) Boolean 0 ($value)? " -l" : "" ( "" , " -l" )[ value ] Specify if you want a ladder map of selected enzymes, much like the GCG MAPPLOT output. 
Also appends a summary of those enzymes that match a few times. The number of matches that is included in the summary is length-sensitive in the distributed source code. gel_map Print a pseudo-graphic gel map (-g) Boolean 0 ($value)? " -g $cutoff" : "" ( "" , " -g " + str(cutoff) )[ value ] cutoff Low-end cutoff in number of bases for gel map (>= 10) Integer $gel_map gel_map "" "" You can cut off any size in 10^n increments (as you might want to cut off very large fragments if you were doing chromosomal digests). linear_map_options Linear map 2 linear_map Specify if you want linear map (-L) Boolean 0 ($value)? " -L" : "" ( "" , " -L" )[ value ] This spews the most output (about 10x the # of input characters) and depending on what other options are specified, can be of moderate to very little use. If you want the co-translation, you'll have to specify it via the -T flag below. The Linear map also no longer shows ALL the patterns that match from the pattern file. It now obeys the same filtering rules that the Sites, Fragments, Ladder Map and other analyses do. translation Linear co-translation (-T) Boolean $linear_map and defined $three_letter and defined $translation_frames linear_map and three_letter is not None and translation_frames is not None 0 ($value)? " -T $translation_frames,$three_letter" : "" ( "" , " -T " + str(translation_frames) + "," + str(three_letter) )[ value ] Requests frames 1, 1-3, or 1-6 to be cotranslated with the Linear Map using 1 or 3 letter codes. "-T3,3" translates Frames 1,2,3 with 3 letter labels. "-T1,1" translates Frame 1 with 1 letter labels. translation_frames Translation in how many frames Choice $translation translation 1 1 3 6 three_letter Translation code ( 1 or 3-letter) Choice $translation translation 1 1 3 orf_options Open Reading Frames 2 orf Do an ORF analysis (-O) Boolean defined $frame and defined $min_size frame is not None and min_size is not None 0 ($value)? 
" -O $frame,$min_size" : "" ( "" , " -O " + str(frame)+","+ str(min_size) )[ value ] -O 145,25 frames 1,4,5 with a min ORF size of 25 AAs -O 35x,200 frames 3 & 5 with a min ORF size of 200 AAs, with extended info. -O 2,66 frame 2 with a min ORF size of 66 AAs frame Frames to search MultipleChoice $orf orf 1 1 2 3 4 5 6 "" "" min_size Min ORF size Integer $orf orf "" "" pattern_search_options Pattern Search 2 pattern_search Do a pattern search (-p) Boolean defined $Name and defined $pattern and defined $errors Name is not None and pattern is not None and errors is not None 0 ($value)? " -p $Name,$pattern,$errors" : "" ( "" , " -p " + str(Name)+","+ str(pattern) +"," + str(errors) )[ value ] Name = Pattern name (1-10 chars) Pattern = <30 IUPAC characters (ie. gryttcnnngt) Err = max # of errors that are tolerated (<6). pattern Pattern (<30 IUPAC character) String $pattern_search pattern_search "" "" errors Max number of errors that are tolerated (<6) (-p) Integer $pattern_search pattern_search 0 "" "" Name Label of pattern String $pattern_search pattern_search pattern1 "" "" proximity_options Search for spatial relationships between factors (-P) 2 proximity Do a proximity search Boolean defined $nameA and defined $distance and defined $nameB nameA is not None and distance is not None and nameB is not None 0 ($value)? " -P $nameA,$distance,$nameB" : "" ( "" , " -P " + str(nameA)+"," + str(distance)+ "," + str(nameB) )[ value ] Use this option to search for spacial relationships between factors, 2 at a time (up to a total of 10). -PHindIII,350,bamhi Match all HindIII sites within 350 bases of BamHI sites -PPit1,-30-2500,Tataa Match all Pit1 sites that are 30 to 2500 bases UPSTREAM of a Tataa site. 
distance Distance between factors String $proximity proximity "" "" Distance specification: [+-][lg]Dist_Lo[-Dist_Hi] + NameA is DOWNSTREAM of NameB (default is either) - NameA is UPSTREAM of NameB (ditto) l NameA is LESS THAN Dist_Lo from NameB (default) g NameA is GREATER THAN Dist_Lo from NameB nameA Name of first factor (nameA) String $proximity proximity "" "" NameA must be an enzyme name (Rebase db): HindIII, bamhi, PsiI, PvuI, RsrII, ... nameB Name of second factor (nameB) String $proximity proximity "" "" NameB must be an enzyme name (Rebase db): HindIII, bamhi, PsiI, PvuI, RsrII, ... outfile Tacg output file TacgTextReport Report not $html not html "tacg.out" "tacg.out" psfile Postscript file PostScript Binary $ps ps "*.ps" "*.ps" htmlfile Html file TacgHtmlReport Report $html html " > tacg.html" " > tacg.html" 1000 "*.html" "*.html" Programs-5.1.1/jackhmmer.xml0000644000175000001560000020620111767572177014662 0ustar bneronsis jackhmmer JACKHMMER Iteratively search protein sequence(s) against a protein database hmm:database:search database:search:hmm jackhmmer qsequence Query sequence(s) Protein Sequence FASTA " $value" " "+str(value) 13 db Choose a protein sequence database Choice null null " $value" " "+str(value) 14 n_it Maximum number of iterations (-N) Integer 5 ($value != $vdef) ? " -N $value" :"" ("", " -N " + str(value)) [ value != vdef] Enter a value > 0. 0 <$value 0 < value Set the maximum number of iterations (default is 5). If =1, the result is equivalent to a phmmer search. 1 output Directing output 2 By default, output for each iteration appears on stdout in a somewhat human readable, somewhat parseable format. These options allow redirecting that output or saving additional kinds of output to files, including checkpoint files for each iteration. outfile Direct output to file (-o) Boolean 0 ($value != $vdef) ?
" -o jackmmer.output" : "" ("", " -o jackhmmer.output") [ value != vdef] Direct the main "human-readable" output to a file instead of the default stdout. aligfile Save multiple alignment of hits to file (-A) Boolean 0 ($value != $vdef) ? " -A jackhmmer.align" : "" ("", " -A jackhmmer.align") [ value != vdef] After the final iteration, save an annotated multiple alignment of all hits satisfying inclusion thresholds (also including the original query) to a file (Stockholm format). seqtab Save parseable table of per-sequence hits to file (--tblout) Boolean 0 $value != $vdef) ? " --tblout jackhmmer.tblout" : "" ("", " --tblout jackhmmer.tblout") [ value != vdef] After the final iteration, save a tabular summary of top sequence hits to a file in a readily parseable, columnar, whitespace-delimited format. domaintab Save parseable table of per-domain hits to file (--domtblout) Boolean 0 ($value != $vdef) ? " --domtblout jackhmmer.domtblout" : "" ("", " --domtblout jackhmmer.domtblout") [ value != vdef] After the final iteration, save a tabular summary of top domain hits to a file in a readily parseable, columnar, whitespace-delimited format. chkhmm Save HMM checkpoints (--chkhmm) Boolean 0 ($value != $vdef) ? " --chkhmm jackhmmer" : "" ("", " --chkhmm jackhmmer") [ value != vdef] At the start of each iteration, checkpoint the query HMM, saving it to a file named phmmer-n.hmm where n is the iteration number (from 1..N). chkali Save alignment checkpoints (--chkali) Boolean 0 ($value != $vdef) ? " --chkali jackhmmer" : "" ("", " --chkali jackhmmer") [ value != vdef] At the end of each iteration, checkpoint an alignment of all domains satisfying inclusion thresholds (e.g. what will become the query HMM for the next iteration), saving it to a file named phmmer-n.sto in Stockholm format, where n is the iteration number (from 1..N). acc Prefer accessions over names in output (--acc) Boolean 0 ($value != $vdef) ? 
" --acc" : "" ("", " --acc") [ value != vdef] Use accessions instead of names in the main output, where available for profiles and/or sequences. noali Don't output alignments, so output is smaller (--noali) Boolean 0 ($value != $vdef) ? " --noali" : "" ("", " --noali") [ value != vdef] Omit the alignment section from the main output. This can greatly reduce the output volume. notextw Unlimit ASCII text output line width (--notextw) Boolean 0 ($value != $vdef) ? " --notextw" : "" ("", " --notextw" ) [ value != vdef] Unlimit the length of each line in the main output. The default is a limit of 120 characters per line, which helps in displaying the output cleanly on terminals and in editors, but can truncate target profile description lines. textw Max width of ASCII text output lines (--textw) Integer 120 $notextw == 0 notextw == 0 ($value != $vdef) ? " --textw $value" : "" ("", " --textw " + str(value) ) [ value != vdef] Enter a value >=120. 120 <=$value 120 <=value scoringsys Controlling single sequence scoring in first iteration 3 By default, the first iteration uses a search model constructed from a single query sequence. This model is constructed using a standard 20x20 substitution matrix for residue probabilities, and two additional parameters for position-independent gap open and gap extend probabilities. These options allow the default single-sequence scoring parameters to be changed. popen Gap open probability (--popen) Float 0.02 ($value != $vdef) ? " --popen $value" : "" ("", " --popen " + str(value)) [ value != vdef] Enter a value >= 0 and <0.5 0 <= $value <0.5 0 <= value <0.5 Set the gap open probability for a single sequence query model. This probability has to be >= 0 and <0.5. Default value = 0.02. pextend Gap extend probability (--pextend) Float 0.4 ($value != $vdef) ? 
" --pextend $value" : "" ("", " --pextend " + str(value)) [ value != vdef] Enter a value >= 0 and <1 0 <= $value <1 0 <= value <1 Set the gap extend probability for a single sequence query model. This probability has to be >= 0 and <1. Default value: 0.4. matrix Substitution score matrix (--mxfile) Choice BLOSUM62 BLOSUM62 BLOSUM30 BLOSUM35 BLOSUM40 BLOSUM45 BLOSUM50 BLOSUM55 BLOSUM60 BLOSUM65 BLOSUM70 BLOSUM75 BLOSUM80 BLOSUM85 BLOSUM90 PAM10 PAM20 PAM30 PAM40 PAM50 PAM60 PAM70 PAM80 PAM90 PAM100 PAM110 PAM120 PAM130 PAM140 PAM150 PAM160 PAM170 PAM180 PAM190 PAM200 PAM210 PAM220 PAM230 PAM240 PAM250 PAM260 PAM270 PAM280 PAM290 PAM300 PAM310 PAM320 PAM330 PAM340 PAM350 PAM360 PAM370 PAM380 PAM390 PAM400 ($value != $vdef) ? " --mxfile $value" : "" ("", " --mxfile " + str(value)) [ value != vdef] To obtain residue alignment probabilities from a substitution matrix. The default score matrix is BLOSUM62 report Controlling significance thresholds for reporting 4 "Reporting" thresholds control which hits are reported in output files (the main output, --tblout, and -- domtblout). In each iteration, sequence hits and domain hits are ranked by statistical significance (E-value) and output is generated in two sections called "per-target" and "per-domain" output. The following options allow you to change the default E-value reporting thresholds, or to use bit score thresholds instead. e_threshold Thresholds for Sequences: E-value (-E) Float 10.0 ($value != $vdef) ? " -E $value" : "" ("", " -E " + str(value)) [ value != vdef] $s_threshold is None s_threshold is None Enter a value > 0. 0 <$value 0 < value Report sequences <= this E-value threshold in per-sequence output. [Default value: 10]. s_threshold Score (-T) Float ($value) ? " -T $value" : "" ("", " -T " + str(value)) [ value is not None] Enter a value > 0. 0 <$value 0 < value Use a bit score threshold for per-sequence output instead of an E-value threshold (any setting of -E is ignored). 
Report sequences with a bit score of >= this score threshold in output. By default this option is unset. d_e_threshold Thresholds for Domains: E-value (--domE) Float 10.0 $d_s_threshold is None d_s_threshold is None (defined $value and $value != $vdef) ? " --domE $value" : "" ("", " --domE " + str(value)) [ value is not None and value != vdef] Enter a value > 0. 0 <$value 0 < value Report domains with conditional E-values <= this E-value threshold in per-domain output, in addition to the top-scoring domain per significant sequence hit. [Default value: 10] d_s_threshold Score (--domT) Float ($value) ? " --domT $value" : "" ("", " --domT " + str(value)) [ value is not None] Enter a value > 0. 0 <$value 0 < value Use a bit score threshold for per-domain output instead of an E-value threshold (any setting of --domE is ignored). Report domains with a bit score of >= this score threshold in per-domain output, in addition to the top-scoring domain per significant sequence hit. By default this option is unset. inclusion_A Controlling significance thresholds for inclusion in next round 5 Inclusion thresholds control which hits are included in the final multiple alignment (if the -A option is used) and which hits actually get used in the next iteration. a_e_threshold Thresholds for Sequences: E-value (--incE) Float 0.001 $a_s_threshold is None a_s_threshold is None (defined $value and $value != $vdef) ? " --incE $value" : "" ("", " --incE " + str(value)) [ value is not None and value != vdef] Include sequences with E-values <= this E-value threshold in subsequent iteration or final alignment output (-A option). The default is 0.001. a_s_threshold Score (--incT) Float ($value) ? " --incT $value" : "" ("", " --incT " + str(value)) [ value is not None] Use a bit score threshold for per-sequence inclusion instead of an E-value threshold (any setting of --incE is ignored). Include sequences with a bit score of >= this score threshold. By default this option is unset.
a_d_e_threshold Thresholds for Domains: E-value (--incdomE) Float 0.001 $a_d_s_threshold is None a_d_s_threshold is None (defined $value and $value != $vdef) ? " --incdomE $value" : "" ("", " --incdomE " + str(value)) [ value is not None and value != vdef] Include domains with conditional E-values <= this E-value threshold in subsequent iteration or final alignment output (-A option), in addition to the top-scoring domain per significant sequence hit. The default is 0.001. a_d_s_threshold Score (--incdomT) Float ($value) ? " --incdomT $value" : "" ("", " --incdomT " + str(value)) [ value is not None] Use a bit score threshold for per-domain inclusion instead of an E-value threshold (any setting of --incdomE is ignored). Include domains with a bit score of >= this score threshold. By default this option is unset. heuristic Controlling acceleration heuristics 6 HMMER3 searches are accelerated in a three-step filter pipeline: - the MSV filter (the fastest and most approximate), - the Viterbi filter, - and the Forward filter (full Forward scoring algorithm, slowest but most accurate), + There is also a "bias filter" step between MSV and Viterbi. Targets that pass all the steps in the acceleration pipeline are then subjected to "postprocessing" (domain identification and scoring using the Forward/Backward algorithm). Essentially the only free parameters that control HMMER's heuristic filters are the P-value thresholds controlling the expected fraction of non-homologous sequences that pass the filters. - Setting the default thresholds higher will pass a higher proportion of non-homologous sequences, increasing sensitivity at the expense of speed, - Setting lower P-value thresholds will pass a smaller proportion, decreasing sensitivity and increasing speed, - Setting a filter's P-value threshold to 1.0 means it will pass all sequences, and effectively disables the filter.
Changing filter thresholds only removes or includes targets from consideration; it does not alter bit scores, E-values, or alignments, all of which are determined solely in "postprocessing". max Turn all heuristic filters off (less speed, more power) (--max) Boolean 0 ($value != $vdef) ? " --max" : "" ("", " --max") [ value != vdef] Maximum sensitivity. Turn off all filters, including the bias filter, and run full Forward/ Backward postprocessing on every target. This increases sensitivity slightly, at a large cost in speed. F1 Stage 1 (MSV) threshold: (--F1) Float 0.02 $max==0 max==0 ($value != $vdef) ? " --F1 $value" : "" ("", " --F1 " + str(value) ) [ value != vdef] First filter threshold; set the P-value threshold for the MSV filter step. The default is 0.02, meaning that roughly 2% of the highest scoring non-homologous targets are expected to pass the filter. F2 Stage 2 (Vit) threshold: (--F2) Float 0.001 $max==0 max==0 ($value != $vdef) ? " --F2 $value" : "" ("", " --F2 " + str(value) ) [ value != vdef] Second filter threshold; set the P-value threshold for the Viterbi filter step. The default is 0.001. F3 Stage 3 (Fwd) threshold: (--F3) Float 0.00001 $max==0 max==0 ($value != $vdef) ? " --F3 $value" : "" ("", " --F3 " + str(value) ) [ value != vdef] Third filter threshold; set the P-value threshold for the Forward filter step. The default is 1e-5. nobias Turn off composition bias filter (--nobias) Boolean 0 $max==0 max==0 ($value != $vdef) ? " --nobias" : "" ("", " --nobias" ) [ value != vdef] Turn off the bias filter increases sensitivity somewhat, but can come at a high cost in speed, especially if the query has biased residue composition (such as a repetitive sequence region, or if it is a membrane protein with large regions of hydrophobicity). 
Without the bias filter, too many sequences may pass the filter with biased queries, leading to slower than expected performance as the computationally intensive Forward/Backward algorithms shoulder an abnormally heavy load. model_constr Controlling profile construction (later iteration) 7 These options control how consensus columns are defined in multiple alignments when building profiles. By default, jackhmmer always includes your original query sequence in the alignment result at every iteration, and consensus positions are defined by that query sequence: that is, a default jackhmmer profile is always the same length as your original query, at every iteration. fast Quickly and heuristically determine the architecture of the model (--fast) Boolean 0 ($value != $vdef) ? " --fast" : "" ("", " --fast" ) [ value != vdef] Define consensus columns as those that have a fraction >= symfrac of residues as consensus/opposed to gaps. (See below for the --symfrac option.) This option may have undesirable effects in jackhmmer, because a profile could iteratively walk in sequence space away from your original query, leaving few or no consensus columns corresponding to its residues. symfrac Symbol fraction controlling --fast construction (--symfrac) Float $fast==1 fast==1 0.5 ($value != $vdef) ? " --symfrac $value" : "" ("", " --symfrac " + str(value) ) [ value != vdef] Enter a value >=0 and <=1 . 0 <=$value <=1 0 < value <=1 Define the residue fraction threshold necessary to define a consensus column when using the --fast option. The default is 0.5. The symbol fraction in each column is calculated after taking relative sequence weighting into account, and ignoring gap characters corresponding to ends of sequence fragments (as opposed to internal insertions/deletions). - Setting this to 1.0 means that every alignment column will be assigned as consensus, which may be useful in some cases. 
- Setting it to 0.0 is a bad idea, because no columns will be assigned as consensus, and you will get a model of zero length. fragthresh Threshold to tag sequence as a fragment (--fragthresh) Float 0.5 $fast==1 fast==1 ($value != $vdef) ? " --fragthresh $value" : "" ("", " --fragthresh " + str(value) ) [ value != vdef] Enter a value >=0 and <=1 . 0 <=$value <=1 0 < value <=1 We only want to count terminal gaps as deletions if the aligned sequence is known to be full-length, not if it is a fragment (for instance, because only part of it was sequenced). HMMER uses a simple rule to infer fragments: if the sequence length is less than a fraction threshold times the mean sequence length of all the sequences in the alignment, then the sequence is handled as a fragment. The default is 0.5. w_option Controlling relative weights and effective sequence number in models after first iteration 8 Whenever a profile is built from a multiple alignment, HMMER uses an ad hoc sequence weighting algorithm to downweight closely related sequences and upweight distantly related ones. This has the effect of making models less biased by uneven phylogenetic representation. After relative weights are determined, they are normalized to sum to a total effective sequence number (eff_nseq). This number may be the actual number of sequences in the alignment, but it is almost always smaller than that. wmodel Relative weights in models Choice wpb wpb wgsc wblosum wnone ($value != $vdef) ? " --$value" : "" ("", " --" + str(value) ) [ value != vdef] These option controls which ad hoc sequence weighting algorithm gets used: - Use the Henikoff position-based sequence weighting scheme [Henikoff and Henikoff, J. Mol. Biol. 243:574, 1994]. This is the default. - Use the Gerstein/Sonnhammer/Chothia weighting algorithm [Gerstein et al, J. Mol. Biol. 235:1067, 1994]. - Use the same clustering scheme that was used to weight data in calculating BLOSUM subsitution matrices [Henikoff and Henikoff, Proc. Natl. 
Acad. Sci 89:10915,1992]. Sequences are single-linkage clustered at an identity threshold (default 0.62; see --wid option) and within each cluster of c sequences, each sequence gets relative weight 1/c. - No relative weights. All sequences are assigned uniform weight. wid Set identity cutoff in case of Henikoff simple filter weights (--wblosum) selection (--wid) Float wmodel eq "wblosum" wmodel == "wblosum" 0.62 ($value != $vdef) ? " --wid $value" : "" ("", " --wid " + str(value) ) [ value != vdef] Enter a value >=0 and <=1 . 0 <=$value <=1 0 < value <=1 Sets the identity threshold used by single-linkage clustering when using --wblosum. Invalid with any other weighting scheme. Default is 0.62. seqnum_model Effective sequence number in models Choice eent eent eclust enone ($value != $vdef) ? " --$value" : "" ("", " --" + str(value) ) [ value != vdef] Choice between: - eent: Adjust effective sequence number to achieve a specific relative entropy per position (see --ere). This is the default. This method reduces the effective sequence number to reduce the information content (relative entropy, or average expected score on true homologs) per consensus position. - eclust: Set effective sequence number to the number of single-linkage clusters at a specific identity threshold (see --eid). This option is not recommended; it is for experiments evaluating how much better --eent is. - enone: Turn off effective sequence number determination and just use the actual number of sequences. One reason you might want to do this is to try to maximize the relative entropy/position of your model, which may be useful for short models. eff_snum Effective sequence number for all model (--eset) Integer seqnum_model ne "enone" seqnum_model != "enone" ($value != $vdef) ? 
" --eset $value" : "" ("", " --eset " + str(value) ) [ value != vdef] ere Minimum relative entropy/position target for --eent (--ere) Float $seqnum_model eq "eent" and not $eff_snum seqnum_model == "eent" and not eff_snum ($value) ? " --ere $value" : "" ("", " --ere " + str(value) ) [ value is not None] Enter a value >0. 0 <$value 0 < value Set the minimum relative entropy/position target. Requires --eent. Default depends on the sequence alphabet; for protein sequences, it is 0.59 bits/position. esigma Sigma parameter for --eent (--esigma) Float $seqnum_model eq "eent" and not $eff_snum seqnum_model == "eent" and not eff_snum 45.0 (defined $value and $value!=$vdef) ? " --esigma $value" : "" ( "" , " --esigma " + str(value) )[ value is not None and value !=vdef ] Enter a value >0. 0 <$value 0 < value Sets the minimum relative entropy contributed by an entire model alignment, over its whole length. This has the effect of making short models have higher relative entropy per position than --ere alone would give. The default is 45.0 bits. eid Fractional Identity cutoff for --eclust (--eid) Float $seqnum_model eq "eclust" and not $eff_snum seqnum_model == "eclust" and not eff_snum 0.62 ($value != $vdef) ? " --eid $value" : "" ("", " --eid " + str(value) ) [ value != vdef ] Enter a value >=0 and <=1 . 0 <=$value <=1 0 <= value <=1 Sets the fractional pairwise identity cutoff used by single linkage clustering with the --eclust option. The default is 0.62. MSV Controlling E-value calibration for Stage 1 - MSV Gumbel mu fit 9 Estimating the location parameters for the expected score distributions for MSV filter scores, Viterbi filter scores, and Forward scores requires three short random sequence simulations. eml Length of sequences (--EmL) Integer 200 ($value != $vdef) ? " --EmL $value" : "" ("", " --EmL " + str(value) ) [ value != vdef] Enter a value > 0. 
0 <$value 0 < value Sets the sequence length in simulation that estimates the location parameter mu for MSV filter E-values. Default is 200. emn Number of sequences (--EmN) Integer 200 ($value != $vdef) ? " --EmN $value" : "" ("", " --EmN " + str(value) ) [ value != vdef] Enter a value > 0. 0 <$value 0 < value Sets the number of sequences in simulation that estimates the location parameter mu for MSV filter E-values. Default is 200. Ecalibration2 Controlling E-value calibration for Stage 2 - Viterbi Gumbel mu fit 10 Estimating the location parameters for the expected score distributions for MSV filter scores, Viterbi filter scores, and Forward scores requires three short random sequence simulations. evl Length of sequences (--EvL) Integer 200 ($value != $vdef) ? " --EvL $value" : "" ("", " --EvL " + str(value) ) [ value != vdef] Enter a value > 0. 0 <$value 0 < value Sets the sequence length in simulation that estimates the location parameter mu for Viterbi filter E-values. Default is 200. evn Number of sequences (--EvN) Integer 200 ($value != $vdef) ? " --EvN $value" : "" ("", " --EvN " + str(value) ) [ value != vdef] Enter a value > 0. 0 <$value 0 < value Sets the number of sequences in simulation that estimates the location parameter mu for Viterbi filter E-values. Default is 200. Ecalibration3 Controlling E-value calibration for Stage 3 - Forward exponential tail tau fit 11 Estimating the location parameters for the expected score distributions for MSV filter scores, Viterbi filter scores, and Forward scores requires three short random sequence simulations. efl Length of sequences (--EfL) Integer 100 ($value != $vdef) ? " --EfL $value" : "" ("", " --EfL " + str(value) ) [ value != vdef] Enter a value > 0. 0 <$value 0 < value Sets the sequence length in simulation that estimates the location parameter tau for Forward E-values. Default is 100. efn Number of sequences (--EfN) Integer 200 ($value != $vdef) ? 
" --EfN $value" : "" ("", " --EfN " + str(value) ) [ value != vdef] Enter a value > 0. 0 <$value 0 < value Sets the number of sequences in simulation that estimates the location parameter tau for Forward E-values. Default is 200. eft Tail mass (--Eft) Float 0.04 ($value != $vdef) ? " --Eft $value" : "" ("", " --Eft " + str(value) ) [ value != vdef] Enter a value > 0 and <1. 0 <$value 0 < value Sets the tail mass fraction to fit in the simulation that estimates the location parameter tau for Forward evalues. Default is 0.04. other Expert options 12 nonull Turn off biased composition score corrections (--nonull2) Boolean 0 $max==0 max==0 ($value != $vdef) ? " --nonull2" : "" ("", " --nonull2" ) [ value != vdef] Turn off the "null2" score corrections for biased composition. z Number of comparisons done, for E-value calculation (-Z) Integer ($value) ? " -Z $value" : "" ("", " -Z " + str(value)) [ value is not None] Enter a value > 0. 0 <$value 0 < value Assert that the total number of targets in your searches is this number, for the purposes of per-sequence E-value calculations, rather than the actual number of targets seen. d_z Number of significant sequences, for domain E-value calculation (--domZ) Integer ($value) ? " --domZ $value" : "" ("", " --domZ " + str(value)) [ value is not None] Enter a value > 0. 0 <$value 0 < value Assert that the total number of targets in your searches is this number, for the purposes of per-domain conditional E-value calculations, rather than the number of targets that passed the reporting thresholds. seed Set Random Number Generator seed to (--seed) Integer 42 ($value != $vdef) ? " --seed $value" : "" ("", " --seed " + str(value) ) [ value != vdef] Seed the random number generator with this, an integer >= 0. The default seed is 42. If >0, any stochastic simulations will be reproducible; the same command will give the same results. 
If = 0, the random number generator is seeded arbitrarily, and stochastic simulations will vary from run to run of the same command. out_file Output file Text $outfile==1 outfile==1 *.output "*.output" ali_file Alignment file Protein Alignment STOCKHOLM $aligfile==1 aligfile==1 *.align "*.align" seq_file Parseable table of per-sequence hits Text $seqtab==1 seqtab==1 *.tblout "*.tblout" dom_file Parseable table of per-domain hits Text $domaintab==1 domaintab==1 *.domtblout "*.domtblout" chkhmm_file HMM checkpoints files Text $chkhmm==1 chkhmm==1 *.hmm "*.hmm" chkali_file Alignment checkpoints files Protein Alignment STOCKHOLM $chkali==1 chkali==1 *.sto "*.sto" Programs-5.1.1/muscle.xml0000644000175000001560000004217211663746471014211 0ustar bneronsis muscle 3.8.31 Muscle MUSCLE is a program for creating multiple alignments of amino acid or nucleotide sequences. Edgar, R.C. Edgar, Robert C. (2004), MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Research 32(5), 1792-97. http://www.drive5.com/muscle/ http://www.drive5.com/muscle/ http://www.drive5.com/muscle/downloads.htm alignment:multiple muscle quiet String " -quiet" " -quiet" inputs Inputs options sequence Sequences (-in) Sequence FASTA 2,n not defined($profile1) and not defined($profile2) profile1 is None and profile2 is None "-in $value" " -in " + str(value) 10 seqtype Determining sequence type (-seqtype) Choice auto auto protein dna rna (defined $value and $value ne $vdef) ? " -seqtype $value " : "" ( "" , " -seqtype " + str(value) )[ value is not None and value != vdef ] By default, MUSCLE looks at the first 100 letters in the input sequence data (excluding gaps). If 95% or more of those letters are valid nucleotides (AGCTUN), then the file is treated as nucleotides, otherwise as amino acids. This method almost always guesses correctly, but you can make sure by specifying the sequence type on the command line. 
optimization Optimization parameters maxiters Maximum number of iterations (-maxiters) Integer 16 (defined $value and $value != $vdef) ? " -maxiters $value" : "" ( "" , " -maxiters " + str( value ) )[ value is not None and value != vdef] You can control the number of iterations that MUSCLE does by specifying the -maxiters option. If you specify 1, 2 or 3, then this is exactly the number of iterations that will be performed. If the value is greater than 3, then muscle will continue up to the maximum you specify or until convergence is reached, whichever happens sooner. The default is 16. If you have a large number of sequences, refinement may be rather slow. maxtrees Maximum number of trees (-maxtrees) Integer 1 (defined $value and $value != $vdef) ? " -maxtrees $value" : "" ( "" , " -maxtrees " + str( value ) )[ value is not None and value != vdef] This option controls the maximum number of new trees to create in iteration 2. Experience suggests that a point of diminishing returns is typically reached after the first tree, so the default value is 1. If a larger value is given, the process will repeat until convergence or until this number of trees has been created, whichever comes first. maxhours Maximum time to run in hours (-maxhours) Float (defined $maxhours) ? " -maxhours $value" : "" ( "" , " -maxhours " + str( value ) )[ maxhours is not None ] If you have a large alignment, muscle may take a long time to complete. It is sometimes convenient to say "I want the best alignment I can get in 24 hours" rather than specifying a set of options that will take an unknown length of time. This is done by using -maxhours, which specifies a floating-point number of hours. If this time is exceeded, muscle will write out the current alignment and stop. For example, muscle -in huge.fa -out huge.afa -maxiters 9999 -maxhours 24.0 diags Find diagonals (faster for similar sequences) (-diags) Boolean 0 (defined $value and $value != $vdef) ?
" -diags " : "" ( "" , " -diags " )[ value is not None and value != vdef] Creating a pair-wise alignment by dynamic programming requires computing an L1 * L2 matrix, where L1 and L2 are the sequence lengths. A trick used in algorithms such as BLAST is to reduce the size of this matrix by using fast methods to find "diagonals", i.e. short regions of high similarity between the two sequences. This speeds up the algorithm at the expense of some reduction in accuracy. scoring The profile scoring function (for protein only) Choice $seqtype ne "nucleo" seqtype != "nucleo" le le "" "" sp " -sp " " -sp " sv " -sv " " -sv " Three different protein profile scoring functions are supported, - the log-expectation score (-le option) - and a sum of pairs score using either the PAM200 matrix (-sp) - or the VTML240 matrix (-sv). The log-expectation score is the default as it gives better results on our tests, but is typically somewhere between two or three times slower than the sum-of-pairs score. For nucleotides, -spn is currently the only option (which is of course the default for nucleotide data, so you don't need to specify this option). profile_option Profile Alignments parameters To align two sequence alignments. Not compatible with Input options. 
profile1 Profile 1 Alignment FASTA not defined($sequence) sequence is None "-in1 $value" " -in1 " + str(value) profile2 Profile 2 Alignment FASTA not defined($sequence) sequence is None "-in2 $value" " -in2 " + str(value) profile (-profile) Integer not defined($sequence) and defined($profile1) and defined($profile2) sequence is None and profile1 is not None and profile2 is not None "" " -profile " outpout_options Output Options outformat output format Choice fasta fasta "" "" html " -html " " -html " msf " -msf " " -msf " phyi " -phyi " " -phyi " clw " -clw " " -clw " clwstrict " -clwstrict " " -clwstrict " fasta : Write output in Fasta format html : Write output in HTML format msf : Write output in GCG MSF format phylip : Write output in Phylip (interleaved) format muscle : Write output in CLUSTALW format with muscle header clustalw : Write output in CLUSTALW format with CLUSTAL W (1.81) outfile Filename (-out) Filename " -out $value" ("", " -out " + str(value))[value is not None] alignmentout Alignment Alignment FASTA MSF PHYLIPI CLUSTAL MUSCLE $outformat =~ /^(fasta|msf|phyi|clwstrict|clw)$/ outformat in [ 'fasta' , 'msf' , 'phyi' , 'clwstrict' , 'clw'] (defined $outfile) ? "$outfile" : "muscle.out" ( outfile , "muscle.out")[outfile is None] muscleHtmlout Alignment MuscleHtmlAlignment AbstractText $outformat == 'html' outformat == 'html' (defined $outfile) ? "$outfile" : "muscle.out" ( outfile , "muscle.out")[outfile is None] Programs-5.1.1/dreg.xml0000644000175000001560000001644412072525233013627 0ustar bneronsis dreg EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net dreg Regular expression search of nucleotide sequence(s) http://bioweb2.pasteur.fr/docs/EMBOSS/dreg.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:motifs dreg e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_pattern Regular expression pattern DNA Pattern AbstractText ("", " -pattern=@" + str(value))[value is not None] 2 e_output Output section e_outfile Name of the report file Filename dreg.report ("" , " -outfile=" + str(value))[value is not None] 3 e_rformat_outfile Choose the report output format Choice SEQTABLE DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 4 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 5 Programs-5.1.1/compseq.xml0000644000175000001560000002565012072525233014354 0ustar bneronsis compseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net compseq Calculate the composition of unique words in sequences http://bioweb2.pasteur.fr/docs/EMBOSS/compseq.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:composition sequence:protein:composition compseq e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_infile Program compseq output file (optional) CompseqReport Report ("", " -infile=" + str(value))[value is not None] 2 This is a file previously produced by 'compseq' that can be used to set the expected frequencies of words in this analysis. The word size in the current run must be the same as the one in this results file. Obviously, you should use a file produced from protein sequences if you are counting protein sequence word frequencies, and you must use one made from nucleotide frequencies if you are analysing a nucleotide sequence. e_required Required section e_word Word size to consider (e.g. 2=dimer) (value greater than or equal to 1) Integer 2 ("", " -word=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 3 This is the size of word (n-mer) to count. Thus if you want to count codon frequencies for a nucleotide sequence, you should enter 3 here. e_additional Additional section e_frame Frame of word to look at (0=all frames) (value greater than or equal to 0) Integer 0 ("", " -frame=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 4 The normal behaviour of 'compseq' is to count the frequencies of all words that occur by moving a window of length 'word' up by one each time. This option allows you to move the window up by the length of the word each time, skipping over the intervening words. 
You can count only those words that occur in a single frame of the word by setting this value to a number other than zero. If you set it to 1 it will only count the words in frame 1, 2 will only count the words in frame 2 and so on. e_ignorebz Ignore the amino acids b and z and just count them as 'other' Boolean 1 (" -noignorebz", "")[ bool(value) ] 5 The amino acid code B represents Asparagine or Aspartic acid and the code Z represents Glutamine or Glutamic acid. These are not commonly used codes and you may wish not to count words containing them, just noting them in the count of 'Other' words. e_reverse Count words in the forward and reverse sense Boolean 0 ("", " -reverse")[ bool(value) ] 6 Set this to be true if you also wish to also count words in the reverse complement of a nucleic sequence. e_calcfreq Calculate expected frequency from sequence Boolean 0 ("", " -calcfreq")[ bool(value) ] 7 If this is set true then the expected frequencies of words are calculated from the observed frequency of single bases or residues in the sequences. If you are reporting a word size of 1 (single bases or residues) then there is no point in using this option because the calculated expected frequency will be equal to the observed frequency. Calculating the expected frequencies like this will give an approximation of the expected frequencies that you might get by using an input file of frequencies produced by a previous run of this program. If an input file of expected word frequencies has been specified then the values from that file will be used instead of this calculation of expected frequency from the sequence, even if 'calcfreq' is set to be true. e_output Output section e_outfile Name of the output file (e_outfile) Filename outfile.composition ("" , " -outfile=" + str(value))[value is not None] 8 This is the results file. 
e_outfile_out outfile_out option CompseqReport Report e_outfile e_zerocount Display the words that have a frequency of zero Boolean 1 (" -nozerocount", "")[ bool(value) ] 9 You can make the output results file much smaller if you do not display the words with a zero count. auto Turn off any prompting String " -auto -stdout" 10 Programs-5.1.1/hmmfetch.xml0000644000175000001560000001045511767572177014520 0ustar bneronsis hmmfetch HMMFETCH Retrieve an HMM from pfam an HMM database hmmfetch is a small utility that retrieves an HMM called name from a HMMER model database called database. in a new format, and prints that model to standard output. For example, hmmfetch Pfam rrm retrieves the RRM (RNA recognition motif) model from Pfam. The retrieved HMM file is written in HMMER 2 ASCII format. hmm:database:search database:search:hmm hmmfetch Name Name of the HMM HMMKeys AbstractText " $value" " "+str(value) A file containing a list of one or more keys is read instead. The first white space delimited field on each non-blank non-comment line of the file is used as a key, and any remaining data on the line is ignored; this allows a variety of whitespace delimited datafiles to be used as files. 11 HMMDB HMM database Choice Pfam-A.hmm Pfam-A.hmm Pfam-B.hmm " -f $value" " -f "+str(value) 10 outfile_name Name of the HMM output file (-o) Filename (defined $value ) ? " -o $value" : "" ( "" , " -o " + str(value) )[ value is not None ] 1 Save the synthetic sequences to file rather than writing them to stdout. output_file_name Output file HmmProfile AbstractText defined $outfile_name outfile_name is not None $outfile_name str(outfile_name) output_file Output file HmmProfile AbstractText not defined $outfile_name outfile_name is None "hmmfetch.out" "hmmfetch.out" Programs-5.1.1/seqret.xml0000644000175000001560000001460012072525233014201 0ustar bneronsis seqret EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. 
EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net seqret Reads and writes (returns) sequences http://bioweb2.pasteur.fr/docs/EMBOSS/seqret.html http://emboss.sourceforge.net/docs/themes sequence:edit seqret e_input Input section e_feature Use feature information Boolean 0 ("", " -feature")[ bool(value) ] 1 e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 2 e_advanced Advanced section e_firstonly Read one sequence and stop Boolean 0 ("", " -firstonly")[ bool(value) ] 3 e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename seqret.e_outseq ("" , " -outseq=" + str(value))[value is not None] 4 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 5 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 6 Programs-5.1.1/splitter.xml0000644000175000001560000002003012072525233014536 0ustar bneronsis splitter EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net splitter Split sequence(s) into smaller sequences http://bioweb2.pasteur.fr/docs/EMBOSS/splitter.html http://emboss.sourceforge.net/docs/themes sequence:edit splitter e_input Input section e_feature Use feature information Boolean 0 ("", " -feature")[ bool(value) ] 1 e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 2 e_additional Additional section e_size Size to split at (value greater than or equal to 1) Integer 10000 ("", " -size=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 3 e_overlap Overlap between split sequences (value greater than or equal to 0) Integer 0 ("", " -overlap=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 4 e_advanced Advanced section e_addoverlap Include overlap in output sequence size Boolean 0 ("", " -addoverlap")[ bool(value) ] 5 e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename splitter.e_outseq ("" , " -outseq=" + str(value))[value is not None] 6 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 7 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 8 Programs-5.1.1/targetp.xml0000644000175000001560000004067311767575053014374 0ustar bneronsis targetp 1.1 targetp predicts the subcellular location of eukaryotic proteins. http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?targetp Olof Emanuelsson, olof@sbc.su.se Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. Olof Emanuelsson, Henrik Nielsen, Søren Brunak and Gunnar von Heijne. J. Mol. Biol., 300: 1005-1016, 2000. 
Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Henrik Nielsen, Jacob Engelbrecht, Søren Brunak and Gunnar von Heijne. Protein Engineering, 10:1-6, 1997. http://www.cbs.dtu.dk/services/TargetP/

targetp predicts the subcellular location of eukaryotic protein sequences. The assignment is based on the predicted presence of any of the N-terminal presequences: chloroplast transit peptide (cTP), mitochondrial targeting peptide (mTP) or secretory pathway signal peptide (SP).

targetp comes in two versions, one for plant proteins (-P) and one for non-plant proteins (-N). In the latter case cTP is a forbidden prediction. For the sequences predicted to contain an N-terminal presequence a prediction of its length can be provided (-c).

CAVEATS :
If possible, submit only the 130 N-terminal residues of each sequence. targetp was trained on the 130 N-terminal residues, so longer sequences do not change the prediction in any way; they only make it slower. The cTP and mTP cleavage site predictions are restricted to searching for a potential cleavage site within the 100 or 120 N-terminal amino acids, respectively.
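
The advice above can be applied before submission by trimming each FASTA record to its N-terminal part. The following is a minimal sketch, not part of targetp or Mobyle; the function name `truncate_fasta` and its `max_residues` parameter are hypothetical, and the input is assumed to be plain FASTA text.

```python
def truncate_fasta(text, max_residues=130):
    """Return FASTA text in which every sequence keeps only its first
    max_residues residues (hypothetical helper, not a targetp option)."""
    records = []                      # list of (header, sequence) pairs
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)       # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    # One header line and one (possibly shortened) sequence line per record.
    return "".join(f"{h}\n{s[:max_residues]}\n" for h, s in records)
```

Sequences shorter than the limit are left unchanged, so the helper is safe to run on mixed input.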

sequence:protein:localization targetp String "targetp " "targetp " sequence Input Sequence Sequence FASTA " $value" " " + str( value ) 50 >P48786; PATHOGENESIS-RELATED HOMEODOMAIN PROTEIN (PRHP). MEEISDPKPNALEQVLPTVPNGKCTAPVQMESLAVDVQKVSGEAKVRICSCWCEIVRSPEDLTKLVPCNDFAEDIKLFDS DPMQQEAESSIGIPLIPKQVTMSHNHDHESGSEMVSNEVMQENHVIATENTYQKSDFDRINMGQKETMPEEVIHKSFLES STSSIDILLNNHNSYQSGLPPENAVTDCKQVQLGHRSDDAIKNSGLVELVIGQKNVAKSPSQLVETGKRGRGRPRKVQTG LEQLVIGQKTAAKSSSQLGDTGKRSRGRPRKVQNSPTSFLENINMEQKETIPEQVTQNSILESLTIPTDNQSRTYNSDQS ELPPENAAKNCNHAQFGHQSDDTTKISGFKELVIGQETVAKSPSQLVDAGKRGRGRPRKVQTGLEQLVPVQETAAKSSSQ LGDTGKRSRGRPRKVQDSPTSLGGNVKVVPEKGKDSQELSVNSSRSLRSRSQEKSIEPDVNNIVADEGADREKPRKKRKK RMEENRVDEFCRIRTHLRYLLHRIKYEKNFLDAYSGEGWKGQSLDKIKPEKELKRAKAEIFGRKLKIRDLFQRLDLARSE GRLPEILFDSRGEIDSEDIFCAKCGSKDVTLSNDIILCDGACDRGFHQFCLDPPLLKEYIPPDDEGWLCPGCECKIDCIK LLNDSQETNILLGDSWEKVFAEEAAAAASGKNLDDNSGLPSDDSEDDDYDPGGPDLDEKVQGDDSSTDESDYQSESDDMQ VIRQKNSRGLPSDDSEDDEYDPSGLVTDQMYKDSSCSDFTSDSEDFTGVFDDYKDTGKAQGPLASTPDHVRNNEEGCGHP EQGDTAPLYPRRQVESLDYKKLNDIEFSKMCDILDILSSQLDVIICTGNQEEYGNTSSDSSDEDYMVTSSPDKNNSDKEA TAMERGRESGDLELDQKARESTHNRRYIKKFAVEGTDSFLSRSCEDSAAPVAGSKSTSKTLHGEHATQRLLQSFKENQYP QRAVKESLAAELALSVRQVSNWFNNRRWSFRHSSRIGSDVAKFDSNDTPRQKSIDMSGPSLKSVLDSATYSEIEKKEQDT ASLGLTEGCDRYMTLNMVADEGNVHTPCIAETREEKTEVGIKPQQNPL type Use the plant or non-plant version. Choice null null p np ( $value eq 'p')? " -P ": " -N " ( " -N ", " -P ")[ value == 'p' ] 10 cleavege Include cleavage site prediction (-c). Boolean 0 (defined $value and $value ne $vdef)? " -c " : "" ( "" , " -c ")[ value is not None and value != vdef ] 20 cutoffs Cutoffs predefined set of cutoffs that yielded this specificity on the TargetP test sets. predefined_cutoff Choice null null '' '' cutoff_95 ( $type eq 'p')? " -p 0.73 -t 0.86 -s 0.43 -o 0.84 " : " -t 0.78 -s 0.00 -o 0.73 " ( " -t 0.78 -s 0.00 -o 0.73 " , " -p 0.73 -t 0.86 -s 0.43 -o 0.84 " )[ type == 'p' ] cutoff_90 ( $type eq 'p')? 
" -p 0.62 -t 0.76 -s 0.00 -o 0.53 " : " -t 0.65 -s 0.00 -o 0.52 " ( " -t 0.65 -s 0.00 -o 0.52 " , " -p 0.62 -t 0.76 -s 0.00 -o 0.53 " )[ type == 'p' ] 30 user_cutoffs define your own Cutoffs not predefined_cutoff The user cutoffs will be ignored if a predefine set of cutoffs is specified 40 cTP cTP Float 0.0 (defined $value and $value ne $vdef)? " -p $value " : "" ( "" , " -p " + str( value ) )[ value is not None and value != vdef ] The value must be between 0.0 and 1.0 $value >= 0.0 or $value <= 1.0 value >= 0.0 or value <= 1.0 In order to increase the specificity of cTP prediction, use Pcut as a cutoff for predicting cTP: if the winning score is the chloroplast (cTP) score, specifying Pcut means that the score also has to be above that value; if not, the sequence will be left unpredicted, and an asterisk (*) will be out†put in the Loc column. The value of Pcut must be between 0.0 and 1.0. mTP mTP Float 0.0 ( defined $value and $value ne $vdef)? " -t $value" : "" ( "" , " -t " + str( value ) )[ value is not None and value != vdef ] The value must be between 0.0 and 1.0 $value >= 0.0 or $value <= 1.0 value >= 0.0 or value <= 1.0 In order to increase the specificity of mTP prediction, use Tcut as a cutoff for predicting mTP: if the winning score is the mithochondrial (mTP) score, specifying Tcut means that the score also has to be above that value; if not, the sequence will be left unpredicted, and an asterisk (*) will be out†put in the Loc column. The value of Tcut must be between 0.0 and 1.0. SP SP Float 0.0 ( defined $value and $value ne $vdef)? 
" -s $value" : "" ( "" , " -s " + str( value ) )[ value is not None and value != vdef ] The value must be between 0.0 and 1.0 $value >= 0.0 or $value <= 1.0 value >= 0.0 or value <= 1.0 In order to increase the specificity of SP prediction, use Scut as a cutoff for predicting SP: if the winning score is the Secretory pathway (SP) score, specifying Scut means that the score also has to be above that value; if not, the sequence will be left unpredicted, and an asterisk (*) will be out†put in the Loc column. The value of Scut must be between 0.0 and 1.0. other other Float 0.0 ( defined $value and $value ne $vdef)? " -o $value" : "" ( "" , " -o " + str( value ) )[ value is not None and value != vdef ] The value must be between 0.0 and 1.0 $value >= 0.0 or $value <= 1.0 value >= 0.0 or value <= 1.0 In order to increase the specificity of any other location prediction, use Ocut as a cutoff for predicting any other location : if the winning score is the other location score, specifying Ocut means that the score also has to be above that value; if not, the sequence will be left unpredicted, and an asterisk (*) will be out†put in the Loc column. The value of Ocut must be between 0.0 and 1.0. results targetp report Report targetp "targetp.out" "targetp.out"

The output is in plain text; it will go to stdout. For each input sequence the following is printed (on one line):

  • Name : Sequence name truncated to 20 characters.
  • Len : Sequence length.
  • cTP, mTP, SP, other : Final NN scores on which the final prediction is based (Loc, see below). Note that the scores are not really probabilities, and they do not necessarily add to one. However, the location with the highest score is the most likely according to targetp, and the relationship between the scores (the reliability class, see below) may be an indication of how certain the prediction is.
  • Loc : Prediction of localization, based on the scores above; the codes are:
    • C : Chloroplast, i.e. the sequence contains cTP, a chloroplast transit peptide;
    • M : Mitochondrion, i.e. the sequence contains mTP, a mitochondrial targeting peptide;
    • S : Secretory pathway, i.e. the sequence contains SP, a signal peptide;
    • _ : any other location;
    • * : "don't know". This character appears if cutoff restrictions were demanded (-p, -t, -s, -o, see below) and the winning network output score was below the requested cutoff for that category.
  • RC : Reliability class, from 1 to 5, where 1 indicates the strongest prediction. RC is a measure of the size of the difference ('diff') between the highest (winning) and the second highest output scores. There are 5 reliability classes, defined as follows:
    1. diff > 0.8
    2. 0.800 > diff > 0.600
    3. 0.600 > diff > 0.400
    4. 0.400 > diff > 0.200
    5. 0.200 > diff
    Thus, the lower the value of RC the safer the prediction.
  • TPlen : predicted presequence length (only when the -c option is given).
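
As an illustration of how the Loc and RC columns relate to the four scores, the rules listed above can be sketched in a few lines. This is not targetp's code: the function names are hypothetical, and the handling of values that fall exactly on a class boundary or exactly on a cutoff is an assumption, since the description only gives strict inequalities.

```python
# Loc codes as listed in the output description.
LOC_CODES = {"cTP": "C", "mTP": "M", "SP": "S", "other": "_"}

def reliability_class(scores):
    """Map the gap between the two highest scores to RC 1..5.
    Assumption: a diff exactly on a boundary falls into the weaker class."""
    ranked = sorted(scores.values(), reverse=True)
    diff = ranked[0] - ranked[1]
    for rc, bound in ((1, 0.8), (2, 0.6), (3, 0.4), (4, 0.2)):
        if diff > bound:
            return rc
    return 5

def predict_loc(scores, cutoffs=None):
    """Return the Loc code of the winning score, or '*' when a cutoff was
    requested for that category and the winning score is below it."""
    winner = max(scores, key=scores.get)
    cutoff = (cutoffs or {}).get(winner)
    if cutoff is not None and scores[winner] < cutoff:
        return "*"
    return LOC_CODES[winner]
```

For a non-plant run the `cTP` key would simply be absent from `scores`, and `cutoffs` would hold the values passed with -p, -t, -s and -o under the keys `cTP`, `mTP`, `SP` and `other`.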
Programs-5.1.1/restrict.xml0000644000175000001560000005144112072525233014541 0ustar bneronsis restrict EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net restrict Report restriction enzyme cleavage sites in a nucleotide sequence http://bioweb2.pasteur.fr/docs/EMBOSS/restrict.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:restriction restrict e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_datafile Restriction enzyme data file (optional) RestrictionEnzymeData AbstractText ("", " -datafile=" + str(value))[value is not None] 2 e_mfile Restriction enzyme methylation data file (optional) RestrictionEnzymeMethylationData AbstractText ("", " -mfile=" + str(value))[value is not None ] 3 e_required Required section e_sitelen Minimum recognition site length (value from 2 to 20) Integer 4 ("", " -sitelen=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 2 is required value >= 2 Value less than or equal to 20 is required value <= 20 4 This sets the minimum length of the restriction enzyme recognition site. Any enzymes with sites shorter than this will be ignored. e_enzymes Comma separated enzyme list String all ("", " -enzymes=" + str(value))[value is not None and value!=vdef] 5 The name 'all' reads in all enzyme names from the REBASE database. You can specify enzymes by giving their names with commas between then, such as: 'HincII,hinfI,ppiI,hindiii'. The case of the names is not important. 
You can specify a file of enzyme names to read in by giving the name of the file holding the enzyme names with a '@' character in front of it, for example, '@enz.list'. Blank lines and lines starting with a hash character or '!' are ignored and all other lines are concatenated together with a comma character ',' and then treated as the list of enzymes to search for. An example of a file of enzyme names is: ! my enzymes HincII, ppiII ! other enzymes hindiii HinfI PpiI e_advanced Advanced section e_min Minimum cuts per re (value from 1 to 1000) Integer 1 ("", " -min=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 Value less than or equal to 1000 is required value <= 1000 6 This sets the minimum number of cuts for any restriction enzyme that will be considered. Any enzymes that cut fewer times than this will be ignored. e_max Maximum cuts per re Integer 2000000000 ("", " -max=" + str(value))[value is not None and value!=vdef] 7 This sets the maximum number of cuts for any restriction enzyme that will be considered. Any enzymes that cut more times than this will be ignored. e_solofragment List individual enzymes with their fragments Boolean 0 ("", " -solofragment")[ bool(value) ] 8 This gives the fragment lengths of the forward sense strand produced by complete restriction by each restriction enzyme on its own. Results are added to the tail section of the report. e_single Force single site only cuts Boolean 0 ("", " -single")[ bool(value) ] 9 If this is set then this forces the values of the mincuts and maxcuts qualifiers to both be 1. Any other value you may have set them to will be ignored. e_blunt Allow blunt end cutters Boolean 1 (" -noblunt", "")[ bool(value) ] 10 This allows those enzymes which cut at the same position on the forward and reverse strands to be considered. 
e_sticky Allow sticky end cutters Boolean 1 (" -nosticky", "")[ bool(value) ] 11 This allows those enzymes which cut at different positions on the forward and reverse strands, leaving an overhang, to be considered. e_ambiguity Allow ambiguous matches Boolean 1 (" -noambiguity", "")[ bool(value) ] 12 This allows those enzymes which have one or more 'N' ambiguity codes in their pattern to be considered e_plasmid Allow circular dna Boolean 0 ("", " -plasmid")[ bool(value) ] 13 If this is set then this allows searches for restriction enzyme recognition site and cut positions that span the end of the sequence to be considered. e_methylation Use methylation data Boolean 0 ("", " -methylation")[ bool(value) ] 14 If this is set then RE recognition sites will not match methylated bases. e_commercial Only enzymes with suppliers Boolean 1 (" -nocommercial", "")[ bool(value) ] 15 If this is set, then only those enzymes with a commercial supplier will be searched for. This qualifier is ignored if you have specified an explicit list of enzymes to search for, rather than searching through 'all' the enzymes in the REBASE database. It is assumed that, if you are asking for an explicit enzyme, then you probably know where to get it from and so all enzymes names that you have asked to be searched for, and which cut, will be reported whether or not they have a commercial supplier. e_output Output section e_limit Limits reports to one isoschizomer Boolean 1 (" -nolimit", "")[ bool(value) ] 16 This limits the reporting of enzymes to just one enzyme from each group of isoschizomers. The enzyme chosen to represent an isoschizomer group is the prototype indicated in the data file 'embossre.equ', which is created by the program 'rebaseextract'. If you prefer different prototypes to be used, make a copy of embossre.equ in your home directory and edit it. If this value is set to be false then all of the input enzymes will be reported. 
You might like to set this to false if you are supplying an explicit set of enzymes rather than searching 'all' of them. e_alphabetic Sort output alphabetically Boolean 0 ("", " -alphabetic")[ bool(value) ] 17 e_fragments Show fragment lengths Boolean 0 ("", " -fragments")[ bool(value) ] 18 This gives the fragment lengths of the forward sense strand produced by complete restriction using all of the input enzymes together. Results are added to the tail section of the report. e_name Show sequence name Boolean 0 ("", " -name")[ bool(value) ] 19 e_outfile Name of the report file Filename restrict.report ("" , " -outfile=" + str(value))[value is not None] 20 e_rformat_outfile Choose the report output format Choice TABLE DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 21 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 22 Programs-5.1.1/seqretsplit.xml0000644000175000001560000001367612072525233015271 0ustar bneronsis seqretsplit EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net seqretsplit Reads sequences and writes them to individual files http://bioweb2.pasteur.fr/docs/EMBOSS/seqretsplit.html http://emboss.sourceforge.net/docs/themes sequence:edit seqretsplit e_input Input section e_feature Use feature information Boolean 0 ("", " -feature")[ bool(value) ] 1 e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 2 e_advanced Advanced section e_firstonly Read one sequence and stop Boolean 0 ("", " -firstonly")[ bool(value) ] 3 e_output Output section e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 4 e_outseq_out outseq_out option Sequence "*." + str( e_osformat_outseq).lower() auto Turn off any prompting String " -auto -stdout" 5 Programs-5.1.1/yank.xml0000644000175000001560000001030712072525233013640 0ustar bneronsis yank EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net yank Add a sequence reference (a full USA) to a list file http://bioweb2.pasteur.fr/docs/EMBOSS/yank.html http://emboss.sourceforge.net/docs/themes sequence:edit yank e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_output Output section e_newfile Overwrite existing output file Boolean 0 ("", " -newfile")[ bool(value) ] 2 e_outfile Name of the output file (e_outfile) Filename yank.e_outfile ("" , " -outfile=" + str(value))[value is not None] 3 e_outfile_out outfile_out option UsaList AbstractText e_outfile auto Turn off any prompting String " -auto -stdout" 4 Programs-5.1.1/extend_align.xml0000644000175000001560000001077212030620132015331 0ustar bneronsis extend_align concatenate several alignments from several files This tool concatenates multiple MSAs.

For instance:

first alignment
            >seq1
            aaaaggg
            >seq2
            aaaa--g
            >seq3
            aa--ggg
            
second alignment
            >seq1
            ccccttt
            >seq2
            cccc--t
            >seq3
            cc--ttt
            
the resulting alignment if the linker is "---" will be:
            >seq1
            aaaaggg---ccccttt
            >seq2
            aaaa--g---cccc--t
            >seq3
            aa--ggg---cc--ttt
            
Two methods can be used to extend the alignment:
  1. by sequence order: in this case the sequence ids may differ between alignments, and the id of each sequence in the resulting alignment is the concatenation of the respective sequence ids (all alignments MUST have the same number of sequences).
  2. by sequence id: in this case the extension is made based on the sequence ids.
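
The two methods can be sketched as follows. This is an illustrative reimplementation, not the extend_align source: the function names are hypothetical, alignments are assumed to be FASTA text already parsed into lists of (id, sequence) pairs, the "_" used to join ids in order mode is an assumption, and id mode is assumed to require the same set of ids in every alignment.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (id, sequence) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line    # sequence may span several lines
    return [(name, seq) for name, seq in records]

def extend_by_order(alignments, linker=""):
    """Join the n-th sequences of every alignment; ids are concatenated
    (the '_' separator is an assumption)."""
    if len({len(a) for a in alignments}) > 1:
        raise ValueError("all alignments must have the same number of sequences")
    return [("_".join(name for name, _ in rows),
             linker.join(seq for _, seq in rows))
            for rows in zip(*alignments)]

def extend_by_id(alignments, linker=""):
    """Join sequences that share an id, in the order of the first alignment."""
    if not alignments:
        return []
    tables = [dict(a) for a in alignments]
    ids = [name for name, _ in alignments[0]]
    if any(set(t) != set(ids) for t in tables):
        raise ValueError("every alignment must contain the same sequence ids")
    return [(name, linker.join(t[name] for t in tables)) for name in ids]
```

With the two example alignments above and the linker "---", `extend_by_id` reproduces the resulting alignment shown in the example.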
Néron, B.
alignment:multiple extend_align extend_method method to extend the alignment Choice null null id ( defined $value and $value != $vdef)? " --id ":"" ( "" , " --id ")[value is not None and value != vdef] 5
  1. by sequence order: in this case the sequence ids may differ between alignments, and the id of each sequence in the resulting alignment is the concatenation of the respective sequence ids (all alignments MUST have the same number of sequences).
  2. by sequence id: in this case the extension is made based on the sequence ids.
fasta_align alignment MultipleAlignment FASTA " -i $value fasta" " -i " + str( value ) + " fasta" 10 FASTA -i linker sequence linker String (defined $value)? " -l $value" : "" ("", " -l " + str( value ) )[value is not None] 100 concatenated_alignment concatenated alignment Alignment FASTA "extend_align.out" "extend_align.out"
Programs-5.1.1/restover.xml0000644000175000001560000003115712072525233014555 0ustar bneronsis restover EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net restover Find restriction enzymes producing a specific overhang http://bioweb2.pasteur.fr/docs/EMBOSS/restover.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:restriction restover e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_datafile Restriction enzyme data file (optional) RestrictionEnzymeData AbstractText ("", " -datafile=" + str(value))[value is not None] 2 e_mfile Restriction enzyme methylation data file RestrictionEnzymeMethylationData AbstractText ("", " -mfile=" + str(value))[value is not None ] 3 e_required Required section e_seqcomp Overlap sequence String ("", " -seqcomp=" + str(value))[value is not None] 4 e_advanced Advanced section e_min Minimum cuts per re (value from 1 to 1000) Integer 1 ("", " -min=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 Value less than or equal to 1000 is required value <= 1000 5 e_max Maximum cuts per re Integer 2000000000 ("", " -max=" + str(value))[value is not None and value!=vdef] 6 e_single Force single site only cuts Boolean 0 ("", " -single")[ bool(value) ] 7 e_threeprime Use 3' overhang e.g. bamhi has ctag as a 5' overhang, and apai has ccgg as 3' overhang. 
Boolean 0 ("", " -threeprime")[ bool(value) ] 8 e_blunt Allow blunt end cutters Boolean 1 (" -noblunt", "")[ bool(value) ] 9 e_sticky Allow sticky end cutters Boolean 1 (" -nosticky", "")[ bool(value) ] 10 e_ambiguity Allow ambiguous matches Boolean 1 (" -noambiguity", "")[ bool(value) ] 11 e_plasmid Allow circular dna Boolean 0 ("", " -plasmid")[ bool(value) ] 12 e_methylation Use methylation data Boolean 0 ("", " -methylation")[ bool(value) ] 13 If this is set then RE recognition sites will not match methylated bases. e_commercial Only enzymes with suppliers Boolean 1 (" -nocommercial", "")[ bool(value) ] 14 e_output Output section e_html Create html output Boolean 0 ("", " -html")[ bool(value) ] 15 e_limit Limits reports to one isoschizomer Boolean 1 (" -nolimit", "")[ bool(value) ] 16 e_alphabetic Sort output alphabetically Boolean 0 ("", " -alphabetic")[ bool(value) ] 17 e_fragments Show fragment lengths Boolean 0 ("", " -fragments")[ bool(value) ] 18 e_outfile Name of the output file (e_outfile) Filename restover.e_outfile ("" , " -outfile=" + str(value))[value is not None] 19 e_outfile_out outfile_out option RestoverReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 20 Programs-5.1.1/rnapdist.xml0000644000175000001560000003110211767572177014541 0ustar bneronsis rnapdist RNApdist Calculate distances between thermodynamic RNA secondary structures ensembles Stadler, Hofacker, Bonhoeffer Bonhoeffer S, McCaskill J S, Stadler P F, Schuster P, (1993) RNA multistructure landscapes, Euro Biophys J:22,13-24 RNApdist reads RNA sequences from stdin and calculates structure distances between the thermodynamic ensembles of their secondary structures. To do this the partition function and matrix of base pairing probabilities is computed for each sequence. The probability matrix is then condensed into a vector holding for each base the proba-bilities of being unpaired, paired upstream, or paired downstream, respectively. 
These profiles are compared by a standard alignment algorithm. sequence:nucleic:2D_structure structure:2D_structure RNApdist seq RNA Sequences File DNA Sequence FASTA " < $value" " < " + str(value) 1000 comparison_options Comparison options 2 compare Which comparisons (-X) Choice p p m f c (defined $value and $value ne $vdef)? " -X$value" : "" ("", " -X" + str(value) )[value is not None and value != vdef] alignment_file Alignment file (-B) Filename (defined $value)? " -B $value" : "" ("" , " -B " + str(value))[ value is not None ] Print an 'alignment' with gaps of the structures, to show matching substructures. ( ) essentially upstream (downstream) paired bases { } weakly upstream (downstream) paired bases | strongly paired bases without preference , weakly paired bases without preference . essentially unpaired bases. others_options Other options 2 temperature Rescale energy parameters to a temperature of temperature Celcius (-T) Integer 37 (defined $value and $value != $vdef)? " -T $value" : "" ( "" , " -T " + str(value) )[ value is not None and value != vdef] tetraloops Do not include special stabilizing energies for certain tetraloops (-4) Boolean 0 ($value)? " -4" : "" ( "" , " -4" )[ value ] dangling How to treat dangling end energies for bases adjacent to helices in free ends and multiloops (-d) Choice -d1 -d1 -d -d2 (defined $value and $value ne $vdef)? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef] How to treat 'dangling end' energies for bases adjacent to helices in free ends and multiloops: Normally only unpaired bases can participate in at most one dangling end. With -d2 this check is ignored, this is the default for partition function folding (-p). -d ignores dangling ends altogether. Note that by default pf and mfe folding treat dangling ends differently, use -d2 (or -d) in addition to -p to ensure that both algorithms use the same energy model. The -d2 options is available for RNAfold, RNAeval, and RNAinverse only. 
noGU Do not allow GU pairs (-noGU) Boolean 0 ($value)? " -noGU" : "" ( "" , " -noGU" )[ value ] noCloseGU Do not allow GU pairs at the end of helices (-noCloseGU) Boolean 0 ($value)? " -noCloseGU" : "" ( "" , " -noCloseGU" )[ value ] nsp Non standard pairs (comma separated list) (-nsp) String (defined $value)? " -nsp $value" : "" ( "" , " -nsp " + str(value) )[ value is not None ] Allow other pairs in addition to the usual AU, GC, and GU pairs. pairs is a comma separated list of additionally allowed pairs. If the first character is a '-' then AB will imply that AB and BA are allowed pairs. e.g. RNAfold -nsp -GA will allow GA and AG pairs. Nonstandard pairs are given 0 stacking energy. parameter Parameter file (-P) EnergyParameterFile AbstractText (defined $value)? " -P $value" : "" ( "" , " -P " + str(value) )[ value is not None ] Read energy parameters from paramfile, instead of using the default parameter set. A sample parameterfile should accompany your distribution. See the RNAlib documentation for details on the file format. readseq String "readseq -f=19 -a $seq > $seq.tmp && (cp $seq $seq.orig && mv $seq.tmp $seq) ; " "readseq -f=19 -a "+ str(seq) + " > "+ str(seq) +".tmp && (cp "+ str(seq) +" "+ str(seq) +".orig && mv "+ str(seq) +".tmp "+ str(seq) +") ; " -10 psfiles Postscript output file PostScript Binary "*.ps" "*.ps" alnoutfile Result alignment file Alignment defined $alignment_file alignment_file is not None "$alignment_file" str(alignment_file) Programs-5.1.1/prophecy.xml0000644000175000001560000004233711672346320014542 0ustar bneronsis prophecy EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net prophecy Create frequency matrix or profile from a multiple alignment http://bioweb2.pasteur.fr/docs/EMBOSS/prophecy.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:profiles sequence:protein:profiles prophecy e_input Input section e_sequence sequence option Alignment FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_type Profile type Choice F F G H ("", " -type=" + str(value))[value is not None and value!=vdef] 2 e_datafile Scoring matrix Choice e_type !="F" mobyle_null mobyle_null EBLOSUM30 EBLOSUM35 EBLOSUM40 EBLOSUM45 EBLOSUM50 EBLOSUM55 EBLOSUM60 EBLOSUM62 EBLOSUM62-12 EBLOSUM65 EBLOSUM70 EBLOSUM75 EBLOSUM80 EBLOSUM85 EBLOSUM90 EBLOSUMN EDNAFULL EDNAMAT EDNASIMPLE EPAM10 EPAM100 EPAM110 EPAM120 EPAM130 EPAM140 EPAM150 EPAM160 EPAM170 EPAM180 EPAM190 EPAM20 EPAM200 EPAM210 EPAM220 EPAM230 EPAM240 EPAM250 EPAM260 EPAM270 EPAM280 EPAM290 EPAM30 EPAM300 EPAM310 EPAM320 EPAM330 EPAM340 EPAM350 EPAM360 EPAM370 EPAM380 EPAM390 EPAM40 EPAM400 EPAM410 EPAM420 EPAM430 EPAM440 EPAM450 EPAM460 EPAM470 EPAM480 EPAM490 EPAM50 EPAM500 EPAM60 EPAM70 EPAM80 EPAM90 SSSUB ("", " -datafile=" + str(value))[value is not None and value!=vdef] 3 'Epprofile' for Gribskov type, or EBLOSUM62 e_required Required section e_name Enter a name for the profile String mymatrix ("", " -name=" + str(value))[value is not None and value!=vdef] 4 e_profiletypesection Profile type specific options e_threshold Enter threshold reporting percentage (value from 1 to 100) Integer e_type =="F" 75 ("", " -threshold=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 Value less than or equal to 100 is required value <= 100 5 e_gapsection Gap options e_open Gap opening penalty Float e_type !="F" 3.0 ("", " -open=" + 
str(value))[value is not None and value!=vdef] 6 e_extension Gap extension penalty Float e_type !="F" 0.3 ("", " -extension=" + str(value))[value is not None and value!=vdef] 7 e_output Output section e_outfile Name of the output file (e_outfile) Filename prophecy.e_outfile ("" , " -outfile=" + str(value))[value is not None] 8 e_outfile_out outfile_out option ProphecyReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 9 Programs-5.1.1/fasta.xml0000644000175000001560000013146211767572177014025 0ustar bneronsis fasta 3.4(t25d6) FASTA Sequence database search W. Pearson Pearson, W. R. (1999) Flexible sequence similarity searching with the FASTA3 program package. Methods in Molecular Biology W. R. Pearson and D. J. Lipman (1988), Improved Tools for Biological Sequence Analysis, PNAS 85:2444-2448 W. R. Pearson (1998) Empirical statistical estimates for sequence similarity searches. In J. Mol. Biol. 276:71-84 Pearson, W. R. (1996) Effective protein sequence comparison. In Meth. Enz., R. F. Doolittle, ed. (San Diego: Academic Press) 266:227-258 http://fasta.bioch.virginia.edu/fasta_www2/fasta_list2.shtml http://faculty.virginia.edu/wrpearson/fasta/ database:search:homology fasta Fasta program Choice null null fasta tfasta fastx tfastx fasty tfasty fastf tfastf fasts tfasts "$value -q" str(value) + " -q" 0 - fasta: scan a protein or DNA sequence library for similar sequences - tfasta: compare a protein sequence to a DNA sequence librarSy, translating the DNA sequence library `on-the-fly' to the 3 forward and the 3 reverse frames without frameshifts. - fastx/fasty: compare a DNA sequence to a protein sequence database, comparing the translated DNA sequence in three frames, with frameshifts. fasty2 allows frameshifts inside codons. - tfastx/tfasty: compare a protein sequence vs a translated DNA db, with frameshifts. tfasty allows frameshifts inside codons. 
- fastf/tfastf: compare an ordered peptide mixture (obtained for example by Edman degradation of a CNBr cleavage) against a protein or translated DNA database. - fasts/tfasts: compare a set of short peptide fragments (obtained from a mass-spec analysis of a protein) against a protein or translated DNA database. query Query sequence File Protein DNA Sequence FASTA " $value" " "+str(value) 2 seqtype Is it a DNA or protein sequence (-n) Choice null null DNA protein (defined $value and $fasta =~ /^fasta/ and $value eq "DNA") or $fasta =~ /^fast(x|y)/) ? " -n" : "" ( "" , " -n" )[ value is not None and value == "DNA" and fasta == 'fasta'] fastf, fasts, tfasta, tfastx, tfasty, tfastf and tfasts take a protein sequence fastx and fasty take a DNA sequence ($fasta =~ /^fast(f|s)/ and $seqtype eq "DNA") or ($fasta =~ /^fast(x|y)/ and $seqtype eq "protein") (seqtype == "protein" and fasta in ["fasta", "fastf", "fasts", "tfasta", "tfastx", "tfasty", "tfastf", "tfasts"]) or (seqtype == "DNA" and fasta in ["fasta", "fastx", "fasty"] ) 1 db Database 3 protein_db Protein Database Choice ($seqtype eq "protein" and $fasta =~ /^fasta/) or $fasta =~ /^fast(x|y|s|f)/ (seqtype == "protein" and fasta == "fasta") or fasta in ["fastx", "fasty", "fastf", "fasts"] null " $value" " " + str(value) Choose a protein db for fasta, fastx, fatsf, fasty or fasts. Please note that Swissprot usage by and for commercial entities requires a license agreement. nucleotid_db Nucleotid Database Choice ($seqtype eq "DNA" and $fasta =~ /^fasta/ ) or $fasta =~ /^tfast/ (seqtype == "DNA" and fasta == "fasta") or fasta in [ "tfasta", "tfastx", "tfasty", "tfastf", "tfasts"] null " $value" " " + str(value) Choose a nucleotide db for fasta, tfasta, tfastx, tfasty, tfastf or tfasts break_long Break long library sequences into blocks (-N) Integer (defined $value) ? " -N $value" : "" ( "" , " -N " + str(value) )[ value is not None ] Break long library sequences into blocks of N residues. 
Useful for bacterial genomes, which have only one sequence entry. -N 2000 works well for bacterial genomes. selectivity_opt Selectivity options 1 ktup Sensitivity and speed of the search Integer (defined $value)? " $value":"" ("" , " " + str(value) )[ value is not None ] ktup can be set to 2 or 1 for protein sequences or from 1 to 6 for DNA sequences. ($seqtype eq "protein" and ($value == 1 or $value == 2 )) or ($seqtype eq "DNA" and ($value == 1 or $value == 2 or $value == 3 or $value == 4 or $value == 5 or $value == 6 )) (seqtype == "protein" and value in [1,2]) or (seqtype == "DNA" and value in range(1,7,1)) 4 ktup sets the sensitivity and speed of the search. If ktup=2, similar regions in the two sequences being compared are found by looking at pairs of aligned residues; if ktup=1, single aligned amino acids are examined. ktup can be set to 2 or 1 for protein sequences, or from 1 to 6 for DNA sequences. The default if ktup is not specified is 2 for proteins and 6 for DNA. ktup=1 should be used for oligonucleotides (DNA query length < 20). optcut Threshold for band optimization (FASTA, FASTX). (-c) Integer (defined $value)? " -c $value":"" ("" , " -c " + str(value) )[ value is not None ] Only used for fasta and fastx $fasta =~ /^fasta/ or $fasta =~ /^fastx/ fasta in ["fasta", "fastx"] The threshold value is normally calculated based on sequence length. gapinit Penalty for opening a gap (-f) Integer (defined $value)? " -f $value":"" ("" , " -f " + str(value))[ value is not None ] The default for fasta is -12 for proteins and -16 for DNA. The default for fastx/fasty/tfastx/tfasty is -15. gapext Penalty for gap extension (-g) Integer (defined $value)? " -g $value":"" ("" , " -g " + str(value))[ value is not None ] The default for fasta is -2 for proteins and -4 for DNA. The default for fastx/fasty/tfastx/tfasty is -3. high_expect Maximal expectation value threshold for displaying scores and alignments (-E) Float 10.0 (defined $value and $value != $vdef)? 
" -E $value":"" ("" , " -E " + str(value))[ value is not None and value != vdef] Expectation value limit for displaying scores and alignments. Defaults are 10.0 for FASTA protein searches, 5.0 for translated DNA/protein comparisons, and 2.0 for DNA/DNA searches. low_expect Minimal expectation value threshold for displaying scores and alignments (-F) Float (defined $value) ? " -F $value":"" ("" , " -F " + str(value))[ value is not None ] Expectation value lower limit for score and alignment display. If value is 1e-6 prevents library sequences with E()-values lower than 1e-6 from being displayed. This allows the use to focus on more distant relationships. This allow one to skip over close relationships in searches for more distant relationships. score_opt Scoring options 1 scoring_nucleic Nucleic penalty $fasta eq "fasta" and seqtype eq "DNA" fasta == "fasta" and seqtype == "DNA" nucleotid_match Maximum positive value for a nucleotid match (-r) Integer 5 Only positive value $value >= 0 value >= 0 nucleotid_mismatch Maximum negative penalty value for a nucleotid mismatch (-r) Integer defined $nucleotid_match nucleotid_match is not None -4 (defined $value and defined nucleotid_match and ($value != $vdef and $nucleotid_match != 5)) ? " -r \"$nucleotid_match/$value\"" : "" ( "" , ' -r "' + str(nucleotid_match) + '/' + str(value)+ '"' )[ value is not None and nucleotid_match is not None and (nucleotid_match != 5 and value != vdef) ] Only negative value $value < 0 value < 0 '+5/-4' are the default values for nucleotid match/mismatch, but '+3/-2' can perform better in some cases. scoring_protein Protein penalty seqtype ne "DNA" seqtype != "DNA" matrix Scoring matrix file (-s) Choice BL50 BL50 BL62 BL80 P20 P40 P120 P250 M10 M20 M40 (defined $value and $value ne $vdef) ? " -s $value" : "" ( "" , " -s " + str(value) )[ value is not None and value != vdef] X_penalty Penalty for a match to 'X' (independently of the PAM matrix) (-x) Integer (defined $value) ? 
" -x $value" : "" ( "" , " -x " + str(value) )[ value is not None ] Particularly useful for fast[xy], where termination codons are encoded as 'X'. frame_transl_opt Frameshift and translation options 1 frameshift Penalty for frameshift between two codons (fast[xy]/tfast[xy]) (-h) Integer ($fasta =~ /fast(x|y)/) fasta in ["fastx", "fasty", "tfastx", "tfasty"] (defined $value)? " -h $value":"" ("" , " -h " + str(value))[ value is not None ] frameshift_within Penalty for frameshift within a codon (fasty/tfasty) (-j) Integer ($fasta =~ /fasty/) fasta in ["fasty", "tfasty"] (defined $value)? " -j $value":"" ("" , " -j " + str(value))[ value is not None ] threeframe Search only the three forward frames (-3) Boolean $fasta =~ /^tfast(a|x|y)/ fasta in ["tfasta", "tfastx", "tfasty"] 0 ($value) ? " -3":"" ("" , " -3")[ value ] invert Reverse complement the query sequence (-i) Boolean $fasta =~ /fast(x|y)/ fasta in ["fastx", "tfastx", "fasty", "tfasty"] 0 ($value) ? " -i" : "" ( "" , " -i" )[ value ] genetic_code Use genetic code for translation (tfasta/tfast[xy]/fast[xy]) (-t) Choice $fasta =~ /^tfast/ or $fasta =~ /fast[xy]/ fasta in [ "tfasta", "tfastx", "tfasty", "tfastf", "tfasts", "fastx", "fasty" ] 1 1 2 3 4 5 6 9 10 11 12 13 14 15 (defined $value and $value ne $vdef) ? " -t $value" : "" ( "" , " -t " + str(value) )[ value is not None and value != vdef] optimize_opt Optimization options 1 band Band-width used for optimization (-y) Integer (defined $value)? " -y $value":"" ("" , " -y " + str(value))[ value is not None ] Set the band-width used for optimization. -y 16 is the default for protein when ktup=2 and for all DNA alignments. -y 32 is used for protein and ktup=1. For proteins, optimization slows comparison 2-fold and is highly recommended. swalig Force Smith-Waterman alignment for DNA (-A) Boolean $fasta =~ /^fasta/ and $seqtype eq "DNA" fasta in [ "tfasta", "fasta" ] and seqtype == "DNA" 0 ($value)? 
" -A":"" ("" , " -A")[ value ] Force Smith-Waterman alignment for output. Smith-Waterman is the default for protein sequences and FASTX, but not for TFASTA or DNA comparisons with FASTA. noopt Turn fasta band optimization off during initial phase (-o) Boolean 0 ($value)? " -o":"" ("" , " -o")[ value ] Turn off default optimization of all scores greater than OPTCUT. Shirt results by 'initn' scores reduces the accuracy of statistical estimates. This was the behavior of fasta1 versions. stat Specify statistical calculation. (-z) Choice 1 0 1 2 3 4 5 6 (defined $random and defined $value and $value > 0) ? " -z 1$value" : ($value ne $vdef) ? " -z $value" : "" ( (( "", " -z " + str(value) ) [ value is not None and value != vdef]) , " -z 1" + str(value) )[value is not None and value > 0 and random is not None ] In general, 1 and 2 are the best methods. random Estimate statistical parameters from shuffled copies of each library sequence (-z) Boolean $stat > 0 stat > 0 0 This doubles the time required for a search, but allows accurate statistics to be estimated for libraries comprised of a single protein family. affichage Report options 1 histogram Turn off histogram display (-H) Boolean 0 ($value)? " -H":"" ("" , " -H" )[ value ] scores Number of similarity scores to be shown (-b) Integer (defined $value and $value <= $high_expect)? " -b $value":"" ("" , " -b " + str(value))[ value is not None and value <= high_expect] Must be <= -E cutoff if -E is given. $value <= $high_expect value <= high_expect alns Number of alignments to be shown (-d) Integer (defined $value and $value <= $high_expect)? " -d $value":"" ("" , " -d " + str(value))[ value is not None and value <= high_expect] Must be <= -E cutoff if -E is given. $value <= $high_expect value <= high_expect html_output HTML output (-m) Boolean 0 ($value)? 
" -m 6" : "" ( "" , " -m 6" )[ value ] markx Alternate display of matches and mismatches in alignments (-m) Choice not $html_output not html_output 0 0 1 2 3 4 9 10 (defined $value and $value ne $vdef )? " -m $value" : "" ( "" , " -m " + str(value) )[ value is not None and value != vdef] (MARKX) =0,1,2,3,4. Alternate display of matches and mismatches in alignments. MARKX=0 uses ':','.',' ', for identities, conservative replacements, and non-conservative replacements, respectively. MARKX=1 uses ' ','x', and 'X'. MARKX=2 does not show the second sequence, but uses the second alignment line to display matches with a '.' for identity, or with the mismatched residue for mismatches. MARKX=2 is useful for aligning large numbers of similar sequences. MARKX=3 writes out a file of library sequences in FASTA format. MARKX=3 should always be used with the 'SHOWALL' (-a) option, but this does not completely ensure that all of the sequences output will be aligned. MARKX=4 displays a graph of the alignment of the library sequence with respect to the query sequence, so that one can identify the regions of the query sequence that are conserved. init1 Sequences ranked by the z-score based on the init1 score (-1) Boolean 0 ($value)? " -1":"" ("" , " -1")[ value ] z_score_out Show normalize score as (-B) Choice 0 1 0 (defined $value and $value ne $vdef) ? " -B" : "" ( "" , " -B" )[ value is not None and value != vdef] linlen Output line length for sequence alignments (-w) Integer 60 (defined $value and $value != $vdef)? " -w $value":"" ("" , " -w " + str(value))[ value is not None and value != vdef ] Value must be <= 200. $value <= 200 value <= 200 offsets Start numbering the aligned sequences at position x1 x2 (2 numbers separated by comma) (-X) String (defined $value)? " -X \"$value\"":"" ("" , ' -X "' + str(value) + '"')[ value is not None ] Must be 2 numbers separated by comma. 
$value =~ /\d+(,\d+){1}/ and $value len (value.split(',')) == 2 and value.split(',')[0] != '' and value.split(',')[1] != '' Causes fasta/lfasta/plfasta to start numbering the aligned sequences starting with offset1 and offset2, rather than 1 and 1. This is particularly useful for showing alignments of promoter regions. info Display more information about the library sequence in the alignment (-L) Boolean 0 ($value)? " -L":"" ("" , " -L")[ value ] other_opt Other options 1 filter Lower case filtering (-S) Boolean 0 ($value) ? " -S" : "" ( "" , " -S" )[ value ] Treat lower-case characters in the query or library sequence as 'low-complexity' residues. These characters are treated as 'X' during the initial scan, but are treated as normal residues during the final alignment. Since statistical significance is calculated from the similarity score calculated during the library search, low-complexity regions will not produce statistically significant matches. If a significant alignment contains low-complexity regions, the final score may be higher than the score obtained during the search. outfile Fasta report FastaTextReport Report "fasta.out" "fasta.out" html_outfile Html output file FastaHtmlReport Report $html_output html_output " > fasta.html" " > fasta.html" 100 "fasta.html" "fasta.html" Programs-5.1.1/btwisted.xml0000644000175000001560000001143712072525233014530 0ustar bneronsis btwisted EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net btwisted Calculate the twisting in a B-DNA sequence http://bioweb2.pasteur.fr/docs/EMBOSS/btwisted.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:composition btwisted e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_angledata Dna base pair twist angle data file BasePairTwistAngles AbstractText ("", " -angledata=" + str(value))[value is not None ] 2 e_energydata Dna base pair stacking energies data file BasePairStackingEnergies AbstractText ("", " -energydata=" + str(value))[value is not None ] 3 e_output Output section e_outfile Name of the output file (e_outfile) Filename btwisted.e_outfile ("" , " -outfile=" + str(value))[value is not None] 4 e_outfile_out outfile_out option BtwistedReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 5 Programs-5.1.1/tmhmm.xml0000644000175000001560000002230111533171506014016 0ustar bneronsis tmhmm 2.0 tmhmm prediction of transmembrane helices in proteins. http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?tmhmm http://www.cbs.dtu.dk/services/TMHMM// sequence:protein:2D_structure structure:2D_structure tmhmm String "tmhmm " "tmhmm " sequence Input Sequence Sequence FASTA " $value " " " + str( value ) 50 output_opt output options Choice 0 0 " -noshort " " -noshort " 1 " -noplot " " -noplot " 2 " -short " " -short " 10 v1 Use old model (version 1). Boolean 0 ($value)? " -v1 ": "" ( ""," -v1 ")[ bool( value ) ] 20 results tmhmm report. Report tmhmm "tmhmm.out" "tmhmm.out"

Long output format

The first few lines give some statistics:

  • Length: the length of the protein sequence.
  • Number of predicted TMHs: The number of predicted transmembrane helices.
  • Exp number of AAs in TMHs: The expected number of amino acids in transmembrane helices. If this number is larger than 18, the protein is very likely to be a transmembrane protein (or to have a signal peptide).
  • Exp number, first 60 AAs: The expected number of amino acids in transmembrane helices in the first 60 amino acids of the protein. If this number is more than a few, be aware that a predicted transmembrane helix in the N-term could be a signal peptide.
  • Total prob of N-in: The total probability that the N-term is on the cytoplasmic side of the membrane.
  • POSSIBLE N-term signal sequence: a warning that is produced when "Exp number, first 60 AAs" is larger than 10.
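
The two cut-offs quoted in this list can be expressed as a small check. This is only a sketch and not part of tmhmm: the function name `flag_tmhmm_stats` and the dictionary keys are hypothetical, and the thresholds (18 and 10) are taken directly from the text above.

```python
def flag_tmhmm_stats(exp_aa, first60):
    """Apply the rule-of-thumb cut-offs from the help text (hypothetical helper)."""
    return {
        # "Exp number of AAs in TMHs" above 18: very likely a transmembrane
        # protein (or a protein with a signal peptide).
        "likely_tm_or_signal_peptide": exp_aa > 18,
        # "Exp number, first 60 AAs" above 10: the condition under which the
        # "POSSIBLE N-term signal sequence" warning is described as appearing.
        "possible_n_term_signal": first60 > 10,
    }


if __name__ == "__main__":
    # Values from the COX2_BACSU example in the short output format section.
    print(flag_tmhmm_stats(68.69, 39.89))
```

The flags are hints for manual inspection, in line with the wording of the help text ("very likely", "could be"), not a classification.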

Short output format

In the short output format one line is produced for each protein with no graphics. Each line starts with the sequence identifier and then these fields:

  • "len=": the length of the protein sequence.
  • "ExpAA=": The expected number of amino acids in transmembrane helices (see above).
  • "First60=": The expected number of amino acids in transmembrane helices in the first 60 amino acids of the protein (see above).
  • "PredHel=": The number of predicted transmembrane helices by N-best.
  • "Topology=": The topology predicted by N-best.

For the example above the short output would be (except that it would be on one line):

COX2_BACSU
len=278
ExpAA=68.69
First60=39.89
PredHel=3
Topology=i7-29o44-66i87-109o
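
A short-format line can be split into its fields with a few lines of Python. This is a minimal sketch that assumes the fields are separated by whitespace (spaces or tabs) on a single line; the function name `parse_short_line` and the `"id"` key are hypothetical.

```python
def parse_short_line(line):
    """Split one short-format line into an identifier and typed fields."""
    identifier, *fields = line.split()
    record = {"id": identifier}
    for field in fields:
        # Each field has the form key=value, e.g. "len=278".
        key, _, value = field.partition("=")
        record[key] = value
    # Convert the numeric fields described in the list above.
    casts = (("len", int), ("ExpAA", float), ("First60", float), ("PredHel", int))
    for key, cast in casts:
        if key in record:
            record[key] = cast(record[key])
    return record


if __name__ == "__main__":
    example = ("COX2_BACSU len=278 ExpAA=68.69 First60=39.89 "
               "PredHel=3 Topology=i7-29o44-66i87-109o")
    print(parse_short_line(example))
```

The `Topology` value is kept as a string here.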

The topology is given as the positions of the transmembrane helices, separated by 'i' if the loop is on the inside or 'o' if it is on the outside. The above example 'i7-29o44-66i87-109o' means that the protein starts on the inside, has a predicted TMH at positions 7 to 29, then a loop on the outside, then a TMH at positions 44 to 66, and so on.
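
The topology string can be decoded along the same lines. The sketch below is illustrative only: the function name `parse_topology` is hypothetical, and it assumes the string contains nothing but 'i'/'o' loop markers and `start-end` helix ranges, as in the example.

```python
import re


def parse_topology(topology):
    """Return (helices, loops) from a string such as 'i7-29o44-66i87-109o'."""
    # Each helix is written as start-end.
    helices = [(int(start), int(end))
               for start, end in re.findall(r"(\d+)-(\d+)", topology)]
    # Loop markers in order: 'i' = inside, 'o' = outside.
    loops = re.findall(r"[io]", topology)
    return helices, loops


if __name__ == "__main__":
    helices, loops = parse_topology("i7-29o44-66i87-109o")
    print(helices)  # [(7, 29), (44, 66), (87, 109)]
    print(loops)    # ['i', 'o', 'i', 'o']
```

With n helices there are n + 1 loop markers: the first gives the side of the N-terminus, the last the side of the C-terminus.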

eps Plot of probabilities Binary tmhmm_graphic EPS output_opt == "0" output_opt == "0" "*.eps" "*.eps"

The plot shows the posterior probabilities of inside/outside/TM helix. Here one can see possible weak TM helices that were not predicted, and one can get an idea of the certainty of each segment in the prediction.

At the top of the plot (between 1 and 1.2) the N-best prediction is shown.

The plot is obtained by calculating the total probability that a residue sits in a helix, inside, or outside, summed over all possible paths through the model. Sometimes the plot and the prediction seem contradictory, but that is because the plot shows probabilities for each residue, whereas the prediction is the overall most probable structure. Therefore the plot should be seen as a complementary source of information.

png tmhmm graphic. Binary tmhmm_graphic png output_opt == "0" output_opt == "0" "*.png" "*.png"

The plot shows the posterior probabilities of inside/outside/TM helix. Here one can see possible weak TM helices that were not predicted, and one can get an idea of the certainty of each segment in the prediction.

At the top of the plot (between 1 and 1.2) the N-best prediction is shown.

The plot is obtained by calculating the total probability that a residue sits in a helix, inside, or outside, summed over all possible paths through the model. Sometimes the plot and the prediction seem contradictory, but that is because the plot shows probabilities for each residue, whereas the prediction is the overall most probable structure. Therefore the plot should be seen as a complementary source of information.

Programs-5.1.1/predator.xml0000644000175000001560000003475712006243340014526 0ustar bneronsis predator 2.1.2 PREDATOR Protein secondary structure prediction from a single sequence or a set of sequences D. Frishman & P. Argos Frishman, D. and Argos, P. (1996) Incorporation of long-distance interactions into a secondary structure prediction algorithm. Protein Engineering, 9, 133-142. Frishman, D. and Argos, P. (1997) 75% accuracy in protein secondary structure prediction. Proteins, 27, 329-335. Frishman,D and Argos,P. (1995) Knowledge-based secondary structure assignment. Proteins: structure, function and genetics, 23, 566-579. Kabsch,W. and Sander,C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22: 2577-2637. ftp://ftp.ebi.ac.uk/pub/software/unix/predator/ sequence:protein:2D_structure structure:2D_structure predator inputfile Input file You must enter either a protein sequences or alignment file sequences Protein sequence(s) File Sequence FASTA 1,n defined $sequences and not defined $alignment sequences is not None and alignment is None (defined $value) ? " $value" : "" ("" , " " + str(value) )[ value is not None ] You must fill either sequences or alignment not both not defined $alignment alignment is None 100 alignment Protein Alignment File Alignment FASTA CLUSTAL MSF defined $alignment and not defined $sequences alignment is not None and sequences is None (defined $value) ? " $value" : "" ("" , " " + str(value))[ value is not None ] You must fill either alignment or sequences not both not defined sequences sequences is None 101 prediction Prediction options 1 single Perform single sequence prediction. Ignore other sequences in the set for computing the prediction (-s) Boolean not $all not all 0 ($value) ? " -s" : "" ( "" , " -s" )[ value ] 1 dont_copy Do not copy assignment directly from the PDB database (-u) Boolean 0 ($value) ? 
" -u" : "" ( "" , " -u" )[ value ] 1 Do not copy assignment directly from the PDB database if query sequence is found in PDB. By default, the known conformation of 7-residue segments will be used if they are identical to a 7-residue fragment in the query sequence. dssp Use DSSP target assignment (-d) Boolean defined $dssp_file dssp_file is not None 0 ($value) ? " -d" : "" ( "" , " -d" )[ value ] 1 Use DSSP target assignment (default is STRIDE). The predictions made with DSSP and STRIDE target assignments are optimized to reproduce these assignments as well as possible. percentid Find a subset of sequences with no more than this identity between any pair of sequences (-n) Float (defined $value) ? " -n$value" : "" ( "" , " -n" + str(value) )[ value is not None ] 1 input Input parameters 1 all Make prediction for All sequences in the input file (-a) Boolean not defined $seqid seqid is None 0 ($value) ? " -a" : "" ( "" , " -a" )[ value ] 1 seqid Make prediction for this sequence (give its id) (-i) String (defined $value) ? " -i$value" : "" ( "" , " -i" + str(value) )[ value is not None ] 1 This option is case sensitive! stride_file STRIDE file (-x) StrideReport Report (defined $value) ? " -x$value" : "" ( "" , " -x" + str(value) )[ value is not None ] 1 dssp_file DSSP file (-y) DsspOutput AbstractText (defined $value)? " -y$value" : "" ( "" , " -y"+str( value ) )[ value is not None] 1 pdb_chain PDB Chain (-z) String defined $dssp_file or defined $stride_file dssp_file is not None or stride_file is not None (defined $value) ? " -z$value" : " -z-" ( " -z-" , " -z" + str(value) )[ value is not None ] 1 output Output parameters 1 long Long output form (-l) Boolean 0 ($value) ? " -l" : "" ( "" , " -l" )[ value ] 1 Every output line contains residue number, three-letter residue name, one-letter residue name, predicted secondary structural state and reliability estimate. 
If a STRIDE or DSSP secondary structure assignment has been read (see other options), the known assignment will also be shown in the output for comparison. By default the short output form is used. other_info Output other additional information if available (-h) Boolean 0 ($value) ? " -h" : "" ( "" , " -h" )[ value ] 1 predator_output Text Short output form: Secondary structure states of amino acids are indicated by the letters "H" (helix), "E" (extended or sheet), and "_" (coil). Long output form (option -l selected): Secondary structure states of amino acids are indicated by letters "H" or "h" (helix), "E" or "e" (extended), and "C" or "c" (coil). The prediction is shown in lower case except for those residues for which the assignment was directly copied from the PDB database. This feature is added so that you can distinguish between the predictions actually made by PREDATOR and those taken from known structures. The prediction is contained in the records beginning with the identifier PRED in the first columns. For each amino acid site of your sequence, residue number, three- and one-letter residue code, prediction, reliability estimate, and the number of residues from related sequences projected onto this residue through the local alignment procedure are shown in subsequent columns. Additionally, if the STRIDE or DSSP assignments have been read using the options -x or -y (and -z), the last column of the PREDATOR output will contain the actual secondary structural assignment for your sequence if it corresponds exactly to the one in the STRIDE or DSSP file (for comparison). If the known assignment is not available, i.e., if you did not use the -x or -y options, question marks will be output. Both output forms: If option -h has been used, PREDATOR will show progress by printing dots on the standard output. If your sequence has related sequences with known 3D structure, PDB identifiers of these sequences will be printed. 
'predator.out' Programs-5.1.1/listor.xml0000644000175000001560000001505412072525233014216 0ustar bneronsis listor EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net listor Write a list file of the logical OR of two sets of sequences http://bioweb2.pasteur.fr/docs/EMBOSS/listor.html http://emboss.sourceforge.net/docs/themes sequence:edit listor e_input Input section e_firstsequences firstsequences option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 2,n ("", " -firstsequences=" + str(value))[value is not None] 1 e_secondsequences secondsequences option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 2,n ("", " -secondsequences=" + str(value))[value is not None] 2 e_additional Additional section e_operator Logical operator to combine sequence lists Choice mobyle_null mobyle_null O A X N ("", " -operator=" + str(value))[value is not None and value!=vdef] 3 The following logical operators combine the sequences in the following ways: OR - gives all that occur in one set or the other AND - gives only those which occur in both sets XOR - gives those which only occur in one set or the other, but not in both NOT - gives those which occur in the first set except for those that also occur in the second e_output Output section e_outfile Name of the output file (e_outfile) Filename outfile.list ("" , " -outfile=" + str(value))[value is not None] 4 The list of sequence names will be written to this list file e_outfile_out outfile_out option UsaList AbstractText e_outfile auto Turn off any prompting String " -auto -stdout" 5 Programs-5.1.1/mfold.xml0000644000175000001560000010361311767572177014025 0ustar bneronsis mfold 3.2 MFOLD Prediction of RNA secondary structure M. Zuker M. Zuker, D.H. 
Mathews and D.H. Turner Algorithms and Thermodynamics for RNA Secondary Structure Prediction: A Practical Guide in RNA Biochemistry and Biotechnology, J. Barciszewski and B.F.C. Clark, eds., NATO ASI Series, Kluwer Academic Publishers, (1999) http://bioweb2.pasteur.fr/docs/mfold/index.html http://www.bioinfo.rpi.edu/applications/mfold/ http://www.bioinfo.rpi.edu/~zukerm/export/ sequence:nucleic:2D_structure structure:2D_structure mfold String "mfold" "mfold" 0 SEQ Sequence File (SEQ) DNA Sequence IG GENBANK EMBL " SEQ=$value" " SEQ=" + str(value) 1 SEQ : The sequence file may contain multiple sequences. At present, the mfold script will fold the first sequence by default. NA RNA or DNA (NA) Choice RNA RNA DNA (defined $value and $value ne $vdef) ? " NA=$value" : "" ("" , " NA=" + str(value))[ value is not None and value != vdef] 2 LC Sequence type (LC) Choice linear linear circular (defined $value and $value ne $vdef) ? " LC=$value" : "" ("" , " LC=" + str(value))[ value is not None and value != vdef] 2 It indicates whether a linear or circular nucleic acid is being folded. control Control options 3 T Temperature (T) Integer 37 (defined $value and $value != $vdef) ? " T=$value" : "" ("" , " T=" + str(value))[ value is not None and value != vdef] Enter a value between 0 and 100 $value <= 100 and $value >= 0 value <= 100 and value >= 0 3 P Percent (P) Integer 5 (defined $value and $value != $vdef) ? " P=$value" : "" ("", " P="+str(value))[value is not None and value != vdef] 3 This is the percent suboptimality for computing the energy dot plot and suboptimal foldings. The default value is 5%. This parameter controls the value of the free energy increment, delta (deltaG). Delta (deltaG) is set to P% of deltaG, the computed minimum free energy. The energy dot plot shows only those base pairs that are in foldings with free energy less than or equal to deltaG plus delta (deltaG). 
Similarly, the free energies of computed foldings are in the range from deltaG to deltaG plus delta (deltaG). No matter the value of P, mfold currently keeps delta (deltaG) in the range [1,12] (kcal/mole). NA_CONC Na+ molar concentration (NA_CONC) Float 1.0 (defined $value and $value != $vdef) ? " NA_CONC=$value" : "" ("" , " NA_CONC=" + str(value))[ value is not None and value != vdef] 3 MG_CONC Mg++ molar concentration (MG_CONC) Float 0.0 (defined $value and $value != $vdef) ? " MG_CONC=$value" : "" ("" , " MG_CONC=" + str(value))[ value is not None and value != vdef] 3 W Window parameter (W) Integer (defined $value) ? " W=$value" : "" ("" , " W=" + str(value))[ value is not None ] 3 This is the window parameter that controls the number of foldings that are automatically computed by mfold . `W' may be thought of as a distance parameter. The distance between 2 base pairs, i.j and i'.j' may be defined as max{|i-i'|,|j-j'|}. Then if k-1 foldings have already been predicted by mfold , the kth folding will have at least W base pairs that are at least a distance W from any of the base pairs in the first k-1 foldings. As W increases, the number of predicted foldings decreases. If W is not specified, mfold selects a value by default based on sequence length. MAXBP Max base pair distance (MAXBP) Integer (defined $value) ? " MAXBP=$value" : "" ("" , " MAXBP=" + str(value))[ value is not None ] 3 A base pair i.j will not be allowed to form (in linear RNA) if j-i > MAXBP. For circular RNA, a base pair i.j cannot form if min{j-i,n+i-j} > MAXBP . Thus small values of MAXBP ensure that only short range base pairs will be predicted. By default, MAXBP=+infinity, indicating no constraint. MAX_LP Maximum bulge/interior loop size (MAX_LP) Integer 30 (defined $value and $value != $vdef) ? 
" MAX_LP=$value" : "" ("" , " MAX_LP=" + str(value))[ value is not None and value != vdef] 3 MAX_AS Maximum asymmetry of a bulge/interior loop (MAX_AS) Integer 30 (defined $value and $value != $vdef) ? " MAX_AS=$value" : "" ("" , " MAX_AS=" + str(value))[ value is not None and value != vdef] 3 MAX Maximum number of foldings to be computed (MAX) Integer 50 (defined $value and $value != $vdef) ? " MAX=$value" : "" ("" , " MAX=" + str(value))[ value is not None and value != vdef] 3 MAX : This is the maximum number of foldings that mfold will compute (50 by default). It is better to limit the number of foldings by careful selection of the P and W parameters. ANN Structure annotation type (ANN) Choice none none p-num ss-count (defined $value and $value ne $vdef) ? " ANN=$value" : "" ("" , " ANN=" + str(value))[ value is not None and value != vdef] 2 This parameter currently takes on 3 values. - `none' : secondary structures are drawn without any special annotation. Letters or outline are in black, while base pairs are red lines or dots for GC pairs and blue lines or dots for AU and GU pairs. - `p-num' : Colored dots, colored base characters or a combination are used to display in each folding how well-determined each base is according to the P-num values in the `fold_name.ann' file. - `ss-count' : Colored dots, colored base characters or a combination are used to display in each folding how likely a base is to be single-stranded according to sample statistics stored in the `fold_name.ss-count' file. MODE Structure display mode (MODE) Choice auto auto bases lines (defined $value and $value ne $vdef) ? " MODE=$value" : "" ("" , " MODE=" + str(value))[ value is not None and value != vdef] 2 ROT_ANG Structure rotation angle (ROT_ANG) Integer 0 (defined $value and $value != $vdef) ? " ROT_ANG=$value" : "" ("" , " ROT_ANG=" + str(value))[ value is not None and value != vdef] 3 START 5' base number (START) Integer 1 (defined $value and $value != $vdef) ? 
" START=$value" : "" ("" , " START=" + str(value))[ value is not None and value != vdef] 3 STOP 3' base number (STOP) Integer (defined $value) ? " STOP=$value" : "" ("" , " STOP=" + str(value))[ value is not None ] 3 AUX Constraints File (AUX) MfoldFoldingConstraints AbstractText (defined $value) ? " AUX=$value" : "" ( "" , " AUX=" + str(value) )[ value is not None] 3 AUX : (optional) This is the name of an auxiliary input file of folding constraints. If this parameter is not used, mfold looks for a file named `fold_name.aux'. If this file exists and is not empty, then it is interpreted as a constraint file. Thus constraints may be used without the use of this command line parameter. Fill the box or the file with constraints (1 constraint per line) You may: 1. force bases i,i+1,...,i+k-1 to be double stranded by entering: F i 0 k 2. force consecutive base pairs i.j,i+1.j-1, ...,i+k-1.j-k+1 by entering: F i j k 3. force bases i,i+1,...,i+k-1 to be single stranded by entering: P i 0 k 4. prohibit the consecutive base pairs i.j,i+1.j-1, ...,i+k-1.j-k+1 by entering: P i j k 5. prohibit bases i to j from pairing with bases k to l by entering: P i-j k-l runtype Output options 3 txt_format Text output file Boolean 1 "" "" 4 txt_out Text $txt_format txt_format "*.out" "*.out" det_format Detailed output file Boolean 0 "" "" 4 det_out_html Detailed outfile MfoldDetailHtmlReport Report $det_format and $html_format det_format and html_format "*.det.html" "*.det.html" det_out Detailed outfile Text $det_format and not $html_format det_format and not html_format "*.det" "*.det" html_format Html output file Boolean 0 ($value) ? " RUN_TYPE=html" : "" ("" , " RUN_TYPE=html")[ value ] 4 out_html Html output file MfoldHtmlReport Report $html_format html_format "*.html" "*.html" rnaml_format RNAML output file Boolean 0 The RNAML (RNA Markup Language) was developed by a consortium of investigators and is a proposed syntax for RNA information files. 
A description was published in 2002: A. Waugh, P. Gendron, R. Altman, J.W. Brown, D. Case, D. Gautheret, S.C. Harvey, N. Leontis, J. Westbrook, E. Westhof, M. Zuker, and F. Major RNAML: A standard syntax for exchanging RNA information. RNA 8 (6), 707-717, (2002) out_rnaml RNAML output file RNA 2DStructure AbstractText RNAML $rnaml_format rnaml_format "*.rnaml" "*.rnaml" energy_param Energy Dot plot 3 plot_format Energy dot plot output file Boolean 0 This is a text file that contains all the base pairs on the energy dot plot , organized into helices for which Delta G(i,j) is constant. The first record is a header, and each subsequent record describes a single helix. The records are usually sorted by Delta G(i,j) and are often filtered so that short helices or isolated base pairs (helices of length 1) in suboptimal foldings are removed. out_plot Energy dot plot output file Text $plot_format plot_format "*.plot" "*.plot" ann_format Structure annotation output file Boolean 0 ann_plot Structure annotation output file Text $ann_format ann_format "*.ann" "*.ann" hnum_format Helix num output file Boolean 0 This file is the same as plot file, except that the energy column is replaced by an h-num column. These files are usually sorted by h-num; lowest to highest, or best determined to worst determined. Often, only helices in optimal foldings are retained. hnum_plot Helix num output file Text $hnum_format hnum_format "*.h-num" "*.h-num" structure_format Structure file format 3 gif_format GIF output file Boolean 0 A graphics file that ends with the suffix .gif should be displayed directly on the page of your web browser. out_gif GIF output file Picture Binary $gif_format gif_format *.gif "*.gif" pdf_format PDF output file Boolean 0 out_pdf PDF output file Pdf Binary $pdf_format pdf_format "*.pdf" "*.pdf" ps_format Postscript output file Boolean 0 PostScript is a programming language that is used to describe output for printing and display. It was developed by Adobe Systems. 
Most printers support PostScript. Programs such as Ghostscript, Ghostview and GSview can be used to display PostScript files (http://www.cs.wisc.edu/~ghost/). out_ps Postscript output file PostScript Binary $ps_format ps_format "*.ps" "*.ps" ct_format CT output file Boolean 0 The '.ct file' contains the nucleic acid sequence and base pairing information from which a structure plot may be computed. The mfold software and mfold web servers use the "sir_graph_ng" program to create Postscript, jpg and png images from .ct files. "sir_graph_ng" is part of the mfold_util package that may be obtained here: http://www.bioinfo.rpi.edu/~zukerm/export/. out_ct CT output file 2DStructure AbstractText CT $ct_format ct_format "*.ct" "*.ct" ss_format XRNA_ss output file Boolean 0 This is an input file for the XRNA program by Bryn Weiser and Harry Noller. The new Java version is available from the UC Santa Cruz RNA Center web site: http://rna.ucsc.edu/rnacenter/xrna/xrna.html. The ss file can be regarded as an expanded ct file. It contains base and connect information as well as coordinates for plotting the bases. out_ss XRNA_ss output file 2DStructure AbstractText SS $ss_format ss_format "*.ss" "*.ss-count" "*.ss" "*.ss-count" Programs-5.1.1/bambe.xml0000644000175000001560000014651711767572177014004 0ustar bneronsis bambe 4.01 BAMBE Bayesian Analysis in Molecular Biology and Evolution Simon, Larget Larget, B. and D. Simon (1999). Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Molecular Biology and Evolution 16:750-759. Simon, D. and B. Larget. 1998. Bayesian analysis in molecular biology and evolution (BAMBE), version 1.01 beta. Department of Mathematics and Computer Science, Duquesne University. 
http://www.stat.wisc.edu/~larget/ http://www.stat.wisc.edu/~larget/ phylogeny:bayesian bambe String "bambe <bambe.params" "bambe <bambe.params" 0 data_file Alignment file (data-file) DNA Alignment CLUSTAL "data-file=$value\\n" "data-file="+str(value)+"\n" 6 bambe.params run_options Run characteristics 1 seed Seed for random number generator (seed) Integer 194024933 (defined $value and $value != $vdef) ? "seed=$value\\n" : "" ("", "seed="+str(value)+"\n")[value is not None and value!=vdef] bambe.params cycles Number of cycles to run the main algorithm (cycles) Integer 1000 (defined $value and $value != $vdef) ?"cycles=$value\\n" : "" ("", "cycles="+str(value)+"\n")[value is not None and value!=vdef] bambe.params window_interval Number of cycles between printing trees to output (window-interval) Integer 200 (defined $value and $value != $vdef) ? "window-interval=$value\\n" : "" ("", "window-interval="+str(value)+"\n")[value is not None and value!=vdef] Also used for updating the window size during burn-in. bambe.params main_algorithm Algorithm to run during production cycles (main-algorithm) Choice local global local (defined $value and $value ne $vdef) ? "main-algorithm=$value\\n" : "" ("", "main-algorithm="+str(value)+"\n")[value is not None and value != vdef] bambe.params burn Number of cycles to run the burn algorithm (burn) Integer 1000 (defined $value and $value != $vdef) ? "burn=$value\\n" : "" ("", "burn="+str(value)+"\n")[value is not None and value!=vdef] Parameter values are not updated during burn. User should discard these cycles and the initial cycles of the main algorithm before inference. bambe.params burn_algorithm Algorithm to run during burn (burn-algorithm) Choice global global local (defined $value and $value ne $vdef) ? 
"burn-algorithm=$value\\n" : "" ("","burn-algorithm="+str(value)+"\n")[value is not None and value!=vdef] bambe.params use_beta Use scaled beta distribution modification of the local algorithm (use-beta) Boolean $main_algorithm eq "local" or $burn_algorithm eq "local" main_algorithm == "local" or burn_algorithm == "local" 0 ($value) ? "use-beta=true\\n" : "" ("", "use-beta=true\n")[ value ] bambe.params model_options Model specification 2 molecular_clock Use a molecular clock (molecular-clock) Boolean 1 ($value) ? "" :"molecular-clock=false\\n" ("molecular-clock=false\n", "")[ value ] bambe.params likelihood_model Likelihood model (likelihood-model) Choice HKY85 HKY85 F84 TN93 GREV (defined $value and $value ne $vdef) ? "likelihood-model=$value\\n" : "" ("", "likelihood-model="+str(value)+"\n")[value is not None and value!=vdef] bambe.params category_list A valid category list (category-list) String (defined $value) ? "category-list=$value\\n" :"" ("", "category-list="+str(value)+"\n" )[value is not None] Each category has its own set of parameters. Each category is denoted by a positive integer between 1 and 10. A comma-separated list gives the categories of the sites in order, e.g., 1,2,3,1,3 means that the first site is in category 1, the second in 2, the third in 3, the fourth in 1, and the fifth site is in category 3. A repeat count is indicated by a caret (^). For example, 1^20,2^5,3^2 means that the first twenty sites are in category 1, the next five sites are in 2, and the next two sites are in category 3. Parentheses may be used to group sites together with a common repeat count, i.e., (1,2)^5 is the same as 1,2,1,2,1,2,1,2,1,2. Repeat counts may be nested, e.g., (1^3,2)^2 is the same as 1,1,1,2,1,1,1,2. Repetition to the end of the list of sites is indicated by an asterisk (*). For example, 1^5,2* means that the first five sites are in category 1, and all the remaining sites are in category 2. 
Parentheses may also be used in conjunction with the asterisk, e.g., (1,2)* is the same as 1,2,1,2,1,2,.... The category list may contain at most one asterisk and it must be associated with the last category or group in the list. In other words, an asterisk may appear only at the end of the list. Examples 1* - all sites are the same category. (default) (1,2,3)* - all sites are partitioned by codon position. 1^99,2^50,3^9 - the sites are divided over three genes. Each gene has its own set of parameters used by all sites in that gene. The first gene is composed of the first ninety-nine sites, the next by the next fifty sites, and the last by nine sites. bambe.params single_kappa Single kappa (single-kappa) Boolean 0 ($value) ? "single-kappa=true\\n" : "" ("", "single-kappa=true\n")[ value ] If true, the same kappa parameter is used for all site categories. If false, there are different values for different site categories. It has no effect if there is only one rate category. bambe.params initial_kappa Comma separated list of positive kappa values for each site category (initial-kappa) String $likelihood_model eq "HKY85" or $likelihood_model eq "F84" likelihood_model=="HKY85" or likelihood_model=="F84" 7.5,2.5,10.75 (defined $value and $value ne $vdef) ? "initial-kappa=$value\\n" : "" ("", "initial-kappa="+str(value)+"\n")[value is not None and value!=vdef] If single-kappa is true, a warning is given if more than one value is specified. The first value will be used. If single-kappa is false, a value must be specified for each category in use. bambe.params initial_theta Comma separated list of positive theta values for each site category (initial-theta) String 1.4,1.0,8.3 (defined $value and $value ne $vdef) ? "initial-theta=$value\\n" : "" ("", "initial-theta="+str(value)+"\n")[value is not None and value!=vdef] The weighted average of these values should be 1, with weights given by the proportion of sites in each site category. 
(Renormalization is automatic and a warning given if the condition fails.) If there are an equal number of sites in each category, for example, the numbers should average to 1. bambe.params estimate_pi Use empirical relative frequencies (estimate-pi) Boolean 1 ($value) ? "" : "estimate-pi=false\\n" ("estimate-pi=false\n","")[ value ] If true, the initial stationary probabilities for each base in each category are estimated by the relative frequencies with which they appear in the data. bambe.params initial_pia Comma separated list of initial pi value of base A (initial-pia) String not $estimate_pi not estimate_pi 0.25,0.25,0.25 (defined $value and $value ne $vdef) ? "initial-pia=$value\\n" : "" ("", "initial-pia="+str(value)+"\n")[value is not None and value!=vdef] initial_pig Comma separated list of initial pi value of base G (initial-pig) String not $estimate_pi not estimate_pi 0.25,0.25,0.25 (defined $value and $value ne $vdef) ? "initial-pig=$value\\n" : "" ("", "initial-pig="+str(value)+"\n")[value is not None and value!=vdef] initial_pic Comma separated list of initial pi value of base C (initial-pic) String not $estimate_pi not estimate_pi 0.25,0.25,0.25 (defined $value and $value ne $vdef) ? "initial-pic=$value\\n" : "" ("", "initial-pic="+str(value)+"\n")[value is not None and value!=vdef] initial_pit Comma separated list of initial pi value of base T (initial-pit) String not $estimate_pi not estimate_pi 0.25,0.25,0.25 (defined $value and $value ne $vdef) ? "initial-pit=$value\\n" : "" ("","initial-pit="+str(value)+"\n")[value is not None and value!=vdef] initial_ttp Comma separated list of positive transition/transversion parameter values (TN93 model) (initial-ttp) String $likelihood_model eq "TN93" likelihood_model == "TN93" 1.0,1.0,1.0 (defined $value and $value ne $vdef) ? "initial-ttp=$value\\n" : "" ("", "initial-ttp="+str(value)+"\n")[value is not None and value!=vdef] This is used only with TN93. 
There must be a value specified for each site-category used if TN93 is the chosen model. bambe.params initial_gamma Comma separated list of positive gamma values (TN93 model) (initial-gamma) String $likelihood_model eq "TN93" likelihood_model == "TN93" 1.0,1.0,1.0 (defined $value and $value ne $vdef) ? "initial-gamma=$value\\n" : "" ("", "initial-gamma="+str(value)+"\n")[value is not None and value!=vdef] This is used only with TN93. There must be a value specified for each site-category used if TN93 is the chosen model. bambe.params initial_Rac Comma separated list of positive r values for AC bases(GREV model) (initial-rac) String $likelihood_model eq "GREV" likelihood_model == "GREV" 1.0,1.0,1.0 (defined $value and $value ne $vdef) ? "initial-rac=$value\\n" : "" ("", "initial-rac="+str(value)+"\n")[value is not None and value!=vdef] This is used only with GREV model. bambe.params initial_Rag Comma separated list of positive r values for AG (GREV model) (initial-rag) String $likelihood_model eq "GREV" likelihood_model == "GREV" 1.0,1.0,1.0 (defined $value and $value ne $vdef) ? "initial-rag=$value\\n" : "" ("", "initial-rag="+str(value)+"\n")[value is not None and value!=vdef] This is used only with GREV model. bambe.params initial_Rat Comma separated list of positive r values for AT (GREV model) (initial-rat) String $likelihood_model eq "GREV" likelihood_model == "GREV" 1.0,1.0,1.0 (defined $value and $value ne $vdef) ? "initial-rat=$value\\n" : "" ("", "initial-rat="+str(value)+"\n")[value is not None and value!=vdef] This is used only with GREV model. bambe.params initial_Rcg Comma separated list of positive r values for CG (GREV model) (initial-rcg) String $likelihood_model eq "GREV" likelihood_model == "GREV" 1.0,1.0,1.0 (defined $value and $value ne $vdef) ? "initial-rcg=$value\\n" : "" ("", "initial-rcg="+str(value)+"\n")[value is not None and value!=vdef] This is used only with GREV model. 
bambe.params initial_Rct Comma separated list of positive r values for CT (GREV model) (initial-rct) String $likelihood_model eq "GREV" likelihood_model == "GREV" 1.0,1.0,1.0 (defined $value and $value ne $vdef) ? "initial-rct=$value\\n" : "" ("", "initial-rct="+str(value)+"\n")[value is not None and value!=vdef] This is used only with GREV model. bambe.params initial_Rgt Comma separated list of positive r values for GT (GREV model) (initial-rgt) String $likelihood_model eq "GREV" likelihood_model == "GREV" 1.0,1.0,1.0 (defined $value and $value ne $vdef) ? "initial-rgt=$value\\n" : "" ("", "initial-rgt="+str(value)+"\n")[value is not None and value!=vdef] This is used only with GREV model. bambe.params param_update Parameter updating 4 parameter_update_interval Parameter update interval (parameter-update-interval) Integer 1 (defined $value and $value != $vdef) ? "parameter-update-interval=$value\\n" : "" ("", "parameter-update-interval="+str(value)+"\n")[value is not None and value!=vdef] During the main algorithm, any 'on' parameters are updated at each cycle divisible by this value. Use zero for no parameter updating. bambe.params update_kappa Update kappa value (update-kappa) Boolean $likelihood_model eq "HKY85" or $likelihood_model eq "F84" likelihood_model=="HKY85" or likelihood_model=="F84" 1 ($value) ? "" : "update-kappa=false\\n" ("update-kappa=false\n", "")[ value ] bambe.params update_theta Update theta value (update-theta) Boolean 1 ($value) ? "" : "update-theta=false\\n" ("update-theta=false\n", "")[ value ] bambe.params update_pi Update pi value (update-pi) Boolean 1 ($value) ? "" : "update-pi=false\\n" ("update-pi=false\n", "")[ value ] bambe.params update_ttp Update ttp value (TN93 model) (update-ttp) Boolean $likelihood_model eq "TN93" likelihood_model == "TN93" 1 ($value) ? 
"" : "update-ttp=false\\n" ("update-ttp=false\n", "")[ value ] bambe.params update_gamma Update gamma value for (TN93 model) (update-gamma) Boolean $likelihood_model eq "TN93" likelihood_model == "TN93" 1 ($value) ? "" : "update-gamma=false\\n" ("update-gamma=false\n", "")[ value ] bambe.params update_grev Update grev (GREV model) (update-grev) Boolean $likelihood_model eq "GREV" likelihood_model == "GREV" 1 ($value) ? "" : "update-grev=false\\n" ("update-grev=false\n", "")[ value ] bambe.params update_invariant_prob Update invariant probability (update-invariant-prob) Boolean 0 ($value) ? "update-invariant-prob=true\\n" : "" ("", "update-invariant-prob=true\n")[ value ] bambe.params local_tune Stretch parameter for local (local-tune) Float $burn_algorithm eq "local" or $main_algorithm eq "local" burn_algorithm == "local" or main_algorithm == "local" 0.19 (defined $value and $value != $vdef) ? "local-tune=$value\\n" : "" ("", "local-tune="+str(value)+"\n")[value is not None and value!=vdef] This tuning parameter is only used with the local algorithm. It modulates the size of a maximal stretch. The smaller the value, the greater the tree acceptance rate will be. bambe.params theta_tune Dirichlet parameter for theta update (theta-tune) Float $parameter_update_interval != 0 and $update_theta parameter_update_interval != 0 and update_theta 2000.0 (defined $value and $value != $vdef) ? "theta-tune=$value\\n" : "" ("","theta-tune="+str(value)+"\n")[value is not None and value!=vdef] Tuning parameter used during update of theta value(s). The larger its value, the more likely proposals are to be accepted. bambe.params pi_tune Dirichlet parameter for pi update (pi-tune) Float $parameter_update_interval != 0 and $update_pi parameter_update_interval != 0 and update_pi 2000.0 (defined $value and $value != $vdef) ? "pi-tune=$value\\n" : "" ("", "pi-tune="+str(value)+"\n")[value is not None and value!=vdef] Tuning parameter used during update of pi values. 
The larger its value, the more likely proposals are to be accepted. bambe.params kappa_tune Half the size of the window for uniform updates of kappa (kappa-tune) Float $parameter_update_interval != 0 and $update_kappa and ($likelihood_model eq "HKY85" or $likelihood_model eq "F84") parameter_update_interval != 0 and update_kappa and (likelihood_model == "HKY85" or likelihood_model == "F84") 0.2 (defined $value and $value != $vdef) ? "kappa-tune=$value\\n" : "" ("","kappa-tune="+str(value)+"\n")[value is not None and value!=vdef] This tuning parameter is only used when 'parameter-update-interval' is positive and 'update-kappa' is true. The smaller its value, the greater the parameter acceptance rate will be. bambe.params ttp_tune Half window width for ttp update (TN93 model) (ttp-tune) Float $parameter_update_interval != 0 and $update_ttp and $likelihood_model eq "TN93" parameter_update_interval != 0 and update_ttp and likelihood_model == "TN93" 0.1 (defined $value and $value != $vdef) ? "ttp-tune=$value\\n" : "" ("","ttp-tune="+str(value)+"\n")[value is not None and value!=vdef] This tuning parameter is only used when 'parameter-update-interval' is positive and 'update-ttp' is true. The smaller its value, the greater the parameter acceptance rate will be. bambe.params gamma_tune Half window width for gamma update (TN93 model) (gamma-tune) Float $parameter_update_interval != 0 and $update_gamma and $likelihood_model eq "TN93" parameter_update_interval != 0 and update_gamma and likelihood_model == "TN93" 0.1 (defined $value and $value != $vdef) ? "gamma-tune=$value\\n" : "" ("","gamma-tune="+str(value)+"\n")[value is not None and value!=vdef] This tuning parameter is only used when 'parameter-update-interval' is positive and 'update-gamma' is true. The smaller its value, the greater the parameter acceptance rate will be. 
bambe.params grev_tune Half window width for grev update (grev-tune) Float $parameter_update_interval != 0 and $update_grev and $likelihood_model eq "GREV" parameter_update_interval != 0 and update_grev and likelihood_model=="GREV" 2000 (defined $value and $value != $vdef) ? "grev-tune=$value\\n" : "" ("","grev-tune="+str(value)+"\n")[value is not None and value!=vdef] bambe.params beta_tune Beta parameter for local update (beta-tune) Float $use_beta use_beta 10.0 (defined $value and $value != $vdef) ? "beta-tune=$value\\n" : "" ("","beta-tune="+str(value)+"\n")[value is not None and value!=vdef] bambe.params invariant_prob_tune Half window width for invariant probability update (invariant-prob-tune) Float $parameter_update_interval != 0 and $update_invariant_prob parameter_update_interval != 0 and update_invariant_prob 2000 (defined $value and $value != $vdef) ? "invariant-prob-tune=$value\\n" : "" ("","invariant-prob-tune="+str(value)+"\n")[value is not None and value!=vdef] bambe.params output_options Output options 2 sample_interval Sample interval (sample-interval) Integer 200 (defined $value and $value != $vdef) ? "sample-interval=$value\\n" : "" ("","sample-interval="+str(value)+"\n")[value is not None and value!=vdef] During burn and main algorithms, the tree topology, log likelihoods, and parameters are written to files at each cycle divisible by this value. bambe.params newick_format Newick format of tree file (newick-format) Boolean 1 ($value) ? "" : "newick-format=false\\n" ("newick-format=false\n", "")[ value ] Indicates the format of the tree to read (if not random) and the format of the tree to print after the run. 
bambe.params results_files Results files Text "bambe_results.lpd" and "bambe_results.par" and "bambe_results.out" "bambe_results.lpd" "bambe_results.par" "bambe_results.out" result_tree Tree file Tree NEWICK BAMBE "bambe_results.tre" "bambe_results.tre" top_file Topology file Text "bambe_results.top" "bambe_results.top" file_root String "file-root=bambe_results\\n" "file-root=bambe_results\n" 6 bambe.params input_options Input options 7 outgroup Number of the outgroup (outgroup) Integer $molecular_clock molecular_clock 1 (defined $value and $value != $vdef) ? "outgroup=$value\\n" : "" ("", "outgroup="+str(value)+"\n")[value is not None and value!=vdef] This is ignored if a molecular clock is assumed. In the absence of a clock, trees and tree topologies are printed with the outgroup emerging directly from the root. bambe.params tree_file Tree file (tree-file) Tree NEWICK BAMBE $initial_tree_type eq "bambe" or $initial_tree_type eq "newick" initial_tree_type=="bambe" or initial_tree_type=="newick" (defined $value) ? "tree-file=$value\\n" : "" ("", "tree-file="+str(value)+"\n")[value is not None] If no tree file is given, the program generates a random tree from a flat distribution where each labeled history is equally likely. bambe.params initial_tree_type Initial tree type (initial-tree-type) Choice random random upgma neighbor-joining newick bambe (defined $value and $value ne $vdef) ? "initial-tree-type=$value\\n" : "" ("", "initial-tree-type="+str(value)+"\n")[value is not None and value != vdef] . random selects a tree from the prior. . upgma sets the initial clock tree to the UPGMA tree using maximum likelihood distances with the specified model and initial parameter values. . neighbor-joining sets the initial nonclock tree to the neighbor-joining tree using maximum likelihood distances with the specified model and initial parameter values. . newick reads in an initial tree in Newick format from a file. . bambe reads in an initial tree in BAMBE format from a file. 
bambe.params print_all_trees Print all trees?(print-all-trees) Boolean 1 ($value) ? "" : "print-all-trees=false\\n" ("print-all-trees=false\n", "")[value ] bambe.params max_initial_tree_height Initial tree height used to generate an initial random tree (max-initial-tree-height) Float 0.1 (defined $value and $value != $vdef) ? "max-initial-tree-height=$value\\n" : "" ("", "max-initial-tree-height="+str(value)+"\n")[value is not None and value!=vdef] This parameter is only used to generate an initial random tree. bambe.params Programs-5.1.1/freak.xml0000644000175000001560000002463212072525233013774 0ustar bneronsis freak EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net freak Generate residue/base frequency table or plot http://bioweb2.pasteur.fr/docs/EMBOSS/freak.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:composition sequence:protein:composition freak e_input Input section e_seqall seqall option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -seqall=" + str(value))[value is not None] 1 e_required Required section e_letters Residue letters String gc ("", " -letters=" + str(value))[value is not None and value!=vdef] 2 e_additional Additional section e_step Stepping value Integer 1 ("", " -step=" + str(value))[value is not None and value!=vdef] 3 e_window Averaging window Integer 30 ("", " -window=" + str(value))[value is not None and value!=vdef] 4 e_output Output section e_plot Produce graphic Boolean 0 ("", " -plot")[ bool(value) ] 5 e_graph Choose the e_graph output format Choice e_plot png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 6 xy_goutfile Name of the output graph Filename e_plot 
freak_xygraph ("" , " -goutfile=" + str(value))[value is not None] 7 xy_outgraph_png Graph file Picture Binary e_plot and e_graph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_plot and e_graph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_plot and e_graph == "ps" or e_graph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_plot and e_graph == "meta" "*.meta" xy_outgraph_data Graph file Text e_plot and e_graph == "data" "*.dat" e_outfile Name of the output file (e_outfile) Filename not e_plot freak.e_outfile ("" , " -outfile=" + str(value))[value is not None] 8 e_outfile_out outfile_out option FreakReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 9 Programs-5.1.1/findkm.xml0000644000175000001560000002021411672346320014147 0ustar bneronsis findkm EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net findkm Calculate and plot enzyme reaction data http://bioweb2.pasteur.fr/docs/EMBOSS/findkm.html http://emboss.sourceforge.net/docs/themes sequence:enzyme:kinetics findkm e_input Input section e_infile Enzyme kinetics data (application-specific) file EnzymeData AbstractText ("", " -infile=" + str(value))[value is not None] 1 e_advanced Advanced section e_plot S/v vs s Boolean 1 (" -noplot", "")[ bool(value) ] 2 e_output Output section e_outfile Name of the output file (e_outfile) Filename findkm.e_outfile ("" , " -outfile=" + str(value))[value is not None] 3 e_outfile_out outfile_out option FindkmReport Report e_outfile e_graphlb Choose the e_graphlb output format Choice png png gif cps ps meta data (" -graphlb=" + str(vdef), " -graphlb=" + str(value))[value is not None and value!=vdef] 4 xy_goutfile Name of the output graph Filename findkm_xygraph ("" , " -goutfile=" + str(value))[value is not None] 5 xy_outgraph_png Graph file Picture Binary e_graphlb == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_graphlb == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_graphlb == "ps" or e_graphlb == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_graphlb == "meta" "*.meta" xy_outgraph_data Graph file Text e_graphlb == "data" "*.dat" auto Turn off any prompting String " -auto -stdout" 6 Programs-5.1.1/wordcount.xml0000644000175000001560000001304012072525233014717 0ustar bneronsis wordcount EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net wordcount Count and extract unique words in molecular sequence(s) http://bioweb2.pasteur.fr/docs/EMBOSS/wordcount.html http://emboss.sourceforge.net/docs/themes nucleic:composition protein:composition wordcount e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_wordsize Word size (value greater than or equal to 1) Integer ("", " -wordsize=" + str(value))[value is not None] Value greater than or equal to 1 is required value >= 1 2 e_additional Additional section e_mincount Minimum word count to report (value greater than or equal to 1) Integer 1 ("", " -mincount=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 3 e_output Output section e_outfile Name of the output file (e_outfile) Filename wordcount.e_outfile ("" , " -outfile=" + str(value))[value is not None] 4 e_outfile_out outfile_out option WordcountReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 5 Programs-5.1.1/showorf.xml0000644000175000001560000002325412072525233014372 0ustar bneronsis showorf EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net showorf Display a nucleotide sequence and translation in pretty format http://bioweb2.pasteur.fr/docs/EMBOSS/showorf.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:gene_finding sequence:nucleic:translation showorf e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_frames Select frames to translate (value from 0 to 6) String 1,2,3,4,5,6 ("", " -frames=" + str(value))[value is not None and value!=vdef] 2 "0: None, 1: F1,2: F2,3: F3,4: R1,5: R2,6: R3" e_additional Additional section e_table Genetic codes Choice 0 0 1 2 3 4 5 6 9 10 11 12 13 14 15 16 21 22 23 ("", " -table=" + str(value))[value is not None and value!=vdef] 3 e_output Output section e_outfile Name of the output file (e_outfile) Filename showorf.e_outfile ("" , " -outfile=" + str(value))[value is not None] 4 e_outfile_out outfile_out option ShoworfReport Report e_outfile e_ruler Add a ruler Boolean 1 (" -noruler", "")[ bool(value) ] 5 e_plabel Number translations Boolean 1 (" -noplabel", "")[ bool(value) ] 6 e_nlabel Number dna sequence Boolean 1 (" -nonlabel", "")[ bool(value) ] 7 e_width Width of screen (value greater than or equal to 10) Integer 50 ("", " -width=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 10 is required value >= 10 8 auto Turn off any prompting String " -auto -stdout" 9 Programs-5.1.1/protpars.xml0000644000175000001560000010345511767572177014602 0ustar bneronsis protpars protpars Protein Sequence Parsimony Method http://bioweb2.pasteur.fr/docs/phylip/doc/protpars.html This program infers an unrooted phylogeny from protein sequences, using a new method intermediate between the approaches of Eck and Dayhoff (1966) and Fitch (1971). 
Eck and Dayhoff (1966) allowed any amino acid to change to any other, and counted the number of such changes needed to evolve the protein sequences on each given phylogeny. This has the problem that it allows replacements which are not consistent with the genetic code, counting them equally with replacements that are consistent. Fitch, on the other hand, counted the minimum number of nucleotide substitutions that would be needed to achieve the given protein sequences. This counts silent changes equally with those that change the amino acid. phylogeny:parsimony protpars String "protpars <protpars.params" "protpars <protpars.params" 0 infile Alignment File Protein Alignment PHYLIPI $infile ne "infile" infile != "infile" "ln -s $infile infile && " "ln -s " + str(infile) + " infile && " -10 The input file must contain aligned sequences in PHYLIP format obtained by sequence alignment programs. 5 10 Alpha ABCDEFGHIK Beta AB--EFGHIK Gamma ?BCDSFG*?? Delta CIKDEFGHIK Epsilon DIKDEFGHIK protpars_opt Parsimony options use_threshold Use Threshold parsimony (T) Boolean 0 ($value) ? "T\\n$threshold\\n" : "" ( "" , "T\n" + str( threshold ) + "\n" )[ value ] 3 protpars.params threshold Threshold parsimony value Integer $use_threshold use_threshold Value must be greater than 1 $value > 1 value > 1 2 protpars.params code Genetic code for 'categories' model (C) Choice U U M V F Y (defined $value and $value ne $vdef) ? "C\\n$code\\n" : "" ( "" , "C\n"+str( code ) +"\n" )[ value is not None and value != vdef ] 3 protpars.params jumble_opt Randomize options ( one dataset ) Use these options only if you have only one data set jumble Randomize (jumble) input order of sequences (J) Boolean 0 ($value) ? 
"J\\n$jumble_seed\\n$times2jumble\\n" : "" ( "" , "J\n" + str( jumble_seed ) + "\n" + str(times2jumble) + "\n" )[ value ] You can't use "Randomize options" and "Bootstrap options" at the same time not( $jumble and $seqboot) not (jumble and seqboot) 20 protpars.params jumble_seed Jumble random number seed (must be odd) Integer $jumble jumble Jumble seed must be odd $value > 0 and ($value % 2) != 0 value > 0 and (value % 2) != 0 19 times2jumble Number of times to jumble Integer $jumble jumble 1 bootstrap Bootstrap options ( multiple datasets ) If you bootstrap your data ( generate multiple datasets ), don't use Randomize options seqboot Perform a bootstrap before analysis Boolean 0 ($value) ? "seqboot <seqboot.params && mv outfile seqboot.outfile && rm infile && ln -s seqboot.outfile infile && " : "" ( "" , "seqboot < seqboot.params && mv outfile seqboot.outfile && rm infile && ln -s seqboot.outfile infile && " )[ value ] You can't use "Randomize options" and "Bootstrap options" at the same time not( $seqboot and $jumble) not( seqboot and jumble) -5 By selecting this option, the bootstrap will be performed on your sequence file, so you don't need to run seqboot separately beforehand. Don't give an already bootstrapped file to the program; this won't work! You can't use "Randomize options" and "Bootstrap options" at the same time. Method Resampling methods Choice $seqboot seqboot bootstrap bootstrap "" "" jackknife "J\\n" "J\n" permute_species "J\\nJ\\n" "J\nJ\n" permute_char "J\\nJ\\nJ\\n" "J\nJ\nJ\n" permute_within_species "J\\nJ\\nJ\\nJ\\n" "J\nJ\nJ\nJ\n" 1 1. The bootstrap. Bootstrapping was invented by Bradley Efron in 1979, and its use in phylogeny estimation was introduced by me (Felsenstein, 1985b). It involves creating a new data set by sampling N characters randomly with replacement, so that the resulting data set has the same size as the original, but some characters have been left out and others are duplicated. 
The random variation of the results from analyzing these bootstrapped data sets can be shown statistically to be typical of the variation that you would get from collecting new data sets. The method assumes that the characters evolve independently, an assumption that may not be realistic for many kinds of data. 2. Delete-half-jackknifing. This alternative to the bootstrap involves sampling a random half of the characters, and including them in the data but dropping the others. The resulting data sets are half the size of the original, and no characters are duplicated. The random variation from doing this should be very similar to that obtained from the bootstrap. The method is advocated by Wu (1986). 3. Permuting species within each character. This method of resampling (well, OK, it may not be best to call it resampling) was introduced by Archie (1989) and Faith (1990; see also Faith and Cranston, 1991). It involves permuting the columns of the data matrix separately. This produces data matrices that have the same number and kinds of characters but no taxonomic structure. It is used for different purposes than the bootstrap, as it tests not the variation around an estimated tree but the hypothesis that there is no taxonomic structure in the data: if a statistic such as number of steps is significantly smaller in the actual data than it is in replicates that are permuted, then we can argue that there is some taxonomic structure in the data (though perhaps it might be just the presence of a pair of sibling species). 4. Permuting character order. This simply permutes the order of the characters, the same reordering being applied to all species. For many methods of tree inference this will make no difference to the outcome (unless one has rates of evolution correlated among adjacent sites). It is included as a possible step in carrying out a permutation test of homogeneity of characters (such as the Incongruence Length Difference test). 5.
Permuting characters separately for each species. This is a method introduced by Steel, Lockhart, and Penny (1993) to permute data so as to destroy all phylogenetic structure, while keeping the base composition of each species the same as before. It shuffles the character order separately for each species. seqboot.params replicates How many replicates (R) Integer $seqboot seqboot 100 (defined $value and $value != $vdef) ? "R\\n$value\\n" : "" ( "" , "R\n" + str( value ) + "\n" )[ value is not None and value != vdef ] This server allows no more than 1000 replicates $replicates <= 1000 replicates <= 1000 Bad data sets number: it must be greater than 1 $value > 1 value > 1 1 seqboot.params seqboot_seed Random number seed (must be odd) Integer $seqboot seqboot "$value\\n" str(value) + "\n" Random seed must be odd $value > 0 and ($value % 2) != 0 value > 0 and (value % 2) != 0 1000 seqboot.params seqboot_times2jumble Number of times to jumble Integer $seqboot seqboot 1 the product of "number of times to jumble" and replicates must be less than 100000 ($seqboot_times2jumble * (defined $replicates) ? $replicates : 1) <= 100000 seqboot_times2jumble * ( 1 , replicates)[replicates is not None] <= 100000 multiple_dataset String $seqboot seqboot "M\\nD\\n$replicates\\n$seqboot_seed\\n$times2jumble\\n" "M\nD\n" + str( replicates ) + "\n" + str( seqboot_seed ) + "\n"+ str( seqboot_times2jumble ) + "\n" 1 protpars.params consense Compute a consensus tree Boolean $seqboot and $print_treefile seqboot and print_treefile 0 ($value) ? " && cp infile protpars.infile && cp protpars.outtree intree && consense <consense.params && mv outtree consense.outtree && mv outfile consense.outfile" : "" ("" , " && cp infile protpars.infile && cp protpars.outtree intree && consense < consense.params && mv outtree consense.outtree && mv outfile consense.outfile" )[ value ] 10 user_tree_opt User tree options user_tree Use user tree (default: no, search for best tree) (U) Boolean 0 ($value) ? 
"U\\n" : "" ( "" , "U\n" )[ value ] You cannot bootstrap your dataset and give a user tree at the same time not ( $user_tree and $seqboot ) not ( user_tree and seqboot ) You cannot randomize (jumble) your dataset and give a user tree at the same time not ( $user_tree and $jumble ) not ( user_tree and jumble ) 1 To give your tree to the program, you must normally put it in the alignment file, after the sequences, preceded by a line indicating how many trees you give. Here, this will be automatically appended: just give a treefile and the number of trees in it. protpars.params tree_file User tree file Tree NEWICK $user_tree user_tree (defined $value) ? "ln -s $tree_file intree && " : "" ( "" , "ln -s " + str( tree_file ) + " intree && ")[ value is not None ] -1 Give a tree whenever the infile does not already contain the tree. output Output options print_tree Print output tree (3) Boolean 1 ($value != $vdef) ? "" : "3\\n" ( "" , "3\n" )[ value != vdef ] 1 Tells the program to print a semi-graphical picture of the tree in the outfile. protpars.params print_sequences Print sequences at all nodes of tree (5) Boolean 0 ($value) ? "5\\n" : "" ( "" , "5\n" )[ value ] 1 protpars.params print_treefile Write out trees onto tree file (6) Boolean 1 ($value != $vdef) ? "" : "6\\n" ( "" , "6\n" )[ value != vdef ] 1 Tells the program to save the tree in a tree file (outtree) (a standard representation of trees where the tree is specified by a nested pairs of parentheses, enclosing names and separated by commas). protpars.params printdata Print out the data at start of run (1) Boolean 0 ($value) ? "1\\n" : "" ( "" , "1\n" )[ value ] 1 protpars.params print_steps Print out steps in each site (4) Boolean 0 ($value) ? "4\\n" : "" ( "" , "4\n" )[ value ] 1 protpars.params other_options Other options outgroup Outgroup species (O) Integer 1 (defined $value and $value != $vdef) ? 
"O\\n$value\\n" : "" ( "" , "O\n" + str( value ) +"\n" )[ value is not None and value != vdef ] Please enter a value greater than 0 $value > 0 value > 0 1 The O (Outgroup) option specifies which species is to have the root of the tree be on the line leading to it. For example, if the outgroup is a species "Mouse" then the root of the tree will be placed in the middle of the branch which is connected to this species, with Mouse branching off on one side of the root and the lineage leading to the rest of the tree on the other. This option is toggled on by choosing the number of the outgroup (the species being taken in the numerical order that they occur in the input file). Outgroup-rooting will not be attempted if it is a user-defined tree, despite your invoking the option. When it is used, the tree as printed out is still listed as being an unrooted tree, though the outgroup is connected to the bottommost node so that it is easy to visually convert the tree into rooted form. protpars.params outfile Outfile Text " && mv outfile protpars.outfile" " && mv outfile protpars.outfile" "protpars.outfile" "protpars.outfile" treefile Tree file Tree NEWICK $print_treefile print_treefile " && mv outtree protpars.outtree" " && mv outtree protpars.outtree" "protpars.outtree" "protpars.outtree" seqboot_out seqboot outfile SetOfAlignment AbstractText $seqboot seqboot 40 "seqboot.outfile" "seqboot.outfile" confirm String "y\\n" "y\n" 1000 protpars.params terminal_type String "0\\n" "0\n" -1 protpars.params bootconfirm String $seqboot seqboot "y\\n" "y\n" 100 seqboot.params bootterminal_type String $seqboot seqboot "0\\n" "0\n" -1 seqboot.params consense_confirm String $consense consense "Y\\n" "Y\n" 1000 consense.params consense_terminal_type String $consense consense "T\\n" "T\n" -2 consense.params consense_outgroup String $consense and $outgroup > 1 consense and outgroup > 1 "O\\n$outgroup\\n" "O\n" + str(outgroup) + "\n" 1000 consense.params consense_outfile Consense outfile Text
$consense consense "consense.outfile" "consense.outfile" consense_treefile Consense tree file Tree NEWICK $consense consense "consense.outtree" "consense.outtree" Programs-5.1.1/fuzznuc.xml0000644000175000001560000002303212072525233014401 0ustar bneronsis fuzznuc EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net fuzznuc Search for patterns in nucleotide sequences http://bioweb2.pasteur.fr/docs/EMBOSS/fuzznuc.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:motifs fuzznuc e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_pattern Search pattern DNA Pattern AbstractText ("", " -pattern=@" + str(value))[value is not None] 2 The standard IUPAC one-letter codes for the nucleotides are used. The symbol 'n' is used for a position where any nucleotide is accepted. Ambiguities are indicated by listing the acceptable nucleotides for a given position, between square parentheses '[ ]'. For example: [ACG] stands for A or C or G. Ambiguities are also indicated by listing between a pair of curly brackets '{ }' the nucleotides that are not accepted at a given position. For example: {AG} stands for any nucleotides except A and G. Each element in a pattern is separated from its neighbor by a '-'. (Optional in fuzznuc). Repetition of an element of the pattern can be indicated by following that element with a numerical value or a numerical range between parenthesis. Examples: N(3) corresponds to N-N-N, N(2,4) corresponds to N-N or N-N-N or N-N-N-N. 
When a pattern is restricted to either the 5' or 3' end of a sequence, that pattern either starts with a '<' symbol or respectively ends with a '>' symbol. A period ends the pattern. (Optional in fuzznuc). For example, [CG](5)TG{A}N(1,5)C e_pmismatch Search pattern Integer 0 ("", " -pmismatch=" + str(value))[value is not None and value!=vdef] 3 e_advanced Advanced section e_complement Search complementary strand Boolean 0 ("", " -complement")[ bool(value) ] 4 e_output Output section e_outfile Name of the report file Filename fuzznuc.report ("" , " -outfile=" + str(value))[value is not None] 5 e_rformat_outfile Choose the report output format Choice SEQTABLE DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 6 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 7 Programs-5.1.1/eprimer3.xml0000644000175000001560000021150412072525233014426 0ustar bneronsis eprimer3 EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net eprimer3 Picks PCR primers and hybridization oligos http://bioweb2.pasteur.fr/docs/EMBOSS/eprimer3.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:primers eprimer3 e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 The sequence from which to choose primers. 
The sequence must be presented 5' to 3' e_primer Pick pcr primer(s) Boolean 1 (" -noprimer", "")[ bool(value) ] 2 Tell EPrimer3 to pick primer(s) e_task Task Choice e_primer 1 1 2 3 4 ("", " -task=" + str(value))[value is not None and value!=vdef] 3 Tell EPrimer3 what task to perform. Legal values are 1: 'Pick PCR primers', 2: 'Pick forward primer only', 3: 'Pick reverse primer only', 4: 'No primers needed'. e_hybridprobe Pick hybridization probe Boolean 0 ("", " -hybridprobe")[ bool(value) ] 4 An 'internal oligo' is intended to be used as a hybridization probe (hyb probe) to detect the PCR product after amplification. e_mishyblibraryfile Primer3 internal oligo mishybridizing library file (optional) Primer3Mishybridizing AbstractText e_hybridprobe ("", " -mishyblibraryfile=" + str(value))[value is not None] 5 Similar to MISPRIMING-LIBRARY, except that the event we seek to avoid is hybridization of the internal oligo to sequences in this library rather than priming from them. The file must be in (a slightly restricted) FASTA format (W. B. Pearson and D.J. Lipman, PNAS 85:8 pp 2444-2448 [1988]); we briefly discuss the organization of this file below. If this parameter is specified then EPrimer3 locally aligns each candidate oligo against each library sequence and rejects those primers for which the local alignment score times a specified weight (see below) exceeds INTERNAL-OLIGO-MAX-MISHYB. (The maximum value of the weight is arbitrarily set to 12.0.) Each sequence entry in the FASTA-format file must begin with an 'id line' that starts with '>'. The contents of the id line is 'slightly restricted' in that EPrimer3 parses everything after any optional asterisk ('*') as a floating point number to use as the weight mentioned above. If the id line contains no asterisk then the weight defaults to 1.0. The alignment scoring system used is the same as for calculating complementarity among oligos (e.g. SELF-ANY). 
The remainder of an entry contains the sequence as lines following the id line up until a line starting with '>' or the end of the file. Whitespace and newlines are ignored. Characters 'A', 'T', 'G', 'C', 'a', 't', 'g', 'c' are retained and any other character is converted to 'N' (with the consequence that any IUB / IUPAC codes for ambiguous bases are converted to 'N'). There are no restrictions on line length. An empty value for this parameter indicates that no library should be used. e_mispriminglibraryfile Primer3 mispriming library file (optional) Primer3Mispriming AbstractText ("", " -mispriminglibraryfile=" + str(value))[value is not None] 6 The name of a file containing a nucleotide sequence library of sequences to avoid amplifying (for example repetitive sequences, or possibly the sequences of genes in a gene family that should not be amplified.) The file must be in (a slightly restricted) FASTA format (W. B. Pearson and D.J. Lipman, PNAS 85:8 pp 2444-2448 [1988]); we briefly discuss the organization of this file below. If this parameter is specified then EPrimer3 locally aligns each candidate primer against each library sequence and rejects those primers for which the local alignment score times a specified weight (see below) exceeds MAX-MISPRIMING. (The maximum value of the weight is arbitrarily set to 100.0.) Each sequence entry in the FASTA-format file must begin with an 'id line' that starts with '>'. The contents of the id line is 'slightly restricted' in that EPrimer3 parses everything after any optional asterisk ('*') as a floating point number to use as the weight mentioned above. If the id line contains no asterisk then the weight defaults to 1.0. The alignment scoring system used is the same as for calculating complementarity among oligos (e.g. SELF-ANY). The remainder of an entry contains the sequence as lines following the id line up until a line starting with '>' or the end of the file. Whitespace and newlines are ignored. 
Characters 'A', 'T', 'G', 'C', 'a', 't', 'g', 'c' are retained and any other character is converted to 'N' (with the consequence that any IUB / IUPAC codes for ambiguous bases are converted to 'N'). There are no restrictions on line length. An empty value for this parameter indicates that no repeat library should be used. e_additional Additional section e_programsection Program options e_numreturn Number of results to return (value greater than or equal to 0) Integer 5 ("", " -numreturn=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 7 The maximum number of primer pairs to return. Primer pairs returned are sorted by their 'quality', in other words by the value of the objective function (where a lower number indicates a better primer pair). Caution: setting this parameter to a large value will increase running time. e_seqoptsection Sequence options e_includedregion Included region(s) String ("", " -includedregion=" + str(value))[value is not None] 8 A sub-region of the given sequence in which to pick primers. For example, often the first dozen or so bases of a sequence are vector, and should be excluded from consideration. The value for this parameter has the form (start),(end) where (start) is the index of the first base to consider, and (end) is the last in the primer-picking region. e_targetregion Target region(s) String ("", " -targetregion=" + str(value))[value is not None] 9 If one or more Targets is specified then a legal primer pair must flank at least one of them. A Target might be a simple sequence repeat site (for example a CA repeat) or a single-base-pair polymorphism. The value should be a space-separated list of (start),(end) pairs where (start) is the index of the first base of a Target, and (end) is the last. E.g. 50,51 requires primers to surround the 2 bases at positions 50 and 51.
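The (start),(end) list format described above for the region options can be illustrated with a minimal Python sketch. The helper names parse_regions and region_length are hypothetical and are not part of EPrimer3; the sketch assumes that both indexes are integers and that (end) is inclusive, as the "2 bases at positions 50 and 51" example suggests.

```python
def parse_regions(text):
    """Parse a space-separated list of 'start,end' pairs into integer tuples.

    Hypothetical helper for illustration; EPrimer3 does its own parsing.
    """
    regions = []
    for token in text.split():
        start_str, end_str = token.split(",")
        start, end = int(start_str), int(end_str)
        if end < start:
            raise ValueError("end before start in region: " + token)
        regions.append((start, end))
    return regions


def region_length(region):
    """Number of bases covered by an inclusive (start, end) region."""
    start, end = region
    return end - start + 1


targets = parse_regions("50,51")
print(targets, [region_length(r) for r in targets])
```

Because the length is computed as end - start + 1, the same helper reproduces the base counts quoted for region lists elsewhere in this help text.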
e_excludedregion Excluded region(s) String ("", " -excludedregion=" + str(value))[value is not None] 10 Primer oligos may not overlap any region specified in this tag. The associated value must be a space-separated list of (start),(end) pairs where (start) is the index of the first base of the excluded region, and (end) is the last. This tag is useful for tasks such as excluding regions of low sequence quality or for excluding regions containing repetitive elements such as ALUs or LINEs. E.g. 401,407 68,70 forbids selection of primers in the 7 bases starting at 401 and the 3 bases at 68. e_forwardinput Forward input primer sequence to check String ("", " -forwardinput=" + str(value))[value is not None] 11 The sequence of a forward primer to check and around which to design reverse primers and optional internal oligos. Must be a substring of SEQUENCE. e_reverseinput Reverse input primer sequence to check String ("", " -reverseinput=" + str(value))[value is not None] 12 The sequence of a reverse primer to check and around which to design forward primers and optional internal oligos. Must be a substring of the reverse strand of SEQUENCE. e_primersection Primer options e_gcclamp Gc clamp (value greater than or equal to 0) Integer e_primer 0 ("", " -gcclamp=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 13 Require the specified number of consecutive Gs and Cs at the 3' end of both the forward and reverse primer. (This parameter has no effect on the internal oligo if one is requested.) e_osize Primer optimum size (value greater than or equal to 0) Integer e_primer 20 ("", " -osize=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 14 Optimum length (in bases) of a primer oligo. EPrimer3 will attempt to pick primers close to this length.
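The Python command fragments in these parameter definitions, such as ("", " -osize=" + str(value))[value is not None and value!=vdef], choose between two strings by indexing a tuple with a boolean (False selects element 0, True selects element 1). The sketch below wraps that idiom in a function purely for illustration; option_fragment is a hypothetical name and is not something the service defines.

```python
def option_fragment(flag, value, vdef=None):
    """Return ' -flag=value' when value is set and differs from the default.

    Hypothetical wrapper around the tuple-indexing idiom used in the
    parameter definitions: a bool is an int in Python, so it can index
    a two-element tuple directly.
    """
    emit = value is not None and value != vdef
    return ("", " -" + flag + "=" + str(value))[emit]


# A value equal to its default adds nothing to the command line.
print(repr(option_fragment("osize", 20, vdef=20)))
# A non-default value is appended as an option.
print(repr(option_fragment("osize", 22, vdef=20)))
```

Note that both tuple elements are built before one is selected, so str(value) is evaluated even when value is None; this is harmless here because the unused string is simply discarded.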
e_minsize Primer minimum size (value greater than or equal to 1) Integer e_primer 18 ("", " -minsize=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 15 Minimum acceptable length of a primer. Must be greater than 0 and less than or equal to MAX-SIZE. e_maxsize Primer maximum size (value less than or equal to 35) Integer e_primer 27 ("", " -maxsize=" + str(value))[value is not None and value!=vdef] Value less than or equal to 35 is required value <= 35 16 Maximum acceptable length (in bases) of a primer. Currently this parameter cannot be larger than 35. This limit is governed by the maximum oligo size for which EPrimer3's melting-temperature is valid. e_otm Primer optimum tm Float e_primer 60.0 ("", " -otm=" + str(value))[value is not None and value!=vdef] 17 Optimum melting temperature (Celsius) for a primer oligo. EPrimer3 will try to pick primers with melting temperatures that are close to this temperature. The oligo melting temperature formula in EPrimer3 is that given in Rychlik, Spencer and Rhoads, Nucleic Acids Research, vol 18, num 21, pp 6409-6412 and Breslauer, Frank, Bloecker and Marky, Proc. Natl. Acad. Sci. USA, vol 83, pp 3746-3750. Please refer to the former paper for background discussion. e_mintm Primer minimum tm Float e_primer 57.0 ("", " -mintm=" + str(value))[value is not None and value!=vdef] 18 Minimum acceptable melting temperature (Celsius) for a primer oligo. e_maxtm Primer maximum tm Float e_primer 63.0 ("", " -maxtm=" + str(value))[value is not None and value!=vdef] 19 Maximum acceptable melting temperature (Celsius) for a primer oligo. e_maxdifftm Maximum difference in tm of primers Float e_primer 100.0 ("", " -maxdifftm=" + str(value))[value is not None and value!=vdef] 20 Maximum acceptable (unsigned) difference between the melting temperatures of the forward and reverse primers.
e_ogcpercent Primer optimum gc percent Float e_primer 50.0 ("", " -ogcpercent=" + str(value))[value is not None and value!=vdef] 21 Primer optimum GC percent. e_mingc Primer minimum gc percent Float e_primer 20.0 ("", " -mingc=" + str(value))[value is not None and value!=vdef] 22 Minimum allowable percentage of Gs and Cs in any primer. e_maxgc Primer maximum gc percent Float e_primer 80.0 ("", " -maxgc=" + str(value))[value is not None and value!=vdef] 23 Maximum allowable percentage of Gs and Cs in any primer generated by Primer. e_saltconc Salt concentration (mm) Float e_primer 50.0 ("", " -saltconc=" + str(value))[value is not None and value!=vdef] 24 The millimolar concentration of salt (usually KCl) in the PCR. EPrimer3 uses this argument to calculate oligo melting temperatures. e_dnaconc Dna concentration (nm) Float e_primer 50.0 ("", " -dnaconc=" + str(value))[value is not None and value!=vdef] 25 The nanomolar concentration of annealing oligos in the PCR. EPrimer3 uses this argument to calculate oligo melting temperatures. The default (50nM) works well with the standard protocol used at the Whitehead/MIT Center for Genome Research--0.5 microliters of 20 micromolar concentration for each primer oligo in a 20 microliter reaction with 10 nanograms template, 0.025 units/microliter Taq polymerase in 0.1 mM each dNTP, 1.5mM MgCl2, 50mM KCl, 10mM Tris-HCL (pH 9.3) using 35 cycles with an annealing temperature of 56 degrees Celsius. This parameter corresponds to 'c' in Rychlik, Spencer and Rhoads' equation (ii) (Nucleic Acids Research, vol 18, num 21) where a suitable value (for a lower initial concentration of template) is 'empirically determined'. The value of this parameter is less than the actual concentration of oligos in the reaction because it is the concentration of annealing oligos, which in turn depends on the amount of template (including PCR product) in a given cycle. 
This concentration increases a great deal during a PCR; fortunately PCR seems quite robust for a variety of oligo melting temperatures. See ADVICE FOR PICKING PRIMERS. e_maxpolyx Maximum polynucleotide repeat (value greater than or equal to 0) Integer e_primer 5 ("", " -maxpolyx=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 26 The maximum allowable length of a mononucleotide repeat in a primer, for example AAAAAA. e_productsection Product options e_psizeopt Product optimum size (value greater than or equal to 0) Integer e_primer 200 ("", " -psizeopt=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 27 The optimum size for the PCR product. 0 indicates that there is no optimum product size. e_prange Product size range String e_primer 100-300 ("", " -prange=" + str(value))[value is not None and value!=vdef] 28 The associated value specifies the lengths of the product that the user wants the primers to create, and is a space-separated list of elements of the form (x)-(y) where an (x)-(y) pair is a legal range of lengths for the product. For example, if one wants PCR products to be between 100 and 150 bases (inclusive) then one would set this parameter to 100-150. If one desires PCR products in either the range from 100 to 150 bases or in the range from 200 to 250 bases then one would set this parameter to 100-150 200-250. EPrimer3 favors ranges to the left side of the parameter string. EPrimer3 will return legal primer pairs in the first range regardless of the value of the objective function for these pairs. Only if there is an insufficient number of primers in the first range will EPrimer3 return primers in a subsequent range. e_ptmopt Product optimum tm Float e_primer 0.0 ("", " -ptmopt=" + str(value))[value is not None and value!=vdef] 29 The optimum melting temperature for the PCR product. 0 indicates that there is no optimum temperature.
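The product size range format described for -prange above, a space-separated list of (x)-(y) elements with earlier ranges preferred, can be sketched in Python as follows. The helpers parse_size_ranges and first_matching_range are hypothetical illustrations only; they assume inclusive integer bounds and do not reproduce how EPrimer3 itself ranks primer pairs.

```python
def parse_size_ranges(text):
    """Parse 'x-y x-y ...' into a list of inclusive (low, high) tuples."""
    ranges = []
    for token in text.split():
        low_str, high_str = token.split("-")
        ranges.append((int(low_str), int(high_str)))
    return ranges


def first_matching_range(size, ranges):
    """Return the position of the leftmost range containing size, or None.

    Leftmost-first mirrors the stated preference for ranges on the
    left side of the parameter string.
    """
    for position, (low, high) in enumerate(ranges):
        if low <= size <= high:
            return position
    return None


ranges = parse_size_ranges("100-150 200-250")
print(ranges, first_matching_range(220, ranges))
```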
e_ptmmin Product minimum tm Float e_primer -1000000.0 ("", " -ptmmin=" + str(value))[value is not None and value!=vdef] 30 The minimum allowed melting temperature of the amplicon. Please see the documentation on the maximum melting temperature of the product for details. e_ptmmax Product maximum tm Float e_primer 1000000.0 ("", " -ptmmax=" + str(value))[value is not None and value!=vdef] 31 The maximum allowed melting temperature of the amplicon. Product Tm is calculated using the formula from Bolton and McCarthy, PNAS 48:1390 (1962) as presented in Sambrook, Fritsch and Maniatis, Molecular Cloning, p 11.46 (1989, CSHL Press). Tm = 81.5 + 16.6(log10([Na+])) + .41*(%GC) - 600/length Where [Na+] is the molar sodium concentration, (%GC) is the percent of Gs and Cs in the sequence, and length is the length of the sequence. A similar formula is used by the prime primer selection program in GCG, which instead uses 675.0/length in the last term (after F. Baldino, Jr, M.-F. Chesselet, and M.E. Lewis, Methods in Enzymology 168:766 (1989) eqn (1) on page 766 without the mismatch and formamide terms). The formulas here and in Baldino et al. assume Na+ rather than K+. According to J.G. Wetmur, Critical Reviews in BioChem. and Mol. Bio. 26:227 (1991) 50 mM K+ should be equivalent in these formulae to .2 M Na+. EPrimer3 uses the same salt concentration value for calculating both the primer melting temperature and the oligo melting temperature. If you are planning to use the PCR product for hybridization later this behavior will not give you the Tm under hybridization conditions. e_oligosinput Internal oligo input e_oexcludedregion Internal oligo excluded region String e_hybridprobe ("", " -oexcludedregion=" + str(value))[value is not None] 32 Middle oligos may not overlap any region specified by this tag. The associated value must be a space-separated list of (start),(end) pairs, where (start) is the index of the first base of an excluded region, and (end) is the last.
Often one would make Target regions excluded regions for internal oligos. e_oligoinput Internal oligo input sequence (if any) String e_hybridprobe ("", " -oligoinput=" + str(value))[value is not None] 33 The sequence of an internal oligo to check and around which to design forward and reverse primers. Must be a substring of SEQUENCE. e_oligosection Internal oligo options e_osizeopt Internal oligo optimum size (value greater than or equal to 0) Integer e_hybridprobe 20 ("", " -osizeopt=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 34 Optimum length (in bases) of an internal oligo. EPrimer3 will attempt to pick primers close to this length. e_ominsize Internal oligo minimum size (value greater than or equal to 0) Integer e_hybridprobe 18 ("", " -ominsize=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 35 Minimum acceptable length of an internal oligo. Must be greater than 0 and less than or equal to INTERNAL-OLIGO-MAX-SIZE. e_omaxsize Internal oligo maximum size (value less than or equal to 35) Integer e_hybridprobe 27 ("", " -omaxsize=" + str(value))[value is not None and value!=vdef] Value less than or equal to 35 is required value <= 35 36 Maximum acceptable length (in bases) of an internal oligo. Currently this parameter cannot be larger than 35. This limit is governed by maximum oligo size for which EPrimer3's melting-temperature is valid. e_otmopt Internal oligo optimum tm Float e_hybridprobe 60.0 ("", " -otmopt=" + str(value))[value is not None and value!=vdef] 37 Optimum melting temperature (Celsius) for an internal oligo. EPrimer3 will try to pick oligos with melting temperatures that are close to this temperature. The oligo melting temperature formula in EPrimer3 is that given in Rychlik, Spencer and Rhoads, Nucleic Acids Research, vol 18, num 21, pp 6409-6412 and Breslauer, Frank, Bloecker and Marky, Proc. Natl. Acad. Sci. 
USA, vol 83, pp 3746-3750. Please refer to the former paper for background discussion. e_otmmin Internal oligo minimum tm Float e_hybridprobe 57.0 ("", " -otmmin=" + str(value))[value is not None and value!=vdef] 38 Minimum acceptable melting temperature(Celsius) for an internal oligo. e_otmmax Internal oligo maximum tm Float e_hybridprobe 63.0 ("", " -otmmax=" + str(value))[value is not None and value!=vdef] 39 Maximum acceptable melting temperature (Celsius) for an internal oligo. e_ogcopt Internal oligo optimum gc percent Float e_hybridprobe 50.0 ("", " -ogcopt=" + str(value))[value is not None and value!=vdef] 40 Internal oligo optimum GC percent. e_ogcmin Internal oligo minimum gc Float e_hybridprobe 20.0 ("", " -ogcmin=" + str(value))[value is not None and value!=vdef] 41 Minimum allowable percentage of Gs and Cs in an internal oligo. e_ogcmax Internal oligo maximum gc Float e_hybridprobe 80.0 ("", " -ogcmax=" + str(value))[value is not None and value!=vdef] 42 Maximum allowable percentage of Gs and Cs in any internal oligo generated by Primer. e_osaltconc Internal oligo salt concentration (mm) Float e_hybridprobe 50.0 ("", " -osaltconc=" + str(value))[value is not None and value!=vdef] 43 The millimolar concentration of salt (usually KCl) in the hybridization. EPrimer3 uses this argument to calculate internal oligo melting temperatures. e_odnaconc Internal oligo dna concentration (nm) Float e_hybridprobe 50.0 ("", " -odnaconc=" + str(value))[value is not None and value!=vdef] 44 The nanomolar concentration of annealing internal oligo in the hybridization. e_oanyself Internal oligo maximum self complementarity (value less than or equal to 9999.99) Float e_hybridprobe 12.00 ("", " -oanyself=" + str(value))[value is not None and value!=vdef] Value less than or equal to 9999.99 is required value <= 9999.99 45 The maximum allowable local alignment score when testing an internal oligo for (local) self-complementarity. 
Local self-complementarity is taken to predict the tendency of oligos to anneal to themselves. The scoring system gives 1.00 for complementary bases, -0.25 for a match of any base (or N) with an N, -1.00 for a mismatch, and -2.00 for a gap. Only single-base-pair gaps are allowed. For example, the alignment 5' ATCGNA 3' || | | 3' TA-CGT 5' is allowed (and yields a score of 1.75), but the alignment 5' ATCCGNA 3' || | | 3' TA--CGT 5' is not considered. Scores are non-negative, and a score of 0.00 indicates that there is no reasonable local alignment between two oligos. e_oendself Internal oligo maximum 3' self complementarity (value less than or equal to 9999.99) Float e_hybridprobe 12.00 ("", " -oendself=" + str(value))[value is not None and value!=vdef] Value less than or equal to 9999.99 is required value <= 9999.99 46 The maximum allowable 3'-anchored global alignment score when testing a single oligo for self-complementarity. The scoring system is as for the Maximum Complementarity argument. In the examples above the scores are 7.00 and 6.00 respectively. Scores are non-negative, and a score of 0.00 indicates that there is no reasonable 3'-anchored global alignment between two oligos. In order to estimate 3'-anchored global alignments for candidate oligos, Primer assumes that the sequence from which to choose oligos is presented 5' to 3'. INTERNAL-OLIGO-SELF-END is meaningless when applied to internal oligos used for hybridization-based detection, since primer-dimer will not occur. We recommend that INTERNAL-OLIGO-SELF-END be set at least as high as INTERNAL-OLIGO-SELF-ANY. e_opolyxmax Internal oligo maximum polynucleotide repeat (value greater than or equal to 0) Integer e_hybridprobe 5 ("", " -opolyxmax=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 47 The maximum allowable length of an internal oligo mononucleotide repeat, for example AAAAAA.
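The mononucleotide repeat limit described above can be illustrated with a short Python sketch that measures the longest run of a single base in an oligo. The function names are hypothetical, the comparison is case-insensitive by assumption, and the sketch is not EPrimer3's own implementation.

```python
from itertools import groupby


def longest_mononucleotide_run(sequence):
    """Length of the longest run of one repeated base (0 for an empty string)."""
    runs = (len(list(group)) for _, group in groupby(sequence.upper()))
    return max(runs, default=0)


def passes_max_polyx(sequence, max_polyx=5):
    """True when no single-base run exceeds max_polyx (5 is the listed default)."""
    return longest_mononucleotide_run(sequence) <= max_polyx


# AAAAAA is a run of six, so it exceeds the default limit of five.
print(longest_mononucleotide_run("AAAAAA"), passes_max_polyx("AAAAAA"))
```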
e_omishybmax Internal oligo maximum mishybridization (value less than or equal to 9999.99) Float e_hybridprobe 12.0 ("", " -omishybmax=" + str(value))[value is not None and value!=vdef] Value less than or equal to 9999.99 is required value <= 9999.99 48 Similar to MAX-MISPRIMING except that this parameter applies to the similarity of candidate internal oligos to the library specified in INTERNAL-OLIGO-MISHYB-LIBRARY. e_advanced Advanced section e_explainflag Explain flag Boolean 0 ("", " -explainflag")[ bool(value) ] 49 If this flag is true, produce LEFT-EXPLAIN, RIGHT-EXPLAIN, and INTERNAL-OLIGO-EXPLAIN output tags, which are intended to provide information on the number of oligos and primer pairs that EPrimer3 examined, and statistics on the number discarded for various reasons. e_fileflag Create results files for each sequence Boolean 0 ("", " -fileflag")[ bool(value) ] 50 If the associated value is true, then EPrimer3 creates two output files for each input SEQUENCE. File (sequence-id).for lists all acceptable forward primers for (sequence-id), and (sequence-id).rev lists all acceptable reverse primers for (sequence-id), where (sequence-id) is the value of the SEQUENCE-ID tag (which must be supplied). In addition, if the input tag TASK is 1 or 4, EPrimer3 produces a file (sequence-id).int, which lists all acceptable internal oligos. e_firstbaseindex First base index Integer 1 ("", " -firstbaseindex=" + str(value))[value is not None and value!=vdef] 51 This parameter is the index of the first base in the input sequence. For input and output using 1-based indexing (such as that used in GenBank and to which many users are accustomed) set this parameter to 1. For input and output using 0-based indexing set this parameter to 0. (This parameter also affects the indexes in the contents of the files produced when the primer file flag is set.) 
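The command-line fragments in these descriptions all follow one Python idiom: a two-element tuple indexed by a boolean, so that index False (0) yields the empty string and index True (1) yields the option text. A minimal sketch of that idiom, using the -explainflag and -firstbaseindex expressions quoted above; the function names and the explicit vdef argument are assumptions made for illustration and are not part of the original descriptions.

```python
def flag_fragment(value):
    """Boolean flag: emit the switch only when the value is true."""
    # bool(value) is False -> index 0 -> "", True -> index 1 -> the switch
    return ("", " -explainflag")[bool(value)]


def option_fragment(value, vdef=1):
    """Valued option: emit it only when set and different from the default.

    vdef stands for the parameter's default value (1 for -firstbaseindex).
    """
    # Both tuple elements are built before indexing, so str(value) is
    # evaluated even when the empty string is finally selected.
    return ("", " -firstbaseindex=" + str(value))[value is not None and value != vdef]


if __name__ == "__main__":
    print(repr(flag_fragment(True)))
    print(repr(option_fragment(0)))
    print(repr(option_fragment(1)))
```

Because the default value produces an empty fragment, the generated command line only carries options the user actually changed.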
e_pickanyway Pick anyway Boolean 0 ("", " -pickanyway")[ bool(value) ] 52 If true pick a primer pair even if LEFT-INPUT, RIGHT-INPUT, or INTERNAL-OLIGO-INPUT violates specific constraints. e_maxmispriming Primer maximum mispriming (value less than or equal to 9999.99) Float 12.00 ("", " -maxmispriming=" + str(value))[value is not None and value!=vdef] Value less than or equal to 9999.99 is required value <= 9999.99 53 The maximum allowed weighted similarity with any sequence in MISPRIMING-LIBRARY. e_pairmaxmispriming Primer pair maximum mispriming (value less than or equal to 9999.99) Float 24.00 ("", " -pairmaxmispriming=" + str(value))[value is not None and value!=vdef] Value less than or equal to 9999.99 is required value <= 9999.99 54 The maximum allowed sum of weighted similarities of a primer pair (one similarity for each primer) with any single sequence in MISPRIMING-LIBRARY. e_numnsaccepted Maximum ns accepted in a primer (value greater than or equal to 0) Integer 0 ("", " -numnsaccepted=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 55 Maximum number of unknown bases (N) allowable in any primer. e_selfany Maximum self complementarity (value from 0.00 to 9999.99) Float 8.00 ("", " -selfany=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.00 is required value >= 0.00 Value less than or equal to 9999.99 is required value <= 9999.99 56 The maximum allowable local alignment score when testing a single primer for (local) self-complementarity and the maximum allowable local alignment score when testing for complementarity between forward and reverse primers. Local self-complementarity is taken to predict the tendency of primers to anneal to each other without necessarily causing self-priming in the PCR. The scoring system gives 1.00 for complementary bases, -0.25 for a match of any base (or N) with an N, -1.00 for a mismatch, and -2.00 for a gap. 
Only single-base-pair gaps are allowed. For example, the alignment 5' ATCGNA 3' ...|| | | 3' TA-CGT 5' is allowed (and yields a score of 1.75), but the alignment 5' ATCCGNA 3' ...|| | | 3' TA--CGT 5' is not considered. Scores are non-negative, and a score of 0.00 indicates that there is no reasonable local alignment between two oligos. e_selfend Maximum 3' self complementarity (value greater than or equal to 0.00) Float 3.00 ("", " -selfend=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.00 is required value >= 0.00 57 The maximum allowable 3'-anchored global alignment score when testing a single primer for self-complementarity, and the maximum allowable 3'-anchored global alignment score when testing for complementarity between forward and reverse primers. The 3'-anchored global alignment score is taken to predict the likelihood of PCR-priming primer-dimers, for example 5' ATGCCCTAGCTTCCGGATG 3' .............||| ||||| ..........3' AAGTCCTACATTTAGCCTAGT 5' or 5' AGGCTATGGGCCTCGCGA 3' ...............|||||| ............3' AGCGCTCCGGGTATCGGA 5' The scoring system is as for the Maximum Complementarity argument. In the examples above the scores are 7.00 and 6.00 respectively. Scores are non-negative, and a score of 0.00 indicates that there is no reasonable 3'-anchored global alignment between two oligos. In order to estimate 3'-anchored global alignments for candidate primers and primer pairs, Primer assumes that the sequence from which to choose primers is presented 5' to 3'. It is nonsensical to provide a larger value for this parameter than for the Maximum (local) Complementarity parameter because the score of a local alignment will always be at least as great as the score of a global alignment. 
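The scoring system described for the complementarity parameters can be checked by hand on the first example alignment. The sketch below only scores a fixed, already gapped alignment column by column; it does not search for the best local or 3'-anchored alignment and is not EPrimer3's implementation. The function names are hypothetical, and the second strand is assumed to be written 3' to 5' exactly as in the text.

```python
# Watson-Crick partners used to decide whether a column is complementary.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def column_score(top, bottom):
    """Score one alignment column with the weights quoted in the text."""
    if top == "-" or bottom == "-":
        return -2.00   # gap
    if top == "N" or bottom == "N":
        return -0.25   # any base (or N) against an N
    if COMPLEMENT.get(top) == bottom:
        return 1.00    # complementary bases
    return -1.00       # mismatch


def alignment_score(top, bottom):
    """Sum the column scores of two equal-length gapped strands."""
    if len(top) != len(bottom):
        raise ValueError("strands must have the same gapped length")
    return sum(column_score(a, b) for a, b in zip(top, bottom))


if __name__ == "__main__":
    # 5' ATCGNA 3' against 3' TA-CGT 5'
    print(alignment_score("ATCGNA", "TA-CGT"))  # 1.75
```

The columns contribute 1.00, 1.00, -2.00, 1.00, -0.25 and 1.00, which gives the 1.75 stated in the help text.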
e_primerweights Primer penalty weights e_maxendstability Maximum 3' end stability (value less than or equal to 999.9999) Float 9.0 ("", " -maxendstability=" + str(value))[value is not None and value!=vdef] Value less than or equal to 999.9999 is required value <= 999.9999 58 The maximum stability for the five 3' bases of a forward or reverse primer. Bigger numbers mean more stable 3' ends. The value is the maximum delta G for duplex disruption for the five 3' bases as calculated using the nearest neighbor parameters published in Breslauer, Frank, Bloecker and Marky, Proc. Natl. Acad. Sci. USA, vol 83, pp 3746-3750. EPrimer3 uses a completely permissive default value for backward compatibility (which we may change in the next release). Rychlik recommends a maximum value of 9 (Wojciech Rychlik, 'Selection of Primers for Polymerase Chain Reaction' in BA White, Ed., 'Methods in Molecular Biology, Vol. 15: PCR Protocols: Current Methods and Applications', 1993, pp 31-40, Humana Press, Totowa NJ). e_output Output section e_outfile Name of the output file (e_outfile) Filename eprimer3.e_outfile ("" , " -outfile=" + str(value))[value is not None] 59 e_outfile_out outfile_out option Primer3Report Report e_outfile auto Turn off any prompting String " -auto -stdout" 60 Programs-5.1.1/rnainverse.xml0000644000175000001560000003375611767572177015112 0ustar bneronsis rnainverse RNAinverse Find RNA sequences with given secondary structure Hofacker I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker, P. Schuster (1994) Fast Folding and Comparison of RNA Secondary Structures. Monatshefte f. Chemie 125: 167-188 A. Walter, D Turner, J Kim, M Lyttle, P Muller, D Mathews, M Zuker Coaxial stacking of helices enhances binding of oligoribonucleotides. PNAS, 91, pp 9218-9222, 1994 M. Zuker, P. Stiegler (1981) Optimal computer folding of large RNA sequences using thermodynamic and auxiliary information, Nucl Acid Res 9: 133-148 J.S. 
McCaskill (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structures, Biopolymers 29: 1105-1119 D.H. Turner, N. Sugimoto and S.M. Freier (1988) RNA structure prediction, Ann Rev Biophys Biophys Chem 17: 167-192 D. Adams (1979) The hitchhiker's guide to the galaxy, Pan Books, London RNAinverse searches for sequences folding into a predefined structure, thereby inverting the folding algorithm. Target structures (in bracket notation) and starting sequences for the search are read alternately from file. sequence:nucleic:2D_structure structure:2D_structure rnainverse String rnainverse "RNAinverse" "RNAinverse" seq Structures File RNA RNAStructure AbstractText " < $value" " < " + str(value) 1000 Target structures and starting sequences for the search. (((.(((....))).))) NNNgNNNNNNNNNNaNNN control Control options 2 folding Folding method (-F) Choice m m p mp (defined $value and $value ne $vdef)? " -F$value" : "" ("", " -F" + str(value))[ value is not None] Use minimum energy (-Fm), partition function folding (-Fp) or both (-Fmp). In partition function mode, the probability of the target structure exp(-E(S)/kT)/Q is maximized. This probability is written in brackets after the found sequence and Hamming distance. In most cases you'll want to use the -f option in conjunction with -Fp, see below. The default is -Fm. final Stop search when sequence is found with E(s)-F smaller than this value (-f) Float $folding eq 'mp' or $folding eq 'p' folding == 'mp' or folding == 'p' (defined $value)? " -f $value" : "" ("", " -f " + str(value))[ value is not None] In combination with -Fp F=-kT*ln(Q) repeats Search repeatedly for the same structure (-R) Integer (defined $value)? " -R $value" : "" ("", " -R " + str(value))[ value is not None] Search repeatedly for the same structure. If repeats is negative, search until -repeats exact solutions are found; no output is produced for unsuccessful searches.
Be aware that the program will not terminate if the target structure cannot be found. alphabet Find sequences using only bases from this alphabet (-a) String (defined $value)? " -a $value" : "" ("", " -a " + str(value))[ value is not None] others_options Other options 2 temperature Rescale energy parameters to the given temperature in degrees Celsius (-T) Integer 37 (defined $value and $value != $vdef)? " -T $value" : "" ( "" , " -T " + str(value) )[ value is not None and value != vdef] tetraloops Do not include special stabilizing energies for certain tetraloops (-4) Boolean 0 ($value)? " -4" : "" ( "" , " -4" )[ value ] dangling How to treat dangling end energies for bases adjacent to helices in free ends and multiloops (-d) Choice -d1 -d1 -d -d2 (defined $value and $value ne $vdef)? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef] How to treat 'dangling end' energies for bases adjacent to helices in free ends and multiloops: Normally only unpaired bases can participate in at most one dangling end. With -d2 this check is ignored; this is the default for partition function folding (-p). -d ignores dangling ends altogether. Note that by default pf and mfe folding treat dangling ends differently; use -d2 (or -d) in addition to -p to ensure that both algorithms use the same energy model. The -d2 option is available for RNAfold, RNAeval, and RNAinverse only. noGU Do not allow GU pairs (-noGU) Boolean 0 ($value)? " -noGU" : "" ( "" , " -noGU" )[ value ] noCloseGU Do not allow GU pairs at the end of helices (-noCloseGU) Boolean 0 ($value)? " -noCloseGU" : "" ( "" , " -noCloseGU" )[ value ] nsp Non standard pairs (comma-separated list) (-nsp) String (defined $value)? " -nsp $value" : "" ( "" , " -nsp " + str(value) )[ value is not None ] Allow other pairs in addition to the usual AU, GC, and GU pairs. pairs is a comma-separated list of additionally allowed pairs.
If the first character is a '-' then AB will imply that AB and BA are allowed pairs, e.g. RNAfold -nsp -GA will allow GA and AG pairs. Nonstandard pairs are given 0 stacking energy. parameter Parameter file (-P) EnergyParameterFile AbstractText (defined $value)? " -P $value" : "" ( "" , " -P " + str(value) )[ value is not None ] Read energy parameters from paramfile, instead of using the default parameter set. A sample parameter file should accompany your distribution. See the RNAlib documentation for details on the file format. Programs-5.1.1/dan.xml0000644000175000001560000006231012072525233013441 0ustar bneronsis dan EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net dan Calculates nucleic acid melting temperature http://bioweb2.pasteur.fr/docs/EMBOSS/dan.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:composition dan e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_windowsize Enter window size (value from 1 to 100) Integer 20 ("", " -windowsize=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 Value less than or equal to 100 is required value <= 100 2 The values of melting point and other thermodynamic properties of the sequence are determined by taking a short length of sequence known as a window and determining the properties of the sequence in that window. The window is incrementally moved along the sequence with the properties being calculated at each new position.
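The windowing scheme described for dan (a window of fixed size moved along the sequence by a shift increment) can be sketched as follows. This is an illustration only, not dan's own code: the function names are hypothetical, only full-length windows are produced (how dan treats a trailing partial window is not stated here), and GC percentage stands in for the melting-point calculation, which is not reproduced.

```python
def windows(sequence, windowsize=20, shiftincrement=1):
    """Yield (start, subsequence) for each full window, with 0-based starts."""
    last_start = len(sequence) - windowsize
    for start in range(0, last_start + 1, shiftincrement):
        yield start, sequence[start:start + windowsize]


def gc_percent(window):
    """Percentage of G and C bases in a window (a stand-in property)."""
    return 100.0 * sum(base in "GCgc" for base in window) / len(window)


if __name__ == "__main__":
    for start, window in windows("ACGTACGT", windowsize=4, shiftincrement=2):
        print(start, window, gc_percent(window))
```

With the defaults quoted above (window size 20, shift increment 1), a sequence of length n yields n - 19 windows, one per starting position.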
e_shiftincrement Enter shift increment (value greater than or equal to 1) Integer 1 ("", " -shiftincrement=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 3 This is the amount by which the window is moved at each increment in order to find the melting point and other properties along the sequence. e_dnaconc Enter dna concentration (nm) (value from 1. to 100000.) Float 50. ("", " -dnaconc=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1. is required value >= 1. Value less than or equal to 100000. is required value <= 100000. 4 e_saltconc Enter salt concentration (mm) (value from 1. to 1000.) Float 50. ("", " -saltconc=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1. is required value >= 1. Value less than or equal to 1000. is required value <= 1000. 5 e_additional Additional section e_productsection Product options e_product Prompt for product values Boolean 0 ("", " -product")[ bool(value) ] 6 This prompts for percent formamide, percent of mismatches allowed and product length. e_formamide Enter percentage of formamide (value from 0. to 100.) Float e_product 0. ("", " -formamide=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0. is required value >= 0. Value less than or equal to 100. is required value <= 100. 7 This specifies the percent formamide to be used in calculations (it is ignored unless -product is used). e_mismatch Enter percent mismatch (value from 0. to 100.) Float e_product 0. ("", " -mismatch=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0. is required value >= 0. Value less than or equal to 100. is required value <= 100. 8 This specifies the percent mismatch to be used in calculations (it is ignored unless -product is used). 
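Each numeric parameter above pairs an error message with a Python condition, for example "Value greater than or equal to 0. is required" with value >= 0. One way such pairs could be evaluated is sketched below; the CONTROLS list and the check function are assumptions made for illustration and do not describe how the portal itself evaluates them.

```python
# (message, predicate) pairs modelled on the -formamide and -mismatch
# controls: the value must lie between 0. and 100. inclusive.
CONTROLS = [
    ("Value greater than or equal to 0. is required", lambda value: value >= 0.),
    ("Value less than or equal to 100. is required", lambda value: value <= 100.),
]


def check(value, controls=CONTROLS):
    """Return the messages of all controls that the value fails."""
    return [message for message, predicate in controls if not predicate(value)]


if __name__ == "__main__":
    print(check(50.))
    print(check(-1.))
    print(check(150.))
```

An empty list means the value passed every control; otherwise each returned message names a violated bound.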
e_prodlen Enter the product length Integer e_product and e_windowsize ("", " -prodlen=" + str(value))[value is not None] 9 This specifies the product length to be used in calculations (it is ignored unless -product is used). e_thermosection Thermodynamic options e_thermo Thermodynamic calculations Boolean 0 ("", " -thermo")[ bool(value) ] 10 Output the DeltaG, DeltaH and DeltaS values of the sequence windows to the output data file. e_temperature Enter temperature (value from 0. to 100.) Float e_thermo 25. ("", " -temperature=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0. is required value >= 0. Value less than or equal to 100. is required value <= 100. 11 If -thermo has been specified then this specifies the temperature at which to calculate the DeltaG, DeltaH and DeltaS values. e_advanced Advanced section e_rna Use rna data values Boolean 0 ("", " -rna")[ bool(value) ] 12 This specifies that the sequence is an RNA sequence and not a DNA sequence. e_output Output section e_plot Produce a plot Boolean 0 ("", " -plot")[ bool(value) ] 13 If this is not specified then the file of output data is produced, else a plot of the melting point along the sequence is produced. e_mintemp Enter minimum temperature (value from 0. to 150.) Float e_plot 55. ("", " -mintemp=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0. is required value >= 0. Value less than or equal to 150. is required value <= 150. 14 Enter a minimum value for the temperature scale (y-axis) of the plot. 
e_graph Choose the e_graph output format Choice e_plot png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 15 xy_goutfile Name of the output graph Filename e_plot dan_xygraph ("" , " -goutfile=" + str(value))[value is not None] 16 xy_outgraph_png Graph file Picture Binary e_plot and e_graph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_plot and e_graph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_plot and e_graph == "ps" or e_graph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_plot and e_graph == "meta" "*.meta" xy_outgraph_data Graph file Text e_plot and e_graph == "data" "*.dat" e_outfile Name of the report file Filename not e_plot dan.report ("" , " -outfile=" + str(value))[value is not None] 17 If a plot is not being produced then data on the melting point etc. in each window along the sequence is output to the file. e_rformat_outfile Choose the report output format Choice not e_plot SEQTABLE DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 18 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 19 Programs-5.1.1/rankoptimizer.xml0000644000175000001560000001233212125257722015600 0ustar bneronsis rankoptimizer 1.0 rankoptimizer rankoptimizer reports taxonomic abundance with the Krona library, based on BLAST hits. The program uses Krona 2.0, an interactive metagenomic visualization tool in a Web browser C. Maufrais http://sourceforge.net/p/krona/home/krona/ Krona-2.0: Ondov BD, Bergman NH, and Phillippy AM.
Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011 Sep 30; 12(1):385. database:search:display rankoptimizer inputFile Taxoptimizer output file taxoptimizerTextReport Report Tabulated file ' -i ' + str(value) 10 Output Options rankoptimizerOptions lca Report lowest common ancestor of taxonomic abundance (-a) Boolean 0 70 ('',' -a ')[value] xmlKronaOutput xml output Boolean 1 Abundance is reported in an XML file following the Krona specification (-k). ('', ' -k rankoptimizer.xml')[value] 20 htmlKronaOutput html output Boolean 1 Abundance is reported in an HTML file following the Krona specification and using the Krona JavaScript library (-v). ('', ' -v rankoptimizer.html' )[value] 30 delta Percentage of score to be considered to be relevant (-d) Integer 0 Analyse lines only if the score S is <= best score Sb + value. Default: value = 0. ('',' -d '+ str(value))[value is not None] 40 cellular Remove 'cellular organism' from taxonomy (-l) Boolean 0 70 ('',' -l ')[value] lcaout LCA output name LCAReport Report lca "rankoptimizer.out" htmloutfile HTML Output file(s) KronaHtmlReport Report htmlKronaOutput "rankoptimizer.html" xmloutfile XML Output file(s) KronaXMLReport Report xmlKronaOutput "rankoptimizer.xml" Programs-5.1.1/polydot.xml0000644000175000001560000003026212072525233014372 0ustar bneronsis polydot EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A.
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net polydot Draw dotplots for all-against-all comparison of a sequence set http://bioweb2.pasteur.fr/docs/EMBOSS/polydot.html http://emboss.sourceforge.net/docs/themes alignment:pairwise:dot_plots polydot e_input Input section e_sequences sequences option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 2,n ("", " -sequences=" + str(value))[value is not None] 1 File containing a sequence alignment e_required Required section e_wordsize Word size (value greater than or equal to 2) Integer 6 ("", " -wordsize=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 2 is required value >= 2 2 e_output Output section e_gap Gap (in residues) between dotplots (value greater than or equal to 0) Integer 10 ("", " -gap=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 3 This specifies the size of the gap that is used to separate the individual dotplots in the display. The size is measured in residues, as displayed in the output. 
e_boxit Draw a box around each dotplot Boolean 1 (" -noboxit", "")[ bool(value) ] 4 e_dumpfeat Dump all matches as feature files Boolean 0 ("", " -dumpfeat")[ bool(value) ] 5 e_outfeat Name of the output feature file (e_outfeat) Filename e_dumpfeat polydot.e_outfeat ("" , " -outfeat=" + str(value))[value is not None] 6 e_offormat_outfeat Choose the feature output format Choice e_dumpfeat GFF GFF EMBL SWISSPROT NBRF CODATA ("", " -offormat=" + str(value))[value is not None and value!=vdef] 7 e_outfeat_out outfeat_out option Feature AbstractText e_outfeat e_graph Choose the e_graph output format Choice png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 8 e_goutfile Name of the output graph Filename polydot_graph ("" , " -goutfile=" + str(value))[value is not None] 9 outgraph_png Graph file Picture Binary e_graph == "png" "*.png" outgraph_gif Graph file Picture Binary e_graph == "gif" "*.gif" outgraph_ps Graph file PostScript Binary e_graph == "ps" or e_graph == "cps" "*.ps" outgraph_meta Graph file Picture Binary e_graph == "meta" "*.meta" outgraph_data Graph file Text e_graph == "data" "*.dat" auto Turn off any prompting String " -auto -stdout" 10 Programs-5.1.1/primo.xml0000644000175000001560000001447011767572177014054 0ustar bneronsis primo PRIMO A primer design tool Kupfer, Li P. Li, K. C. Kupfer, C. J. Davies, D. Burbee, G. A. Evans, and H. R. Garner. PRIMO: A primer design program that applies base quality statistics for automated large-scale DNA sequencing. Genomics 40:476-485 (1997). sequence:nucleic:primers primo input_file_name Sequence data DNA Sequence FASTA " $value" " "+str(value) 1 all String " -all" " -all" 10 cover Cover template with walking-primers on both strands (-cover) Boolean 0 ($value)? " -cover" : "" ( "" , " -cover" )[ value ] 10 print Print formatted/annotated sequence to log file (-print) Boolean not $cover not cover 0 ($value)? 
" -print" : "" ( "" , " -print" )[ value ] 10 regions_file Regions file (-read) PrimoRegion AbstractText (defined $value)? " -read $value" : "" ( "" , " -read " + str(value) )[ value is not None ] 10 qual_file Use quality data? (-noqual) Boolean 0 ($value)? "" : " -noqual" ( " -noqual" , "" )[ value ] 10 repeats_file Repeats file PrimoRepeats AbstractText (defined $value)? "ln -s $value human.rep; " : "" ( "" , "ln -s " +str(value)+ " human.rep; " )[ value is not None ] -10 oligo_file Oligo file PrimoOligo AbstractText (defined $value )? "ln -s $value oligo.screen; " : "" ( "" , "ln -s " +str(value)+ " oligo.screen; " )[ value is not None ] -10 rf String defined $regions_file regions_file is not None "ln -s $regions_file $input_file_name.regions && " "ln -s "+str(regions_file) + " " + str(input_file_name) + ".regions && " -10 qf String $qual_file qual_file "ln -s $qual_file $input_file_name.qual; " "ln -s "+str(qual_file) + " " + str(input_file_name) + ".qual && " -10 results_files Output files Text "*.log" "*.primers" "oligo.cri" "*.log" "*.primers" "oligo.cri" Programs-5.1.1/consense.xml0000644000175000001560000002242211724156742014524 0ustar bneronsis consense consense Consensus tree program http://bioweb2.pasteur.fr/docs/phylip/doc/consense.html CONSENSE reads a file of computer-readable trees and prints out (and may also write out onto a file) a consensus tree. 
phylogeny:tree_analyser consense String "consense <consense.params" "consense <consense.params" 0 infile Series of trees in file (intree) Tree NEWICK "ln -s $infile intree && " "ln -s "+ str( infile )+ " intree &&" -10 Input is a tree file which contains a series of trees in the Newick standard form (A,(B,(H,(D,(J,(((G,E),(F,I)),C)))))); (A,(B,(D,((J,H),(((G,E),(F,I)),C))))); (A,(B,(D,(H,(J,(((G,E),(F,I)),C)))))); (A,(B,(E,(G,((F,I),((J,(H,D)),C)))))); (A,(B,(E,(G,((F,I),(((J,H),D),C)))))); (A,(B,(E,((F,I),(G,((J,(H,D)),C)))))); (A,(B,(E,((F,I),(G,(((J,H),D),C)))))); (A,(B,(E,((G,(F,I)),((J,(H,D)),C))))); (A,(B,(E,((G,(F,I)),(((J,H),D),C))))); type Consensus type Choice MRE MRE "" "" S "C\\n" "C\n" MR "C\\nC\\n" "C\nC\n" ML "C\\nC\\nC\\n" "C\nC\nC\n" consense.params output Output options print_tree Print out tree (3) Boolean 1 ($value) ? "" : "3\\n" ( "3\n" , "" )[ value ] 1 Tells the program to print a semi-graphical picture of the tree in the outfile. consense.params print_treefile Write out trees onto tree file (4) Boolean 1 ($value) ? "" : "4\\n" ( "4\n" , "" )[ value ] 1 Tells the program to save the tree in a treefile (a standard representation of trees where the tree is specified by a nested pairs of parentheses, enclosing names and separated by commas). consense.params printdata Print out the data at start of run (1) Boolean 0 ($value) ? "1\\n" : "" ( "" , "1\n" )[ value ] 1 consense.params other_options Other options outgroup Outgroup species (O) Integer 1 (defined $value and $value != $vdef) ? "O\\n$value\\n" : "" ( "" , "O\n"+ str( value ) +"\n" )[ value is not None and value != vdef ] Please enter a value greater than 0 $value > 0 value > 0 1 consense.params rooted Trees to be treated as rooted (R) Boolean 0 ($value) ? 
"R\\n" : "" ( "" , "R\n" )[ value ] 1 consense.params outfile Consense output file Text " && mv outfile consense.outfile" " && mv outfile consense.outfile " 40 "consense.outfile" "consense.outfile" treefile Consense tree file Tree NEWICK $print_treefile print_treefile " && mv outtree consense.outtree" " && mv outtree consense.outtree" 50 "consense.outtree" "consense.outtree" confirm String "Y\\n" "Y\n" 1000 consense.params terminal_type String "T\\n" "T\n" -1 consense.params Programs-5.1.1/morePhyML.xml0000644000175000001560000005364212126514513014562 0ustar bneronsis morePhyML 1.0 morePhyML Improving ML tree searching with PhyML 3 Alexis Criscuolo Criscuolo A (2011) morePhyML: improving the phylogenetic tree space exploration with PhyML 3. Molecular Phylogenetics and Evolution. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59:307-321. Guindon, S. and Gascuel, O. (2003) A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood Syst. Biol., 52, 696-704 ftp://ftp.pasteur.fr/pub/gensoft/projects/morePhyML/ http://bioweb2.pasteur.fr/docs/morePhyML/morePhyML_doc.pdf phylogeny:likelihood morePhyML alignment Sequence Alignment Alignment PHYLIP-RELAXED "-i $value" " -i "+str(value) seqtype Data type (-d) Choice nt nt aa (defined $value) ? " -d $value" : "" ( "" , " -d " + str(value) )[ value is not None ] control_opt Control Options ntmodel Nucleotide substitution model (-m) Choice $seqtype eq "nt" seqtype == "nt" GTR HKY85 JC69 K80 F81 F84 TN93 GTR TN93e TPM1e TPM1u TPM2e TPM2u TPM3e TPM3u TIM1e TIM1u TIM2e TIM2u TIM3e TIM3u TVMe TVMu SYM ($value ne $vdef) ? " -m $value" : "" ("", " -m "+str(value))[value != vdef] aamodel Amino-acid substitution model (-m) Choice $seqtype eq "aa" seqtype == "aa" LG LG WAG JTT MtREV Dayhoff DCMut RtREV CpREV VT Blosum62 MtMam MtArt HIVw HIVb ($value ne $vdef) ? 
" -m $value" : "" ("", " -m "+str(value))[value != vdef] tstvratio1 Estimated transition/transversion ratio for DNA sequences? (-t) Boolean $seqtype eq "nt" seqtype == "nt" 0 ($value) ? " -t e" : "" ( "" , " -t e")[ value ] tstvratio2 User transition/transversion ratio for DNA sequences? (-t) Float $seqtype eq "nt" and not $tstvratio1 seqtype == "nt" and not tstvratio1 (defined $value ) ? " -t $value" : "" ( "" , " -t "+str(value))[ value is not None ] propinvar1 Estimated proportion of invariable sites? (-v) Boolean 0 ($value) ? " -v e" : "" ( "" , " -v e")[ value ] propinvar2 User proportion of invariable sites? (-v) Float not $propinvar1 not propinvar1 (defined $value) ? " -v $value" : "" ( "" , " -v "+str(value))[value is not None] Value must be >= 0 and < 1 $value >= 0 and $value < 1 value >= 0 and value < 1 nbsubstcat Number of relative substitution rate categories (-c) Integer 1 (defined $value and $value != $vdef) ? " -c $value" : "" ("", " -c "+str(value))[value is not None and value != vdef] gamma1 Estimated Gamma distribution parameter? (-a) Boolean 0 ($value) ? " -a e" : "" ("", " -a e")[value] gamma2 User gamma distribution parameter? (-a) Float not $gamma1 not gamma1 (defined $value) ? " -a $value" : "" ("", " -a "+str(value))[value is not None] frequencies Equilibrium character frequencies (-f) Choice m m e (defined $value) ? " -f $value" : "" ("", " -f "+str(value))[value is not None] usertreefile Starting tree filename (u) Tree NEWICK (defined $value) ? " -u $value" : "" ("", " -u "+str(value))[value is not None] parsimoniousTree Most parsimonious starting tree (-p) Boolean 0 (defined $value) ? " -p" : "" ("", " -p")[value] randomNumber Number of random starting trees to be used (-n) Integer (defined $value) ? " -n $value" : "" ("", " -n " + str(value))[value is not None] Value must be > 0 $value > 0 value > 0 tree_swapping First tree swapping (-s) Choice SPR SPR NNI both (defined $value and $value ne $vdef) ? 
" -s $value" : "" ("", " -s "+str(value))[value is not None and value !=vdef] branch_support Branch support (-b) Choice null null -2 0 -1 (defined $value and $value ne $vdef) ? " -b $value" : "" ("", " -b "+str(value))[value is not None and value !=vdef] likelihoodCar Write the likelihood for each character (-l) Boolean 0 (defined $value) ? " -l" : "" ("", " -l")[value] firstRun Write the results outputed by the first run of phyml (-x) Boolean 0 (defined $value) ? " -x" : "" ("", " -x")[value] outfile Output file Text "*_morephyml_stats.txt" "*_morephyml_stats.txt" outtree Output tree Tree NEWICK "*_morephyml_tree.txt" "*_morephyml_tree.txt" Programs-5.1.1/descseq.xml0000644000175000001560000001616212072525233014332 0ustar bneronsis descseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net descseq Alter the name or description of a sequence. http://bioweb2.pasteur.fr/docs/EMBOSS/descseq.html http://emboss.sourceforge.net/docs/themes sequence:edit descseq e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_additional Additional section e_name Name of the sequence String ("", " -name=" + str(value))[value is not None] 2 e_description Description of the sequence String ("", " -description=" + str(value))[value is not None] 3 e_advanced Advanced section e_append Append to the existing description Boolean 0 ("", " -append")[ bool(value) ] 4 This allows you to append the name or description you have given on to the end of the existing name or description of the sequence. 
e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename descseq.e_outseq ("" , " -outseq=" + str(value))[value is not None] 5 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 6 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 7 Programs-5.1.1/cai.xml0000644000175000001560000010310112072525233013425 0ustar bneronsis cai EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net cai Calculate codon adaptation index http://bioweb2.pasteur.fr/docs/EMBOSS/cai.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:codon_usage cai e_input Input section e_seqall seqall option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -seqall=" + str(value))[value is not None] 1 e_cfile cfile option Choice Eyeast_cai.cut Eacc.cut Eacica.cut Eadenovirus5.cut Eadenovirus7.cut Eagrtu.cut Eaidlav.cut Eanasp.cut Eani.cut Eani_h.cut Eanidmit.cut Earath.cut Easn.cut Eath.cut Eatu.cut Eavi.cut Eazovi.cut Ebacme.cut Ebacst.cut Ebacsu.cut Ebacsu_high.cut Ebja.cut Ebly.cut Ebme.cut Ebmo.cut Ebna.cut Ebommo.cut Ebov.cut Ebovin.cut Ebovsp.cut Ebpphx.cut Ebraja.cut Ebrana.cut Ebrare.cut Ebst.cut Ebsu.cut Ebsu_h.cut Ecac.cut Ecaeel.cut Ecal.cut Ecanal.cut Ecanfa.cut Ecaucr.cut Eccr.cut Ecel.cut Echi.cut Echick.cut Echicken.cut Echisp.cut Echk.cut Echlre.cut Echltr.cut Echmp.cut Echnt.cut Echos.cut Echzm.cut Echzmrubp.cut Ecloab.cut Ecpx.cut Ecre.cut Ecrigr.cut Ecrisp.cut Ectr.cut Ecyapa.cut Edayhoff.cut Eddi.cut Eddi_h.cut Edicdi.cut Edicdi_high.cut Edog.cut Edro.cut Edro_h.cut Edrome.cut 
Edrome_high.cut Edrosophila.cut Eeca.cut Eeco.cut Eeco_h.cut Eecoli.cut Eecoli_high.cut Eemeni.cut Eemeni_high.cut Eemeni_mit.cut Eerwct.cut Ef1.cut Efish.cut Efmdvpolyp.cut Ehaein.cut Ehalma.cut Ehalsa.cut Eham.cut Ehha.cut Ehin.cut Ehma.cut Ehorvu.cut Ehum.cut Ehuman.cut Ekla.cut Eklepn.cut Eklula.cut Ekpn.cut Elacdl.cut Ella.cut Elyces.cut Emac.cut Emacfa.cut Emaize.cut Emaize_chl.cut Emam_h.cut Emammal_high.cut Emanse.cut Emarpo_chl.cut Emedsa.cut Emetth.cut Emixlg.cut Emouse.cut Emsa.cut Emse.cut Emta.cut Emtu.cut Emus.cut Emussp.cut Emva.cut Emyctu.cut Emze.cut Emzecp.cut Encr.cut Eneigo.cut Eneu.cut Eneucr.cut Engo.cut Eoncmy.cut Eoncsp.cut Eorysa.cut Eorysa_chl.cut Epae.cut Epea.cut Epet.cut Epethy.cut Epfa.cut Ephavu.cut Ephix174.cut Ephv.cut Ephy.cut Epig.cut Eplafa.cut Epolyomaa2.cut Epombe.cut Epombecai.cut Epot.cut Eppu.cut Eprovu.cut Epse.cut Epseae.cut Epsepu.cut Epsesm.cut Epsy.cut Epvu.cut Erab.cut Erabbit.cut Erabit.cut Erabsp.cut Erat.cut Eratsp.cut Erca.cut Erhile.cut Erhime.cut Erhm.cut Erhoca.cut Erhosh.cut Eric.cut Erle.cut Erme.cut Ersp.cut Esalsa.cut Esalsp.cut Esalty.cut Esau.cut Eschma.cut Eschpo.cut Eschpo_cai.cut Eschpo_high.cut Esco.cut Eserma.cut Esgi.cut Esheep.cut Eshp.cut Eshpsp.cut Esli.cut Eslm.cut Esma.cut Esmi.cut Esmu.cut Esoltu.cut Esoy.cut Esoybn.cut Espi.cut Espiol.cut Espn.cut Espo.cut Espo_h.cut Espu.cut Esta.cut Estaau.cut Estrco.cut Estrmu.cut Estrpn.cut Estrpu.cut Esty.cut Esus.cut Esv40.cut Esyhsp.cut Esynco.cut Esyncy.cut Esynsp.cut Etbr.cut Etcr.cut Eter.cut Etetsp.cut Etetth.cut Etheth.cut Etob.cut Etobac.cut Etobac_chl.cut Etobcp.cut Etom.cut Etrb.cut Etrybr.cut Etrycr.cut Evco.cut Evibch.cut Ewheat.cut Ewht.cut Exel.cut Exenla.cut Exenopus.cut Eyeast.cut Eyeast_cai.cut Eyeast_high.cut Eyeast_mit.cut Eyeastcai.cut Eyen.cut Eyeren.cut Eyerpe.cut Eysc.cut Eysc_h.cut Eyscmt.cut Eysp.cut Ezebrafish.cut Ezma.cut ("", " -cfile=" + str(value))[value is not None and value!=vdef] 2 e_output Output section e_outfile Name of 
the output file (e_outfile) Filename cai.e_outfile ("" , " -outfile=" + str(value))[value is not None] 3 e_outfile_out outfile_out option CaiReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 4 Programs-5.1.1/silent.xml0000644000175000001560000002147712072525233014206 0ustar bneronsis silent EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net silent Find restriction sites to insert (mutate) with no translation change http://bioweb2.pasteur.fr/docs/EMBOSS/silent.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:restriction silent e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_enzymes Comma separated enzyme list String all ("", " -enzymes=" + str(value))[value is not None and value!=vdef] 2 e_output Output section e_sshow Display untranslated sequence Boolean 0 ("", " -sshow")[ bool(value) ] 3 e_tshow Display translated sequence Boolean 0 ("", " -tshow")[ bool(value) ] 4 e_allmut Display all mutations Boolean 0 ("", " -allmut")[ bool(value) ] 5 e_outfile Name of the report file Filename silent.report ("" , " -outfile=" + str(value))[value is not None] 6 e_rformat_outfile Choose the report output format Choice TABLE DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 7 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 
'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 8 Programs-5.1.1/nw_rename.xml0000644000175000001560000000606211767601016014660 0ustar bneronsis nw_rename 1.6 newick ID mapper helps out with the 10-character limit of the PHYLIP-PHYML formats GNU
Because the PHYLIP format and the naming rules of phyml and morePhyml are incompatible, using long identifiers in phyml or morePhyml fails. We propose the following workaround:
  1. Your alignment must be in FASTA format; if it is in another format, use squizz_convert to reformat it.
  2. Use fastaRename to generate an alignment with short IDs and an ID mapping file.
  3. Perform your analysis on the alignment with short IDs.
  4. Replace the short IDs in your tree (in NEWICK format) using nw_rename and the ID mapping file generated in step 2.
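The last step of the workaround above can be sketched in a few lines of Python. This is only an illustration of the idea, not how nw_rename itself is implemented. It assumes a mapping file with one "short_id long_id" pair per line (the layout actually produced by fastaRename may differ) and unquoted leaf labels. The function names `parse_id_map` and `rename_leaves` are hypothetical.

```python
import re


def parse_id_map(text):
    """Parse 'short_id long_id' pairs, one per line (assumed layout)."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


# An unquoted leaf label directly follows '(' or ',' and ends before
# whitespace, ':', ',', ')', '(' or ';'. Branch lengths follow ':' and
# internal node labels follow ')', so neither is matched.
_LEAF_LABEL = re.compile(r"(?<=[(,])([^\s():,;]+)")


def rename_leaves(newick, mapping):
    """Replace each leaf label found in the mapping; leave others as-is."""
    return _LEAF_LABEL.sub(
        lambda m: mapping.get(m.group(1), m.group(1)), newick
    )


if __name__ == "__main__":
    id_map = parse_id_map("S1 Homo_sapiens_mito\nS2 Pan_troglodytes_mito\n")
    print(rename_leaves("(S1:0.1,(S2:0.2,S3:0.3):0.05);", id_map))
```

Labels missing from the mapping are kept unchanged, so a partially mapped tree is still valid NEWICK.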
phylogeny:others nw_rename input_newick Tree file (in NEWICK format) Tree NEWICK " $value" " " + str( value ) 1 Provide your tree file with the short IDs in NEWICK format input_map Labels map ID_Mapping AbstractText " $value" " " + str( value ) 2 The mapping file between long IDs and short IDs. This file can be generated with the fastaRename service. output_tree Newick tree Tree NEWICK "nw_rename.out" "nw_rename.out"
Programs-5.1.1/puzzle.xml0000644000175000001560000017074511767572177014267 0ustar bneronsis puzzle 5.2 Tree-Puzzle Maximum likelihood analysis for nucleotide, amino-acid and two state data Heiko A. Schmidt, Korbinian Strimmer and Arndt von Haeseler Strimmer, K., and A. von Haeseler. 1996. Quartet puzzling:A quartet maximum likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13: 964-969. http://bioweb2.pasteur.fr/docs/puzzle/tree-puzzle.pdf http://www.tree-puzzle.de/ http://www.tree-puzzle.de/ phylogeny:likelihood puzzle stdinput String " < puzzle.params" " < puzzle.params" 2 confirm String "y\\n" "y\n" 1000 puzzle.params infile Alignment File Alignment PHYLIPI " $infile" " " + str(value) 1 general_options General options 1 analysis_type Type of analysis? (b) Choice 1 1 "" "" 2 "b\\n" "b\n" Allows to switch between tree reconstruction/analysis by maximum likelihood and likelihood mapping. puzzle.params both_options Both Tree reconstruction and Likelihood mapping options parameter_estimates Parameter estimates? (e) Choice 1 1 "" "" 2 "e\\n" "e\n" Determines whether an approximate or the exact likelihood function is used to estimate parameters of the models of sequence evolution. The approximate likelihood function is in most cases sufficient and is faster. puzzle.params parameter_estimation Parameter estimation uses? (x) Choice 1 1 "" "" 2 "x\\n" "x\n" Selects the methods used in the estimation of the model parameters. Neighbor-joining tree means that a NJ tree is used to estimate the parameters. Quartet sampling means that a number of random sets of four sequences are selected to estimate parameters. puzzle.params likelihood_options Likelihood mapping options $analysis_type eq "2" analysis_type == "2" quartet Number of quartets? (n) Integer 10000 (defined $value and $value != $vdef)? "n\\n$value\\n" : "" ( "" , "n\n" + str(value) + "\n" )[ value is not None and value != vdef] If tree reconstruction is selected: number of puzzling steps. 
Parameter of the quartet puzzling tree search. Generally, the more sequences are used the more puzzling steps are advised. The default value varies depending on the number of sequences (at least 1000). If likelihood mapping is selected: number of quartets in a likelihood mapping analysis. Equal to the number of dots in the likelihood mapping diagram. By default 10000 dots/quartets are assumed. To use all possible quartets in clustered likelihood mapping you have to specify a value of n=0. puzzle.params reconstruction_options Tree reconstruction options $analysis_type eq "1" analysis_type == "1" tree_search Tree search procedure? (k) Choice 1 1 "" "" 2 "k\\n" "k\n" 3 "k\\nk\\n" "k\nk\n" 4 "k\\nk\\nk\\n" "k\nk\nk\n" Determines how the overall tree is obtained. The topology is either computed with the quartet puzzling algorithm or a set of trees is provided by the user. If there are more than two trees in such a set, maximum likelihood branch lengths will be computed for this tree and a number of tests (KH-test, SH-test, and ELW) will be performed on the trees by default. Instead of the evaluation a consensus can be computed for all the trees for which ML branch lengths and ML value are estimated. Alternatively, a maximum likelihood distance matrix only can also be computed (no overall tree). puzzle.params tree_file User Tree file Tree NEWICK $tree_search eq "3" or $tree_search eq "2" tree_search == "3" or tree_search == "2" (defined $value)? "$tree_file\\n" : "" ( "" , str(value) +"\n" )[ value is not None ] 2000 puzzle.params clocklike Compute clocklike branch lengths? (z) Boolean $tree_search ne "4" tree_search != "4" 0 ($value)? "z\\n" : "" ( "" , "z\n" )[ value ] Computation of clock-like maximum likelihood branch lengths. This option also invokes the likelihood ratio clock test.
puzzle.params invalid Enter an invalid branch number to search for the best location despite of automatic search (l) Integer $tree_search ne "4" and $clocklike tree_search != "4" and clocklike (defined $value)? "l\\n$value\\n" : "" ( "" , "l\n" + str(value) + "\n" )[ value is not None] Location of root. Only for computation of clock-like maximum likelihood branch lengths. Allows to specify the branch where the root should be placed in an unrooted tree topology. For example, in the tree (a,b,(c,d)) l = 1 places the root at the branch leading to sequence a whereas l=5 places the root at the internal branch. puzzle.params quartet_options Quartet puzzling options approximate Approximate quartet likelihood (v) Boolean 1 ($value)? "" : "v\\n" ( "v\n" , "" )[ value ] For the quartet puzzling tree search only. Only for very small data sets it is necessary to compute an exact maximum likelihood. For larger data sets this option should always be turned on. puzzle.params unresolved List unresolved quartets? (u) Boolean 0 ($value)? "u\\n" : "" ( "" , "u\n" )[ value ] Show unresolved quartets. During the quartet puzzling tree search TREE-PUZZLE counts the number of unresolved quartet trees. An unresolved quartet is a quartet where the maximum likelihood values for each of the three possible quartet topologies are so similar that it is not possible to prefer one of them (Strimmer et al., 1997). If this option is selected you will get a detailed list of all star-like quartets. Note, for some data sets there may be a lot of unresolved quartets. In this case a list of all unresolved quartets is probably not very useful and also needs a lot of disk space. puzzle.params puzzling_step Number of puzzling steps (n) Integer 1000 (defined $value and $value != $vdef)? "n\\n$value\\n" : "" ( "" , "n\n" + str(value) + "\n" )[ value is not None and value != vdef] If tree reconstruction is selected: number of puzzling steps. Parameter of the quartet puzzling tree search. 
Generally, the more sequences are used the more puzzling steps are advised. The default value varies depending on the number of sequences (at least 1000). If likelihood mapping is selected: number of quartets in a likelihood mapping analysis. Equal to the number of dots in the likelihood mapping diagram. By default 10000 dots/quartets are assumed. To use all possible quartets in clustered likelihood mapping you have to specify a value of n=0. puzzle.params list_puzzling List puzzling step trees? (j) Choice 1 1 "" "" 2 "j\\n" "j\n" 3 "j\\nj\\n" "j\nj\n" 4 "j\\nj\\nj\\n" "j\nj\nj\n" Writes all intermediate trees (puzzling step trees) used to compute the quartet puzzling tree into a file, either as a list of topologies ordered by number of occurrences (*.ptorder), or as a list of the chronological occurrence of the topologies (*.pstep), or both. puzzle.params output Display as outgroup? (o) Integer 1 (defined $value and $value != $vdef)? "o\\n$value\\n" : "" ( "" , "o\n" + str(value) + "\n" )[ value is not None and value != vdef] For display purposes of the unrooted quartet puzzling tree only. The default outgroup is the first sequence of the data set. puzzle.params substitution_options Substitution process options 1 seqtype Type of sequence input data? (d) Choice 1 1 "d\\n" "d\n" 2 "d\\nd\\n" "d\nd\n" -10 Specifies whether nucleotide, amino acid sequences, or two-state data serve as input. The default is automatically set by inspection of the input data. After TREE-PUZZLE has selected an appropriate data type (marked by 'Auto:') the 'd' option changes the type in the following order: automatically selected type -> Nucleotides -> Amino acids -> automatically selected type.
puzzle.params protein_options Amino acids options $seqtype == "2" seqtype == "2" protein_model Model of substitution for protein (m) Choice 1 1 "m\\n" "m\n" 2 "m\\nm\\n" "m\nm\n" 3 "m\\nm\\nm\\n" "m\nm\nm\n" 4 "m\\nm\\nm\\nm\\n" "m\nm\nm\nm\n" 5 "m\\nm\\nm\\nm\\nm\\n" "m\nm\nm\nm\nm\n" 6 "m\\nm\\nm\\nm\\nm\\nm\\n" "m\nm\nm\nm\nm\nm\n" For amino acid sequence data the Dayhoff et al. (Dayhoff) model, the Jones et al. (JTT) model, the Adachi and Hasegawa (mtREV24) model, the Henikoff and Henikoff (BLOSUM 62), the Muller and Vingron (VT), and the Whelan and Goldman (WAG) substitution model are implemented in TREE-PUZZLE. The mtREV24 model describes the evolution of amino acids encoded on mtDNA, and BLOSUM 62 is for distantly related amino acid sequences, as well as the VT model. puzzle.params prot_freq Use specified Amino acid frequencies (in %) (f) Boolean 0 ($value)? "f\\n" : "" ( "" , "f\n" )[ value ] 50 The maximum likelihood calculation needs the frequency of each nucleotide (amino acid, doublet) as input. TREE-PUZZLE estimates these values from the sequence input data. This option allows specification of other values.
puzzle.params specified_prot_freq Values of specified amino acid frequencies (in %) $prot_freq prot_freq a_freq pi (A) Float "$value\\n" str(value) + "\n" 51 puzzle.params r_freq pi (R) Float "$value\\n" str(value) + "\n" 51 puzzle.params n_freq pi (N) Float "$value\\n" str(value) + "\n" 51 puzzle.params d_freq pi (D) Float "$value\\n" str(value) + "\n" 51 puzzle.params c_freq pi (C) Float "$value\\n" str(value) + "\n" 51 puzzle.params q_freq pi (Q) Float "$value\\n" str(value) + "\n" 51 puzzle.params e_freq pi (E) Float "$value\\n" str(value) + "\n" 51 puzzle.params g_freq pi (G) Float "$value\\n" str(value) + "\n" 51 puzzle.params h_freq pi (H) Float "$value\\n" str(value) + "\n" 51 puzzle.params i_freq pi (I) Float "$value\\n" str(value) + "\n" 51 puzzle.params l_freq pi (L) Float "$value\\n" str(value) + "\n" 51 puzzle.params k_freq pi (K) Float "$value\\n" str(value) + "\n" 51 puzzle.params m_freq pi (M) Float "$value\\n" str(value) + "\n" 51 puzzle.params f_freq pi (F) Float "$value\\n" str(value) + "\n" 51 puzzle.params p_freq pi (P) Float "$value\\n" str(value) + "\n" 51 puzzle.params s_freq pi (S) Float "$value\\n" str(value) + "\n" 51 puzzle.params t_freq pi (T) Float "$value\\n" str(value) + "\n" 51 puzzle.params w_freq pi (W) Float "$value\\n" str(value) + "\n" 51 puzzle.params y_freq pi (Y) Float "$value\\n" str(value) + "\n" 51 puzzle.params dna_options DNA options $seqtype eq "1" seqtype == "1" 1 dna_model Model of substitution for DNA (m) Choice 1 1 "" "" 2 "m\\n" "m\n" 3 "m\\nm\\n" "m\nm\n" 4 "m\\nm\\nm\\n" "m\nm\nm\n" 10 The following models are implemented for nucleotides: the general time reversible model (Tavaree, 1986, GTR, e.g.,) model, the Tamura and Nei (TN) model, the Hasegawa et al. (HKY) model, and the Schroniger and von Haeseler (SH) model. The SH model describes the evolution of pairs of dependent nucleotides (pairs are the first and the second nucleotide, the third and the fourth nucleotide and so on). 
It allows for specification of the transition-transversion ratio. The original model (Schroniger and von Haeseler, 1994) is obtained by setting the transition-transversion parameter to 0.5. The Jukes and Cantor (1969), the Felsenstein (1981), and the Kimura (1980) model are all special cases of the HKY model. puzzle.params GTR_options GTR model rates $dna_model eq "4" dna_model == "4" 11 GTR_acrate A-C rate (1) Float 1.00 (defined $value and $value != $vdef) ? "1\\n$value\\n" : "" ( "" , "1\n"+ str(value) + "\n" )[ value is not None and value != vdef] 11 puzzle.params GTR_agrate A-G rate (2) Float 1.00 (defined $value and $value != $vdef) ? "2\\n$value\\n" : "" ( "" , "2\n"+ str(value) + "\n" )[ value is not None and value != vdef] puzzle.params GTR_atrate A-T rate (3) Float 1.00 (defined $value and $value != $vdef) ? "3\\n$value\\n" : "" ( "" , "3\n"+ str(value) + "\n" )[ value is not None and value != vdef] puzzle.params GTR_cgrate C-G rate (4) Float 1.00 (defined $value and $value != $vdef) ? "4\\n$value\\n" : "" ( "" , "4\n"+ str(value) + "\n" )[ value is not None and value != vdef] puzzle.params GTR_ctrate C-T rate (5) Float 1.00 (defined $value and $value != $vdef) ? "5\\n$value\\n" : "" ( "" , "5\n" + str(value) +"\n" )[ value is not None and value != vdef] puzzle.params GTR_gtrate G-T rate (6) Float 1.00 (defined $value and $value != $vdef) ? "6\\n$value\\n" : "" ( "" , "6\n" + str(value) +"\n" )[ value is not None and value != vdef] puzzle.params SH_options SH model rates $dna_model eq "4" dna_model == "4" 11 symmetrize_frequencies Symmetrize doublet frequencies (s) Boolean 0 ($value)? "s\\n" : "" ( "" , "s\n" )[ value ] 11 This option is only available for the SH model. With this option the doublet frequencies are symmetrized. For example, the frequencies of ?AT? and ?TA? are then set to the average of both frequencies. 
puzzle.params TN_options TN model options $dna_model eq "2" dna_model == "2" 11 constrain_TN Constrain TN model to F84 model (p) Boolean 0 ($value)? "p\\n" : "" ( "" , "p\n" )[ value ] 11 This option is only available for the Tamura-Nei model. With this option the expected (!) transition-transversion ratio for the F84 model have to be entered and TREE-PUZZLE computes the corresponding parameters of the TN model (this depends on base frequencies of the data). This allows to compare the results of TREE-PUZZLE and the PHYLIP maximum likelihood programs which use the F84 model. puzzle.params f84_ratio Expected F84 Transition/transversion ratio Float $constrain_TN constrain_TN (defined $value)? "$value\\n" : "" ( "" , str(value) + "\n" )[ value is not None] 12 puzzle.params y_r Y/R transition parameter (r) Float (defined $value)? "r\\n$value\\n" : "" ( "" , "r\n" + str(value) +"\n" )[ value is not None ] 13 This option is only available for the TN model. This parameter is the ratio of the rates for pyrimidine transitions and purine transitions. You do not need to specify this parameter as TREE-PUZZLE estimates it from the data. For precise definition please read the section in this manual about models of sequence evolution. puzzle.params ratio Transition/transversion ratio (t) Float $dna_model ne "3" dna_model != "3" (defined $value) ? "t\\n$value\\n" : "" ( "" , "t\n" + str(value) + "\n" )[ value is not None ] For nucleotide data only. You do not need to specify this parameter as TREE-PUZZLE estimates it from the data. The precise definition of this parameter is given in the section on models of sequence evolution in this manual. puzzle.params nuc_freq Base frequencies (in %) (f) use_specified_nuc Use specified values? Boolean 0 ($value)? "f\\n" : "" ( "" , "f\n" )[ value ] 50 The maximum likelihood calculation needs the frequency of each nucleotide (amino acid, doublet) as input. TREE-PUZZLE estimates these values from the sequence input data. 
This option allows specification of other values. puzzle.params a_nuc_freq pi (A) Float $use_specified_nuc use_specified_nuc "$value\\n" str(value) + "\n" 51 puzzle.params c_nuc_freq pi (C) Float $use_specified_nuc use_specified_nuc "$value\\n" str(value) + "\n" 52 puzzle.params g_nuc_freq pi (G) Float $use_specified_nuc use_specified_nuc "$value\\n" str(value) + "\n" 53 puzzle.params rate_options Rate heterogeneity options 1 rate_heterogeneity Model of rate heterogeneity (w) Choice 1 1 "" "" 2 "w\\n" "w\n" 3 "w\\nw\\n" "w\nw\n" 4 "w\\nw\\nw\\n" "w\nw\nw\n" 20 TREE-PUZZLE provides several different models of rate heterogeneity: uniform rate over all sites (rate homogeneity), Gamma distributed rates, two rates (1 invariable + 1 variable), and a mixed model (1 invariable rate + Gamma distributed rates). All necessary parameters can be estimated by TREE-PUZZLE. Note that whenever invariable sites are taken into account the parameter estimation will invoke the ?e? option to use an exact likelihood function. For more detailed information please read the section in this manual about models of sequence evolution. See also option ?m? (model of substitution). puzzle.params alpha Gamma rate heterogeneity parameter alpha (a) Float $rate_heterogeneity eq "2" or $rate_heterogeneity eq "4" rate_heterogeneity == "2" or rate_heterogeneity == "4" (defined $value)? "a\\n$value\\n" : "" ( "" , "a\n" + str(value) +"\n" )[ value is not None ] 21 This is the so-called shape parameter of the Gamma distribution. puzzle.params gamma_number Number of Gamma rate categories (c) Integer $rate_heterogeneity eq "2" or $rate_heterogeneity eq "4" rate_heterogeneity == "2" or rate_heterogeneity == "4" 8 (defined $value and $value != $vdef)? "c\\n$value\\n" : "" ( "" , "c\n" + str(value) +"\n" )[ value is not None and value != vdef] 21 Number of rate categories (4-16) for the discrete Gamma distribution (rate heterogeneity). 
puzzle.params invariable Fraction of invariable sites (i) Float $rate_heterogeneity eq "3" or $rate_heterogeneity eq "4" rate_heterogeneity == "3" or rate_heterogeneity == "4" (defined $value)? "i\\n$value\\n" : "" ( "" , "i\n" + str(value) +"\n" )[ value is not None ] 21 Probability of a site to be invariable. This parameter can be estimated from the data by TREE-PUZZLE (only if the approximation option for the likelihood function is turned off). puzzle.params outfile Output file Text "$infile.puzzle" str(infile) + ".puzzle" outtree Output tree Tree NEWICK "$infile.tree" str(infile) + ".tree" outdist Output distance Text "$infile.dist" str(infile) + ".dist" Programs-5.1.1/tcode.xml0000644000175000001560000003607312072525233014004 0ustar bneronsis tcode EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net tcode Identify protein-coding regions using Fickett TESTCODE statistic http://bioweb2.pasteur.fr/docs/EMBOSS/tcode.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:gene_finding tcode e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_datafile Testcode data file TestcodeData AbstractText ("", " -datafile=" + str(value))[value is not None ] 2 The default data file is Etcode.dat and contains coding probabilities for each base. The probabilities are for both positional and compositional information. 
e_required Required section e_window Length of sliding window (value greater than or equal to 200) Integer 200 ("", " -window=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 200 is required value >= 200 3 This is the number of nucleotide bases over which the TESTCODE statistic will be performed each time. The window will then slide along the sequence, covering the same number of bases each time. e_advanced Advanced section e_step Stepping increment for the window (value greater than or equal to 1) Integer 3 ("", " -step=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 4 The selected window will, by default, slide along the nucleotide sequence by three bases at a time, retaining the frame (although the algorithm is not frame sensitive). This may be altered to increase or decrease the increment of the slide. e_output Output section e_plot Graphical display Boolean 0 ("", " -plot")[ bool(value) ] 5 On selection a graph of the sequence (X axis) plotted against the coding score (Y axis) will be displayed. Sequence above the green line is coding, that below the red line is non-coding. 
e_outfile Name of the report file Filename not e_plot tcode.report ("" , " -outfile=" + str(value))[value is not None] 6 e_rformat_outfile Choose the report output format Choice not e_plot TABLE DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 7 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile e_graph Choose the e_graph output format Choice e_plot png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 8 xy_goutfile Name of the output graph Filename e_plot tcode_xygraph ("" , " -goutfile=" + str(value))[value is not None] 9 xy_outgraph_png Graph file Picture Binary e_plot and e_graph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_plot and e_graph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_plot and e_graph == "ps" or e_graph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_plot and e_graph == "meta" "*.meta" xy_outgraph_data Graph file Text e_plot and e_graph == "data" "*.dat" auto Turn off any prompting String " -auto -stdout" 10 Programs-5.1.1/dnapars.xml0000644000175000001560000010726611745213176014347 0ustar bneronsis dnapars dnapars DNA Parsimony Program http://bioweb2.pasteur.fr/docs/phylip/doc/dnapars.html This program carries out unrooted parsimony (analogous to Wagner trees) (Eck and Dayhoff, 1966; Kluge and Farris, 1969) on DNA sequences. The method of Fitch (1971) is used to count the number of changes of base needed on a given tree. 
phylogeny:parsimony dnapars String "dnapars <dnapars.params" "dnapars <dnapars.params" 0 infile Alignment File (infile) DNA Alignment PHYLIPI $infile ne "infile" infile != "infile" "ln -s $infile infile && " "ln -s " + str( infile ) + " infile && " the name of this data can't be "infile","outfile","outtree","intree" value not in ( "outfile" , "infile" , "outtree" , "intree") $value ne "outfile" and $value ne "infile" and $value ne "outtree" and $value ne "intree" -10 The input file must contained aligned sequences in PHYLIP format obtained by sequence alignment programs. 5 13 Alpha AACGTGGCCACAT Beta AAGGTCGCCACAC Gamma CAGTTCGCCACAA Delta GAGATTTCCGCCT Epsilon GAGATCTCCGCCC dnapars_opt Parsimony options use_threshold Use Threshold parsimony (T) Boolean 0 ($value) ? "T\\n$threshold\\n" : "" ( "" , "T\n" + str( threshold ) + "\n" )[ value ] 2 dnapars.params threshold Value for threshold parsimony Integer $use_threshold use_threshold You must enter a numeric value, greater than 1 $threshold > 1 threshold > 1 3 Thresholds less than or equal to 1.0 do not have any meaning and should not be used: they will result in a tree dependent only on the input order of species and not at all on the data dnapars.params use_transversion Use Transversion parsimony (N) Boolean 0 ($value) ? "N\\n" : "" ( "" , "N\n" )[ value ] 5 dnapars.params bootstrap Bootstrap options ( multiple dataset ) seqboot Perform a bootstrap before analysis Boolean 0 ($value) ? "seqboot <seqboot.params && mv outfile seqboot.outfile && rm infile && ln -s seqboot.outfile infile && " : "" ( "" , "seqboot <seqboot.params && mv outfile seqboot.outfile && rm infile && ln -s seqboot.outfile infile && " )[ value ] you can't use "Randomize options" and "Bootstrap options" at the same time not( $seqboot and $jumble) not( seqboot and jumble) -5 By selecting this option, the bootstrap will be performed on your sequence file. So you don't need to perform a separated seqboot before. 
Don't give an already bootstrapped file to the program; this won't work! You can't use "Randomize options" and "Bootstrap options" at the same time. Method Resampling methods (J) Choice $seqboot seqboot bootstrap bootstrap "" "" jackknife "J\\n" "J\n" permute_species "J\\nJ\\n" "J\nJ\n" permute_char "J\\nJ\\nJ\\n" "J\nJ\nJ\n" permute_within_species "J\\nJ\\nJ\\nJ\\n" "J\nJ\nJ\nJ\n" 1 1. The bootstrap. Bootstrapping was invented by Bradley Efron in 1979, and its use in phylogeny estimation was introduced by me (Felsenstein, 1985b). It involves creating a new data set by sampling N characters randomly with replacement, so that the resulting data set has the same size as the original, but some characters have been left out and others are duplicated. The random variation of the results from analyzing these bootstrapped data sets can be shown statistically to be typical of the variation that you would get from collecting new data sets. The method assumes that the characters evolve independently, an assumption that may not be realistic for many kinds of data. 2. Delete-half-jackknifing. This alternative to the bootstrap involves sampling a random half of the characters, and including them in the data but dropping the others. The resulting data sets are half the size of the original, and no characters are duplicated. The random variation from doing this should be very similar to that obtained from the bootstrap. The method is advocated by Wu (1986). 3. Permuting species for each character. This method of resampling (well, OK, it may not be best to call it resampling) was introduced by Archie (1989) and Faith (1990; see also Faith and Cranston, 1991). It involves permuting the columns of the data matrix separately. This produces data matrices that have the same number and kinds of characters but no taxonomic structure.
It is used for different purposes than the bootstrap, as it tests not the variation around an estimated tree but the hypothesis that there is no taxonomic structure in the data: if a statistic such as number of steps is significantly smaller in the actual data than it is in replicates that are permuted, then we can argue that there is some taxonomic structure in the data (though perhaps it might be just the presence of a pair of sibling species). 4. Permuting character order. This simply permutes the order of the characters, the same reordering being applied to all species. For many methods of tree inference this will make no difference to the outcome (unless one has rates of evolution correlated among adjacent sites). It is included as a possible step in carrying out a permutation test of homogeneity of characters (such as the Incongruence Length Difference test). 5. Permuting characters separately for each species. This is a method introduced by Steel, Lockhart, and Penny (1993) to permute data so as to destroy all phylogenetic structure, while keeping the base composition of each species the same as before. It shuffles the character order separately for each species. seqboot.params replicates How many replicates (R) Integer $seqboot seqboot 100 (defined $value and $value != $vdef) ? 
"R\\n$value\\n" : "" ( "" , "R\n" + str( value ) +"\n" )[ value is not None and value != vdef ] This server allows no more than 1000 replicates $replicates <= 1000 replicates <= 1000 Bad data sets number: it must be greater than 1 $value > 1 value > 1 1 seqboot.params seqboot_seed Random number seed (must be odd) Integer $seqboot seqboot "$value\\n" str( value ) + "\n" Random number seed must be odd $value > 0 and (($value % 2) != 0) value > 0 and (( value % 2 ) != 0 ) 1010 seqboot.params seqboot_times2jumble Number of times to jumble Integer $seqboot seqboot 1 the product of "number of times to jumble" and replicates must be less than 100000 ($seqboot_times2jumble * (defined $replicates) ? $replicates : 1) <= 100000 seqboot_times2jumble * ( 1 , replicates)[replicates is not None] <= 100000 multiple_dataset String $seqboot seqboot "M\\nD\\n$replicates\\n$seqboot_seed\\n$times2jumble\\n" "M\nD\n" + str( replicates ) + "\n" + str(seqboot_seed) + "\n"+ str( seqboot_times2jumble ) + "\n" 1 dnapars.params bootconfirm String $seqboot seqboot "Y\\n" "Y\n" 1000 seqboot.params bootterminal_type String $seqboot seqboot "0\\n" "0\n" -1 seqboot.params consense Compute a consensus tree Boolean $seqboot and $print_treefile seqboot and print_treefile 0 ($value) ? " && cp infile dnapars.infile && cp dnapars.outtree intree && consense <consense.params && mv outtree consense.outtree && mv outfile consense.outfile" : "" ( "" , " && cp infile dnapars.infile && cp dnapars.outtree intree && consense <consense.params && mv outtree consense.outtree && mv outfile consense.outfile" )[ value ] 100 jumble_opt Randomize options ( one dataset ) Use these options only if you have only one data set jumble Randomize (jumble) input order (J) Boolean 0 ($value and not $seqboot )? 
"J\\n$jumble_seed\\n$jumble_times\\n" : "" ( "" , "J\n" + str( jumble_seed ) + "\n" + str( jumble_times ) +"\n" )[ value and not seqboot ] you can't use "Randomize options" and "Bootstrap options" at the same times not( $jumble and $seqboot) not (jumble and seqboot) 20 dnapars.params jumble_seed Random number seed for jumble (must be odd) Integer $jumble jumble Random number seed for jumble must be odd. defined $value and ($value > 0 and (($value % 2) != 0)) value is not None and (value > 0 and ((value % 2) != 0)) jumble_times Number of times to jumble Integer $jumble jumble 1 user_tree_opt User tree options user_tree Use User tree (default: no, search for best tree) (U) Boolean 0 ($value)? "U\\n" : "" ( "" , "U\n" )[ value ] You cannot bootstrap your dataset and give a user tree at the same time not ( $user_tree and $seqboot ) not ( user_tree and seqboot ) you cannot randomize (jumble) your dataset and give a user tree at the same time not ( $user_tree and $jumble ) not ( user_tree and jumble ) 1 To give your tree to the program, you must normally put it in the alignment file, after the sequences, preceded by a line indicating how many trees you give. Here, this will be automatically appended: just give a treefile and the number of trees in it. dnapars.params tree_file User Tree file Tree NEWICK $user_tree user_tree (defined $value) ? "ln -s $tree_file intree; " : "" ( "" , "ln -s " + str( tree_file ) + " intree; " )[ value is not None ] the name of this data can't be "infile","outfile","outtree","intree" value not in ( "outfile" , "infile" , "outtree" ,"intree" ) $value ne "outfile" and $value ne "infile" and $value ne "outtree" and $value ne "intree" -1 Give a tree whenever the infile does not already contain the tree. weight_opt Weight options weights Use weights for sites (W) Boolean 0 ($value) ? "W\\n" : "" ( "" , "W\n" )[ value ] 1 dnapars.params weights_file Weights file PhylipWeight AbstractText $weights weights (defined $value) ? 
"ln -s $weights_file weights; " : "" ( "" , "ln -s " + str( weights_file ) + " weights; " )[ value is not None ] the name of this data can't be "infile","outfile","outtree","intree" value not in ( "outfile" , "infile" , "outtree" ,"intree" ) $value ne "outfile" and $value ne "infile" and $value ne "outtree" and $value ne "intree" -1 output Output options print_tree Print out tree (3) Boolean 1 ($value) ? "" : "3\\n" ( "3\n" , "")[ value ] 1 Tells the program to print a semi-graphical picture of the tree in the outfile. dnapars.params print_steps Print out steps in each site (4) Boolean 0 ($value) ? "4\\n" : "" ( "" , "4\n" )[ value ] 1 dnapars.params print_sequences Print sequences at all nodes of tree (5) Boolean 0 ($value) ? "5\\n" : "" ( "" , "5\n" )[ value ] 1 dnapars.params print_treefile Write out trees onto tree file (6) Boolean 1 ($value) ? "" : "6\\n" ( "6\n" , "" )[ value ] 1 Tells the program to save the tree in a tree file (outtree) (a standard representation of trees where the tree is specified by a nested pairs of parentheses, enclosing names and separated by commas). dnapars.params printdata Print out the data at start of run (1) Boolean 0 ($value) ? "1\\n" : "" ( "" , "1\n" )[ value ] 1 dnapars.params other_options Other options outgroup Outgroup species (J) Integer 1 (defined $value and $value != $vdef) ? "O\\n$value\\n" : "" ( "" , "O\n" + str( value ) + "\n" )[ value is not None and value != vdef ] Please enter a value greater than 0 defined $value and $value > 0 value is not None and value > 0 1 The O (Outgroup) option specifies which species is to have the root of the tree be on the line leading to it. For example, if the outgroup is a species "Mouse" then the root of the tree will be placed in the middle of the branch which is connected to this species, with Mouse branching off on one side of the root and the lineage leading to the rest of the tree on the other. 
This option is toggled on by choosing the number of the outgroup (the species being taken in the numerical order that they occur in the input file). Outgroup-rooting will not be attempted if a user-defined tree is used, despite your invoking the option. When it is used, the tree as printed out is still listed as being an unrooted tree, though the outgroup is connected to the bottommost node so that it is easy to visually convert the tree into rooted form. dnapars.params outfile Outfile Text " && mv outfile dnapars.outfile" " && mv outfile dnapars.outfile" 40 dnapars.outfile 'dnapars.outfile' treefile Tree file Tree NEWICK $print_treefile print_treefile " && mv outtree dnapars.outtree" " && mv outtree dnapars.outtree" 50 dnapars.outtree 'dnapars.outtree' seqboot_out seqboot outfile SetOfAlignment AbstractText $seqboot seqboot 40 "seqboot.outfile" "seqboot.outfile" confirm String "Y\\n" "Y\n" 1000 dnapars.params terminal_type String "0\\n" "0\n" -1 dnapars.params consense_confirm String $consense consense "Y\\n" "Y\n" 1000 consense.params consense_terminal_type String $consense consense "T\\n" "T\n" -2 consense.params consense_outfile Consense outfile Text $consense consense "consense.outfile" "consense.outfile" consense_outgroup String $consense and $outgroup and $outgroup > 1 consense and outgroup and outgroup > 1 "O\\n$outgroup\\n" "O\n" + str( outgroup ) + "\n" 1000 consense.params consense_treefile Consense tree file Tree NEWICK $consense consense "consense.outtree" "consense.outtree" Programs-5.1.1/palindrome.xml0000644000175000001560000001641412072525233015035 0ustar bneronsis palindrome EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net palindrome Finds inverted repeats in nucleotide sequence(s) http://bioweb2.pasteur.fr/docs/EMBOSS/palindrome.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:repeats palindrome e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_minpallen Enter minimum length of palindrome (value greater than or equal to 1) Integer 10 ("", " -minpallen=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 2 e_maxpallen Enter maximum length of palindrome Integer 100 ("", " -maxpallen=" + str(value))[value is not None and value!=vdef] 3 e_gaplimit Enter maximum gap between repeated regions (value greater than or equal to 0) Integer 100 ("", " -gaplimit=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 4 e_nummismatches Number of mismatches allowed (Positive integer) Integer 0 ("", " -nummismatches=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 5 e_output Output section e_outfile Name of the output file (e_outfile) Filename outfile.pal ("" , " -outfile=" + str(value))[value is not None] 6 e_outfile_out outfile_out option PalindromeReport Report e_outfile e_overlap Report overlapping matches Boolean 1 (" -nooverlap", "")[ bool(value) ] 7 auto Turn off any prompting String " -auto -stdout" 8 Programs-5.1.1/pftools.xml0000644000175000001560000005755511767572177014427 0ustar bneronsis pftools 2.3 PFTOOLS Profile Tools P. Bucher Bucher P, Karplus K, Moeri N and Hofmann, K. (1996). A flexible motif search technique based on generalized profiles. Comput. Chem. 20:3-24. 
http://www.isrec.isb-sib.ch/ftp-server/pftools/ sequence:nucleic:pattern sequence:protein:pattern pftools PFTOOLS program Choice pfscan pfscan pfsearch "$value -v" str(value) + " -v" pfscan compares a protein or nucleic acid sequence against a profile library (default: Prosite). pfsearch compares a query profile against a DNA or protein sequence library. The result is an unsorted list of profile-sequence matches written to the standard output. fasta_format String " -f" " -f" 1 pfscan PFSCAN parameters $pftools eq "pfscan" pftools == "pfscan" seqfile Sequence File Sequence FASTA " $value" " " + str(value) 90 This DNA or protein sequence will be used to search for matches to a library of PROSITE profiles. prosite Scan PROSITE db? Boolean 0 ($value) ? " " : "" ( "" , " " )[ value ] 100 Prosite library profiledb Profile(s) file in PROSITE format PrositeProfile AbstractText not $prosite not prosite " $value" " " + str(value) 100 This file should contain one or several PROSITE profiles, against which the query sequence will be matched. Each entry in this library should be separated from the next by a line containing only the code. pfsearch PFSEARCH parameters $pftools eq "pfsearch" pftools == "pfsearch" profile Profile(s) file in PROSITE format PrositeProfile AbstractText " $value" " " + str(value) 90 The PROSITE profile contained in this file will be used to search for profile to sequence matches in a biological sequence library. aa_or_nuc_db Library of DNA or protein sequences? 
Choice Null Null user protein dna "" "" You must choose either a protein, a nucleotide or a user library ($aa_or_nuc_db eq "protein" and $aadb ne "null") or ( $aa_or_nuc_db eq "user" and defined $userdb) or ($aa_or_nuc_db eq "dna" and $nucdb ne "null") (aa_or_nuc_db == "protein" and aadb != "null") or (aa_or_nuc_db == "user" and userdb is not None) or (aa_or_nuc_db == "dna" and nucdb != "null") The program pfsearch tries to identify matches between the input profile and all individual sequences of this library. aadb Protein library Choice $aa_or_nuc_db eq "protein" aa_or_nuc_db == "protein" null null (defined $value and $value ne $vdef) ? " $value" : "" ("", " " + str(value))[value is not None and value != vdef] 100 nucdb Nucleic library Choice $aa_or_nuc_db eq "dna" aa_or_nuc_db == "dna" null null (defined $value and $value ne $vdef) ? " $value" : "" ("", " " + str(value))[value is not None and value != vdef] 100 userdb User library Sequence FASTA $aa_or_nuc_db eq "user" aa_or_nuc_db == "user" (defined $value) ? " $value" : "" ("", " "+str(value) )[value is not None] 100 psa2msa Reformat PSA result file to Fasta multiple sequence alignment file? Boolean 0 ($value) ? " | psa2msa - " : "" ( "" , " | psa2msa - " )[ value ] 110 control Control options 2 cutoff Cut-off level to be used for match selection (-C) Integer not $optimal not optimal 0 (defined $value and $value != $vdef) ? " -C $value" : "" ("" , " -C " + str(value))[ value is not None and value != vdef] The value should be the numerical identifier of a cut-off level defined in the profile. The raw or normalized score of this level will then be used to include profile to sequence matches in the output list. If the specified level does not exist in the profile, the next higher (if cut_off is negative) or next lower (if cut_off is positive) level defined is used instead. mode Normalization mode to use for score (-M) Integer (defined $value) ? 
" -M $value" : "" ("" , " -M " + str(value))[value is not None] The value specifies which normalization mode defined in the profile should be used to compute the normalized scores for profile to sequence matches. This option will override the profile's PRIORITY parameter. If the specified normalization mode does not exist in the profile, an error message will be output to standard error and the search is interrupted. compl Search the complementary strands of DNA sequences as well (-b) Boolean 0 ($value) ? " -b" : "" ("" , " -b")[ value ] raw_score Use raw scores rather than normalized scores for match selection. Normalized scores will not be listed in the output. (-r) Boolean 0 ($value) ? " -r" : "" ("" , " -r")[ value ] unique Forces DISJOINT=UNIQUE (-u) Boolean 0 ($value) ? " -u" : "" ("" , " -u")[ value ] output Output options 2 optimal Report optimal alignment scores for all sequences regardless of the cut-off value (-a)? Boolean 0 ($value) ? " -a" : "" ("" , " -a")[ value ] This option simultaneously forces DISJOINT=UNIQUE. individual_match Report individual matches for circular profiles (-m)? Boolean 0 ($value) ? " -m" : "" ("" , " -m")[ value ] If the profile is circular, each match between a sequence and a profile can be composed of a stretch of individual matches of the profile. By default, pfscan reports only the total matched region. When this option is set, detailed information for each individual match will be output as well. value_hightest_cut_off Indicate the value of the highest cut-off level exceeded by the match score (-l)? Boolean 0 ($value) ? " -l" : "" ("" , " -l")[ value ] char_hightest_cut_off Indicate by character string of the highest cut-off level exceeded by the match score (-L)? Boolean 0 ($value) ? " -L" : "" ("" , " -L")[ value ] The generalized profile format includes a text string field to specify a name for a cut-off level. 
The -L option causes the program to display the first two characters of this text string (usually something like !, ?, ??, etc.) at the beginning of each match description. Length Limit profile description length (-d)? Boolean not $xpsa not xpsa 0 ($value) ? " -d" : "" ("" , " -d")[ value ] If this option is set, the description of the profile on the header line will be limited in length. If the match information is longer than the output width specified using option -W, the profile description will not be printed. Otherwise the description will be truncated to fit the -W value. By default, the profile description is not truncated. This option cannot be used when option -k is set. xpsa xpsa headers for output (-k)? Boolean not $Length not Length 0 ($value) ? " -k" : "" ("" , " -k")[ value ] When this option is set, all output types will use an xpsa-style header line. This format uses keyword=value pairs to output alignment parameters. It is useful for transferring information between different sequence alignment tools. listseq List the sequences of the matched regions as well. The output will be a Pearson/Fasta-formatted sequence library. (-s) Boolean 0 ($value) ? " -s" : "" ("" , " -s")[ value ] psa_format List profile-sequence alignments in pftools PSA format. (-x) Boolean 0 ($value) ? " -x" : "" ("" , " -x")[ value ] between Display alignments between the profile and the matched sequence regions in a human-friendly format. (-y) Boolean 0 ($value) ? " -y" : "" ("" , " -y")[ value ] start_end Indicate starting and ending position of the matched profile range. (-z) Boolean 0 ($value) ? " -z" : "" ("" , " -z")[ value ] The latter position will be given as a negative offset from the end of the profile. Thus the range [1,-1] means entire profile. width Set alignment output width (-W) Integer $listseq or $psa_format or $between listseq or psa_format or between 60 (defined $value and $value != $vdef) ? 
" -W $value" : "" ("" , " -W "+str(value))[value is not None and value != vdef] The value specifies how many residues will be output on one line when any of the -s, -x or -y options is set. Programs-5.1.1/nthseq.xml0000644000175000001560000001441712072525233014206 0ustar bneronsis nthseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net nthseq Write to file a single sequence from an input stream of sequences http://bioweb2.pasteur.fr/docs/EMBOSS/nthseq.html http://emboss.sourceforge.net/docs/themes sequence:edit nthseq e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_number The number of the sequence to output (value greater than or equal to 1) Integer 1 ("", " -number=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 2 e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename nthseq.e_outseq ("" , " -outseq=" + str(value))[value is not None] 3 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 4 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 5 Programs-5.1.1/showpep.xml0000644000175000001560000005225312072525233014371 0ustar bneronsis showpep EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net showpep Displays protein sequences with features in pretty format http://bioweb2.pasteur.fr/docs/EMBOSS/showpep.html http://emboss.sourceforge.net/docs/themes display showpep e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_format Things to display Choice 2 0 1 2 3 ("", " -format=" + str(value))[value is not None and value!=vdef] 2 e_things Specify your own things to display (value from 1 to 100) String e_format=="0" B,N,T,S,A,F ("", " -things=" + str(value))[value is not None and value!=vdef] 3 Specify a list of one or more code characters in the order in which you wish things to be displayed one above the other down the page. For example if you wish to see things displayed in the order: sequence, ticks line, blank line; then you should enter 'S,T,B'. S: Sequence B: Blank line T: Ticks line N: Number ticks line F: Features A: Annotation For example if you wish to see things displayed in the order: sequence, ticks line, blank line; then you should enter 'S,T,B'. e_additional Additional section e_uppercase Regions to put in uppercase (eg: 4-57,78-94) String ("", " -uppercase=" + str(value))[value is not None] 4 Regions to put in uppercase. If this is left blank, then the sequence case is left alone. A set of regions is specified by a set of pairs of positions. The positions are integers. They are separated by any non-digit, non-alpha character. Examples of region specifications are: 24-45, 56-78 1:45, 67=99;765..888 1,5,8,10,23,45,57,99 e_highlight Regions to colour in html (eg: 4-57 red 78-94 green) String ("", " -highlight=" + str(value))[value is not None] 5 Regions to colour if formatting for HTML. If this is left blank, then the sequence is left alone. 
A set of regions is specified by a set of pairs of positions. The positions are integers. They are followed by any valid HTML font colour. Examples of region specifications are: 24-45 blue 56-78 orange 1-100 green 120-156 red A file of ranges to colour (one range per line) can be specified as '@filename'. e_annotation Regions to mark (eg: 4-57 promoter region 78-94 first exon) String ("", " -annotation=" + str(value))[value is not None] 6 Regions to annotate by marking. If this is left blank, then no annotation is added. A set of regions is specified by a set of pairs of positions followed by optional text. The positions are integers. They are followed by any text (but not digits when on the command-line). Examples of region specifications are: 24-45 new domain 56-78 match to Mouse 1-100 First part 120-156 oligo A file of ranges to annotate (one range per line) can be specified as '@filename'. e_featuresection Feature display options e_sourcematch Source of feature to display String ("", " -sourcematch=" + str(value))[value is not None] 7 By default any feature source in the feature table is shown. You can set this to match any feature source you wish to show. The source name is usually either the name of the program that detected the feature or it is the feature table (eg: EMBL) that the feature came from. The source may be wildcarded by using '*'. If you wish to show more than one source, separate their names with the character '|', eg: gene* | embl e_typematch Type of feature to display String ("", " -typematch=" + str(value))[value is not None] 8 By default any feature type in the feature table is shown. You can set this to match any feature type you wish to show. See http://www.ebi.ac.uk/embl/WebFeat/ for a list of the EMBL feature types and see Appendix A of the Swissprot user manual in http://www.expasy.org/sprot/userman.html for a list of the Swissprot feature types. The type may be wildcarded by using '*'. 
If you wish to show more than one type, separate their names with the character '|', eg: *UTR | intron e_minscore Minimum score of feature to display Float 0.0 ("", " -minscore=" + str(value))[value is not None and value!=vdef] 9 Minimum score of feature to display (see also maxscore) e_maxscore Maximum score of feature to display Float 0.0 ("", " -maxscore=" + str(value))[value is not None and value!=vdef] 10 Maximum score of feature to display. If both minscore and maxscore are zero (the default), then any score is ignored e_tagmatch Tag of feature to display String ("", " -tagmatch=" + str(value))[value is not None] 11 Tags are the types of extra values that a feature may have. By default any feature tag in the feature table is shown. You can set this to match any feature tag you wish to show. The tag may be wildcarded by using '*'. If you wish to show more than one tag, separate their names with the character '|', eg: gene | label e_valuematch Value of feature tags to display String ("", " -valuematch=" + str(value))[value is not None] 12 Tag values are the values associated with a feature tag. Tags are the types of extra values that a feature may have. By default any feature tag value in the feature table is shown. You can set this to match any feature tag value you wish to show. The tag value may be wildcarded by using '*'. If you wish to show more than one tag value, separate their names with the character '|', eg: pax* | 10 e_stricttags Only display the matching tags Boolean 0 ("", " -stricttags")[ bool(value) ] 13 By default if any tag/value pair in a feature matches the specified tag and value, then all the tags/value pairs of that feature will be displayed. If this is set to be true, then only those tag/value pairs in a feature that match the specified tag and value will be displayed. 
e_advanced Advanced section e_threeletter Display protein sequences in three-letter code Boolean 0 ("", " -threeletter")[ bool(value) ] 14 e_number Number the sequences Boolean 0 ("", " -number")[ bool(value) ] 15 e_width Width of sequence to display (value greater than or equal to 1) Integer 60 ("", " -width=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 16 e_length Line length of page (0 for indefinite) (value greater than or equal to 0) Integer 0 ("", " -length=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 17 e_margin Margin around sequence for numbering (value greater than or equal to 0) Integer 10 ("", " -margin=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 18 e_name Show sequence id Boolean 1 (" -noname", "")[ bool(value) ] 19 Set this to be false if you do not wish to display the ID name of the sequence e_description Show description Boolean 1 (" -nodescription", "")[ bool(value) ] 20 Set this to be false if you do not wish to display the description of the sequence e_offset Offset to start numbering the sequence from Integer 1 ("", " -offset=" + str(value))[value is not None and value!=vdef] 21 e_html Use html formatting Boolean 0 ("", " -html")[ bool(value) ] 22 e_output Output section e_outfile Name of the output file (e_outfile) Filename showpep.e_outfile ("" , " -outfile=" + str(value))[value is not None] 23 e_outfile_out outfile_out option ShowpepReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 24 Programs-5.1.1/antigenic.xml0000644000175000001560000001761512072525233014650 0ustar bneronsis antigenic EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net antigenic Finds antigenic sites in proteins http://bioweb2.pasteur.fr/docs/EMBOSS/antigenic.html http://emboss.sourceforge.net/docs/themes sequence:protein:motifs antigenic e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_minlen Minimum length of antigenic region (value from 1 to 50) Integer 6 ("", " -minlen=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 Value less than or equal to 50 is required value <= 50 2 e_output Output section e_outfile Name of the report file Filename antigenic.report ("" , " -outfile=" + str(value))[value is not None] 3 e_rformat_outfile Choose the report output format Choice MOTIF DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 4 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 5 Programs-5.1.1/preg.xml0000644000175000001560000001645012072567465013655 0ustar bneronsis preg EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net preg Regular expression search of protein sequence(s) http://bioweb2.pasteur.fr/docs/EMBOSS/preg.html http://emboss.sourceforge.net/docs/themes sequence:protein:motifs preg e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_pattern Regular expression pattern Protein Pattern AbstractText ("", " -pattern=@" + str(value))[value is not None] 2 e_output Output section e_outfile Name of the report file Filename preg.report ("" , " -outfile=" + str(value))[value is not None] 3 e_rformat_outfile Choose the report output format Choice SEQTABLE DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 4 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 5 Programs-5.1.1/comalign.xml0000644000175000001560000001251111767572177014511 0ustar bneronsis comalign ComAlign is a program that, given a number of sequences, generates a number of heuristic alignments and combines them in the best possible way. O. Caprani, K. Bucka-Lassen http://www.daimi.au.dk/~ocaprani/ComAlign/ComAlign.html http://www.daimi.au.dk/~ocaprani/ComAlign/programs/ http://www.daimi.au.dk/~ocaprani/ComAlign/HowTo.html alignment:multiple ComAlign seq Sequences File (-f) Sequence GenAl " -f$value" " -f" + str(value) 1 The sequences must be in the GenAl format >Test1 ATGAAA * * >Test2 ATCAAA * * seed Random seed number (-s) Integer 1 (defined $value and $value != $vdef) ? 
" -s$value" : "" ( "" , " -s" + str(value) )[ value is not None and value != vdef] 1 seqnb Number of sequences that are to be aligned (-n) Integer 2 (defined $value and $value != $vdef) ? " -n$value" : "" ( "" , " -n" + str(value) )[ value is not None and value != vdef] 1 iterations Number of iterations (-i) Integer 10 (defined $value and $value != $vdef) ? " -i$value" : "" ( "" , " -i" + str(value) )[ value is not None and value != vdef] 1 On each iteration a new alignment is added to the pool of alignments ComAlign is working on score Score: ComAlign records the time it took to find a solution as good as this score (-l) Integer (defined $value) ? " -l$value" : "" ( "" , " -l" + str(value) )[ value is not None] 1 time ComAlign marks the best solution found after this number of 1/100 seconds (-t) Integer (defined $value) ? " -t$value" : "" ( "" , " -t" + str(value) )[ value is not None ] 1 last_iterations Makes ComAlign terminate if the score hasn't changed within the last this number of iterations (-c) Integer 20 (defined $value and $value != $vdef) ? " -c$value" : "" ( "" , " -c" + str(value) )[ value is not None and value != vdef] 1 print_best Print the best found alignment on termination (-a) Boolean 1 ($value) ? " -a" : "" ( "" , " -a" )[ value ] 1 Programs-5.1.1/plotcon.xml0000644000175000001560000004205311672346320014362 0ustar bneronsis plotcon EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net plotcon Plot conservation of a sequence alignment http://bioweb2.pasteur.fr/docs/EMBOSS/plotcon.html http://emboss.sourceforge.net/docs/themes alignment:multiple plotcon e_input Input section e_sequences sequences option Alignment FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH 1,n ("", " -sequences=" + str(value))[value is not None] 1 File containing a sequence alignment e_scorefile Comparison matrix file Choice mobyle_null mobyle_null EBLOSUM30 EBLOSUM35 EBLOSUM40 EBLOSUM45 EBLOSUM50 EBLOSUM55 EBLOSUM60 EBLOSUM62 EBLOSUM62-12 EBLOSUM65 EBLOSUM70 EBLOSUM75 EBLOSUM80 EBLOSUM85 EBLOSUM90 EBLOSUMN EDNAFULL EDNAMAT EDNASIMPLE EPAM10 EPAM100 EPAM110 EPAM120 EPAM130 EPAM140 EPAM150 EPAM160 EPAM170 EPAM180 EPAM190 EPAM20 EPAM200 EPAM210 EPAM220 EPAM230 EPAM240 EPAM250 EPAM260 EPAM270 EPAM280 EPAM290 EPAM30 EPAM300 EPAM310 EPAM320 EPAM330 EPAM340 EPAM350 EPAM360 EPAM370 EPAM380 EPAM390 EPAM40 EPAM400 EPAM410 EPAM420 EPAM430 EPAM440 EPAM450 EPAM460 EPAM470 EPAM480 EPAM490 EPAM50 EPAM500 EPAM60 EPAM70 EPAM80 EPAM90 SSSUB ("", " -scorefile=" + str(value))[value is not None and value!=vdef] 2 This is the scoring matrix file used when comparing sequences. By default it is the file 'EBLOSUM62' (for proteins) or the file 'EDNAFULL' (for nucleic sequences). These files are found in the 'data' directory of the EMBOSS installation. e_required Required section e_winsize Window size Integer 4 ("", " -winsize=" + str(value))[value is not None and value!=vdef] 3 Number of columns to average alignment quality over. The larger this value is, the smoother the plot will be. 
e_output Output section e_graph Choose the e_graph output format Choice png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 4 xy_goutfile Name of the output graph Filename plotcon_xygraph ("" , " -goutfile=" + str(value))[value is not None] 5 xy_outgraph_png Graph file Picture Binary e_graph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_graph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_graph == "ps" or e_graph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_graph == "meta" "*.meta" xy_outgraph_data Graph file Text e_graph == "data" "*.dat" auto Turn off any prompting String " -auto -stdout" 6 Programs-5.1.1/wublast2.xml0000644000175000001560000022652411767572177014476 0ustar bneronsis wublast2 2.0 WUBLAST2 Wash-U. BLAST, with gaps Gish. W Gish, Warren (1994-1997). unpublished. Gish, W, and DJ States (1993). Identification of protein coding regions by database similarity search. Nature Genetics 3:266-72. Altschul, SF, and W Gish (1996). Local alignment statistics. ed. R. Doolittle. Methods in Enzymology 266:460-80. Korf, I, and W Gish (2000). MPBLAST: improved BLAST performance with multiplexed queries. Bioinformatics in press. Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers, and David J. Lipman (1990). Basic local alignment search tool. J. Mol. Biol. 215:403-10. 
http://blast.wustl.edu/ database:search:homology wublast2 Blast program Choice null null blastn blastp blastx tblastn tblastx "$value" str(value) 1 The five BLAST programs described here perform the following tasks: - blastp compares an amino acid query sequence against a protein sequence database; - blastn compares a nucleotide query sequence against a nucleotide sequence database; - blastx compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database; - tblastn compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands). - tblastx compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. db Database 2 protein_db Protein db Choice $wublast2 =~ /^blast[px]$/ wublast2 in [ "blastp", "blastx" ] null null " $value" " "+str(value) Choose a protein db for blastp or blastx. nucleotid_db Nucleotide db Choice $wublast2 =~ /^(blastn|tblast[nx])$/ wublast2 in [ "blastn", "tblastn", "tblastx" ] null null " $value" " "+str(value) Choose a nucleotide db for blastn, tblastn or tblastx query Query Sequence query_seq Query Sequence FASTA " $query_seq" " " + str(value) 3 nosegs Do not segment the query sequence on hyphen (-) characters (-nosegs) Boolean 0 ($value) ? " -nosegs" : "" ( "" , " -nosegs" )[ value ] 5 compat BLAST version Choice current current compat1.4 compat1.3 (defined $value and $value ne $vdef) ? " -$value" : "" ( "" , " -" + str(value) )[ value is not None and value != vdef] 5 scoring_opt Scoring options 5 open_a_gap Open gap penalty (Q) Integer (defined $value) ? " Q=$value" : "" ( "" , " Q=" + str(value) )[ value is not None ] Default: 9 for proteins and 10 for nucleics.
requires an integral value in the range 1 <= Q < 2147483647 $value >= 1 and $value < 2147483647 value >= 1 and value < 2147483647 extend_a_gap Extending a gap penalty (R) Integer (defined $value) ? " R=$value" : "" ("" , " R=" + str(value))[ value is not None ] Default: 2 for proteins; 10 for nucleics. requires an integral value in the range 0 <= R < 2147483647 $value >= 0 and $value < 2147483647 value >= 0 and value < 2147483647 scoring_blast Protein penalty (not for blastn) $wublast2 ne "blastn" wublast2 != "blastn" matrix Similarity matrix (-matrix) Choice BLOSUM62 BLOSUM30 BLOSUM35 BLOSUM40 BLOSUM45 BLOSUM50 BLOSUM55 BLOSUM60 BLOSUM62 BLOSUM65 BLOSUM70 BLOSUM75 BLOSUM80 BLOSUM85 BLOSUM90 PAM10 PAM20 PAM30 PAM40 PAM50 PAM60 PAM70 PAM80 PAM90 PAM100 PAM110 PAM120 PAM130 PAM140 PAM150 PAM160 PAM170 PAM180 PAM190 PAM200 PAM210 PAM220 PAM230 PAM240 PAM250 PAM260 PAM270 PAM280 PAM290 PAM300 PAM310 PAM320 PAM330 PAM340 PAM350 PAM360 PAM370 PAM380 PAM390 PAM400 identity (defined $value and $value ne $vdef) ? " -matrix $value" : "" ( "" , " -matrix " + str(value) )[ value is not None and value != vdef] Several PAM (point accepted mutations per 100 residues) amino acid scoring matrices are provided in the BLAST software distribution, including the PAM40, PAM120, and PAM250. While the BLOSUM62 matrix is a good general purpose scoring matrix and is the default matrix used by the BLAST programs, if one is restricted to using only PAM scoring matrices, then the PAM120 is recommended for general protein similarity searches (Altschul, 1991). The pam(1) program can be used to produce PAM matrices of any desired iteration from 2 to 511. Each matrix is most sensitive at finding similarities at its particular PAM distance.
For more thorough searches, particularly when the mutational distance between potential homologs is unknown and the significance of their similarity may be only marginal, Altschul (1991, 1992) recommends performing at least three searches, one each with the PAM40, PAM120 and PAM250 matrices. scoring_blastn Blastn penalty $wublast2 eq "blastn" wublast2 == "blastn" mismatch Penalty for a nucleotide mismatch (N) Float $wublast2 eq "blastn" wublast2 == "blastn" -4 (defined $value and $value != $vdef) ? " N=$value" : "" ( "" , " N=" + str(value) )[ value is not None and value != vdef] match Reward for a nucleotide match (M) Float 5 (defined $value and $value != $vdef) ? " M=$value" : "" ( "" , " M=" + str(value) )[ value is not None and value != vdef] filter_opt Filtering and masking options 6 filter Filter or Masking query sequence Choice null null filter wordmask Mask letters in the query sequence without altering the sequence itself, during neighborhood word generation. other_filters Filtering or Masking options Choice defined($filter) filter is not None seg dust seg xnu seg+xnu " -$filter $value" " -" + str(filter) + " " + str(value) This option activates filtering or masking of segments of the query sequence based on a potentially wide variety of criteria. The usual intent of filtering is to mask regions that are non-specific for protein identification using sequence similarity. For instance, it may be desired to mask acidic or basic segments that would otherwise yield overwhelming amounts of uninteresting, non-specific matches against a wide array of protein families from a comprehensive database search. The BLAST programs have internally-coded knowledge of the specific command line options needed to invoke the SEG and XNU programs as query sequence filters, but these two filter programs are not included in the BLAST software distribution and must be independently installed.
The SEG program (Wootton and Federhen, 1993) masks low compositional complexity regions, while XNU (Claverie and States, 1993) masks regions containing short-periodicity internal repeats. The BLAST programs can pipe the filtered output from one program into another. For instance, XNU+SEG or SEG+XNU can be specified as the filtermethod to have each program filter the query sequence in succession. Note that neither SEG nor XNU is suitable for filtering untranslated nucleotide sequences for use by blastn maskextra Extend masking additional distance into flanking regions (-maskextra) Boolean $filter eq "wordmask" filter == "wordmask" 0 ($value) ? " -maskextra" : "" ( "" , " -maskextra" )[ value ] lc Filter lower-case letters in query Choice null null lcfilter lcmask (defined $value and $value ne $vdef) ? " -$value" : "" ( "" , " -" + str(value) )[ value is not None and value != vdef] selectivite Selectivity Options 7 Expect Expected value (E) Float 10.0 (defined $value and $value != $vdef) ? " E=$value" : "" ( "" , " E=" + str(value) )[ value is not None and value != vdef] The parameter Expect (E) establishes a statistical significance threshold for reporting database sequence matches. E is interpreted as the upper bound on the expected frequency of chance occurrence of an HSP (or set of HSPs) within the context of the entire database search. Any database sequence whose matching satisfies E is subject to being reported in the program output. If the query sequence and database sequences follow the random sequence model of Karlin and Altschul (1990), and if sufficiently sensitive BLAST algorithm parameters are used, then E may be thought of as the number of matches one expects to observe by chance alone during the database search. The default value for E is 10, while the permitted range for this Real valued parameter is 0 < E <= 1000. hspmax Maximal number of HSPs saved or reported per subject sequence (-hspmax) Integer 1000 (defined $value and $value != $vdef) ? 
" -hspmax $value" : "" ( "" , " -hspmax " + str(value) )[ value is not None and value != vdef] E2 Expected number of HSPs that will be found when comparing two sequences that each have the same length (E2) Float (defined $value) ? " E2=$value" : "" ("" , " E2=" + str(value))[ value is not None ] E2 is interpreted as the expected number of HSPs that will be found when comparing two sequences that each have the same length -- either 300 amino acids or 1000 nucleotides, whichever is appropriate for the particular program being used. The default value for E2 is typically about 0.15 but may vary from version to version of each program. Cutoff Cutoff score: threshold for report (S) Float (defined $value) ? " S=$value" : "" ( "" , " S=" + str(value) )[ value is not None ] The parameter Cutoff (S) represents the score at which a single HSP would by itself satisfy the significance threshold E. Higher scores -- higher values for S -- correspond to increasing statistical significance (lower probability of chance occurrence). Unless S is explicitly set on the command line, its default value is calculated from the value of E. If both S and E are set on the command line, the one which is the most restrictive is used. When neither parameter is specified on the command line, the default value for E is used to calculate S. S2 Cutoff score which defines HSPs (S2) Float (defined $value) ? " S2=$value" : "" ( "" , " S2=" + str(value) )[ value is not None ] S2 may be thought of as the score expected for the MSP between two sequences that each have the same length -- either 300 amino acids or 1000 nucleotides, whichever is appropriate for the particular program being used. The default value for S2 will be calculated from E2 and, like the relationship between E and S, is dependent on the residue composition of the query sequence and the scoring system employed, as conveyed by the Karlin-Altschul K and Lambda statistics. 
W Length of words identified in the query sequence (W) Integer (defined $value) ? " W=$value" : "" ( "" , " W=" + str(value) )[ value is not None ] The task of finding HSPs begins with identifying short words of length W in the query sequence that either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., 1990). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. T Neighborhood word score threshold (T) Float (defined $value) ? " T=$value" : "" ( "" , " T=" + str(value) )[ value is not None ] nwstart Start generating neighborhood words here in query (blastp/blastx) (-nwstart) Float $wublast2 =~ /^blast[px]$/ wublast2 in [ "blastp", "blastx" ] (defined $value) ? " -nwstart $value" : "" ( "" , " -nwstart " + str(value) )[ value is not None ] Restrict blast neighborhood word generation to a specific segment of the query sequence that begins at 'nwstart' and continues for 'nwlen' residues or until the end of the query sequence is reached. HSP alignments may extend outside the region of neighborhood word generation but the alignments can only be initiated by word hits occurring within the region. Through the use of these options, a very long query sequence can be searched piecemeal, using short, overlapping segments each time.
The amount of overlap from one neighborhood region to the next need only be the blast wordlength W minus 1, in order to be assured of detecting all HSPs. However, to provide greater freedom for statistical interpretation of multiple HSP findings (e.g. matches against exons) more extensive overlapping is recommended, with the extent to be chosen based on the expected gene density and length of introns. nwlen Generate neighborhood words over this distance from 'nwstart' in query (blastp/blastx) (-nwlen) Float $wublast2 =~ /^blast[px]$/ wublast2 in [ "blastp", "blastx" ] (defined $value) ? " -nwlen $value" : "" ( "" , " -nwlen " + str(value) )[ value is not None ] X Word hit extension drop-off score (X) Float (defined $value) ? " X=$value" : "" ( "" , " X=" + str(value) )[ value is not None ] hitdist Maximum word separation distance for 2-hit BLAST algorithm (-hitdist) Integer 0 (defined $value and $value != $vdef) ? " -hitdist $value" : "" ( "" , " -hitdist " + str(value) )[ value is not None and value != vdef] Invoke a 2-hit BLAST algorithm similar to that of Altschul et al. (1997), with maximum wordhit separation distance, as measured from the end of each wordhit. Altschul et al. (1997) use the equivalent of hitdist=40 in their software by default (except NCBI-BLASTN, where 2-hit BLAST is not available). In WU-BLASTN, setting 'hitdist' and 'wink' (see below) is akin to using double-length words generated on W-mer boundaries. For best sensitivity, 2-hit BLAST should generally not be used. wink Generate word hits at every wink-th position (-wink) Integer 1 (defined $value and $value != $vdef) ? " -wink $value" : "" ( "" , " -wink " + str(value) )[ value is not None and value != vdef] Generate word hits at every wink-th ('W increment') position along the query, where the default wink=1 produces neighborhood words at every position. For good sensitivity, this option should not be used.
The benefit of using 'wink' is in finding identical or nearly identical sequences rapidly. When used in conjunction with the 'hitdist' option to obtain the highest speed, care should be taken that desired matches are not precluded by these parameters. consistency Turn off HSP consistency rules for statistics (-consistency) Boolean 0 ($value) ? " -consistency" : "" ( "" , " -consistency" )[ value ] This option turns off both the determination of the number of HSPs that are consistent with each other in a gapped alignment and an adjustment that is made to the Sum and Poisson statistics to account for the consistency of combined HSPs. hspsepqmax Maximal separation allowed between HSPs along query (-hspsepqmax) Integer not $consistency not consistency (defined $value) ? " -hspsepqmax $value" : "" ( "" , " -hspsepqmax " + str(value) )[ value is not None ] hspsepsmax Maximal separation allowed between HSPs along subject (-hspsepsmax) Integer not $consistency not consistency (defined $value) ? " -hspsepsmax $value" : "" ( "" , " -hspsepsmax " + str(value) )[ value is not None ] span Discard HSPs spanned on (-span*) Choice span2 span2 span1 span (defined $value and $value ne $vdef) ? " -$value" : "" ( "" , " -" + str(value) )[ value is not None and value != vdef] nogap Do not create gapped alignments (-nogap) Boolean 0 ($value) ? " -nogap" : "" ( "" , " -nogap" )[ value ] gapall Generate a gapped alignment for every ungapped HSP found (-gapall) Boolean 1 ($value) ? " -gapall" : "" ( "" , " -gapall" )[ value ] gap_selectivite Selectivity options for gapped alignments not $nogap and not $gapall not nogap and not gapall 5 gapE Expectation threshold of sets of ungapped HSPs for subsequent use in seeding gapped alignments (-gapE) Float 2000 (defined $value and $value != $vdef) ? " -gapE $value" : "" ( "" , " -gapE " + str(value) )[ value is not None and value != vdef] gapE2 Expectation threshold for saving individual gapped alignments (-gapE2) Float (defined $value) ? 
" -gapE2 $value" : "" ( "" , " -gapE2 " + str(value) )[ value is not None ] gapS2 Cutoff score for saving individual gapped alignments (-gapS2) Float (defined $value) ? " -gapS2 $value" : "" ( "" , " -gapS2 " + str(value) )[ value is not None ] gapW Set the window width within which gapped alignments are generated (-gapW) Integer (defined $value) ? " -gapW $value" : "" ( "" , " -gapW " + str(value) )[ value is not None ] Default values are 32 for protein comparisons and 16 for blastn. gapX Set the maximum drop-off score during banded gapped alignment (gapX) Float (defined $value) ? " gapX=$value" : "" ("" , " gapX=" + str(value))[ value is not None ] gapsepqmax Maximal permitted distance on the QUERY sequence between two consistent gapped alignments (-gapsepqmax) Integer not $nogap and not $consistency not nogap and not consistency (defined $value) ? " -gapsepqmax $value" : "" ( "" , " -gapsepqmax " + str(value) )[ value is not None ] gapsepsmax Maximal permitted distance on the subject sequence between two consistent gapped alignments (-gapsepsmax) Integer not $nogap and not $consistency not nogap and not consistency (defined $value) ? " -gapsepsmax $value" : "" ( "" , " -gapsepsmax " + str(value) )[ value is not None ] translation_opt Translation Option $wublast2 ne blastn wublast2 != "blastn" 6 gcode Genetic code to translate the query (blastx,tblastx) (-gcode) Choice $wublast2 =~ /^t?blastx$/ wublast2 in [ "blastx" , "tblastx" ] 1 1 2 3 4 5 6 9 10 11 12 13 14 (defined $value and $value ne $vdef) ? " -gcode $value" : "" ( "" , " -gcode " + str(value) )[ value is not None and value != vdef] strand Which strands (for nucleotid query) Choice $wublast2 =~ /^blast[nx]$/ wublast2 in [ "blastn", "blastx" ] null null -top -bottom (defined $value) ? 
" $value" : "" ( "" , " " + str(value) )[value is not None] dbgcode Genetic code for database translation (tblastx,tblastn) (-dbgcode) Choice $wublast2 =~ /^tblast[nx]$/ wublast2 in [ "tblastx", "tblastn" ] 1 1 2 3 4 5 6 9 10 11 12 13 14 (defined $value and $value ne $vdef) ? "-dbgcode $value" : "" ( "" , "-dbgcode " + str(value) )[ value is not None and value != vdef] dbstrand Which strands of the database sequences (tblastn,tblastx) (-db) Choice $wublast2 =~ /^tblast[nx]$/ wublast2 in [ "tblastn" , "tblastx" ] null null -dbtop -dbbottom (defined $value and $value ne $vdef) ? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef] statistics Statistic options 6 Parameters to use when evaluating the significance of gapped and ungapped alignment scores. Useful when precomputed values are unavailable for the chosen scoring matrix and gap penalty combination in the programs internal tables. stat Use statistics Choice sump poissonp kap sump (defined $value and $value ne $vdef) ? " -$value" : "" ( "" , " -" + str(value) )[ value is not None and value != vdef] wordstats Collect word-hit statistics (-stats) Boolean 0 ($value) ? " -stats" : "" ( "" , " -stats" )[ value ] This option consumes marginally more cpu time. ctxfactor Base statistics on this number of independent contexts or reading frames (-ctxfactor) Float (defined $value) ? " -ctxfactor $value" : "" ( "" , " -ctxfactor " + str(value) )[ value is not None ] olf Maximal fractional length of overlap for HSP consistency of two ungapped alignment (-olf) Float 0.125 (defined $value and $value != $vdef) ? " -olf $value" : "" ( "" , " -olf " + str(value) )[ value is not None and value != vdef] golf Maximal fractional length of overlap for HSP consistency of two gapped alignments (-golf) Float 0.10 (defined $value and $value != $vdef) ? 
" -golf $value" : "" ( "" , " -golf " + str(value) )[ value is not None and value != vdef] olmax Maximal absolute length of overlap for HSP consistency of two ungapped alignment (default unlimited) (-olmax) Integer (defined $value) ? " -olmax $value" : "" ( "" , " -olmax " + str(value) )[ value is not None ] golmax Maximal absolute length of overlap for HSP consistency of two gapped alignment (default unlimited) (-golmax) Integer (defined $value) ? " -golmax $value" : "" ( "" , " -golmax " + str(value) )[ value is not None ] gapdecayrate Gap decay rate (-gapdecayrate) Float 0.5 (defined $value and $value != $vdef) ? " -gapdecayrate $value" : "" ( "" , " -gapdecayrate " + str(value) )[ value is not None and value != vdef] This option defines the common ratio of the terms in a geometric progression used in normalizing probabilities across all numbers of Poisson events (typically the number of 'consistent' HSPs). A Poisson probability for N segments is eighted by the reciprocal of the Nth term in the progression, where the first term has a value of (1-rate), the second term is (1-rate)*rate, the third term is (1-rate)*rate*rate, and so on. The default rate is 0.5, such that the probability assigned to a single HSP is discounted by a factor of 2, the Poisson probability of 2 HSPs is discounted by a factor of 4, for 3 HSPs the discount factor is 8, and so on. The rate essentially defines a penalty imposed on the gap between each HSP, where the default penalty is equivalent to 1 bit of information. kastats Parameters for Karlin-Altschul statistics $stat =~ /^(kap|sump)$/ stat in [ "kap", "sump" ] 6 K K parameter for ungapped alignment scores (K) Float (defined $value) ? " K=$value" : "" ( "" , " K=" + str(value) )[ value is not None ] L Lambda parameter for ungapped alignment scores (L) Float (defined $value) ? " L=$value" : "" ( "" , " L=" + str(value) )[ value is not None ] H H parameter for ungapped alignment scores (H) Float (defined $value) ? 
" H=$value" : "" ( "" , " H=" + str(value) )[ value is not None ] gapK K parameter for gapped alignment scores (gapK) Float (defined $value) ? " gapK=$value" : "" ( "" , " gapK=" + str(value) )[ value is not None ] gapL Lambda parameter for gapped alignment scores (gapL) Float (defined $value) ? " gapL=$value" : "" ( "" , " gapL=" + str(value) )[ value is not None ] gapH H parameter for gapped alignment scores (gapH) Float (defined $value) ? " gapH=$value" : "" ( "" , " gapH=" + str(value) )[ value is not None ] affichage Report options 5 Histogram Histogram (H) Boolean 0 ($value) ? " H=1" : "" ( "" , " H=1" )[ value ] Descriptions How many short descriptions? (V) Integer 500 (defined $value and $value != $vdef) ? " V=$value" : "" ( "" , " V=" + str(value) )[ value is not None and value != vdef] Maximum number of database sequences for which one-line descriptions will be reported (V). Alignments How many alignments? (B) Integer 250 (defined $value and $value != $vdef) ? " B=$value" : "" ( "" , " B=" + str(value) )[ value is not None and value != vdef] Maximum number of database sequences for which high-scoring segment pairs will be reported (B). sortby Sort order for reporting database sequences Choice sort_by_pvalue sort_by_pvalue sort_by_count sort_by_highscore sort_by_totalscore (defined $value and $value ne $vdef) ? " -$value" : "" ( "" , " -" + str(value) )[ value is not None and value != vdef] postsw Perform full Smith-Waterman before output (blastp only) (-postsw) Boolean $wublast2 eq "blastp" wublast2 == "blastp" 0 ($value) ? " -postsw" : "" ( "" , " -postsw" )[ value ] output_file Output file name String " -o blast.txt" " -o blast.txt" 499 cpunum CPU number to use String " -cpus 1" " -cpus 1" 499 output_format Html output format Boolean 1 " && html4blast -g -o blast.html blast.txt" ("" , " && html4blast -g -o blast.html blast.txt")[ value ] 500 echofilter Display filter sequences in output (-echofilter) Boolean 0 ($value) ? 
" -echofilter" : "" ( "" , " -echofilter" )[ value ] prune Do not prune insignificant HSPs from the output lists (-prune) Boolean 0 ($value) ? " -prune" : "" ( "" , " -prune" )[ value ] topcomboN Report this number of consistent (colinear) groups of HSPs (-topcomboN) Integer (defined $value) ? " -topcomboN $value" : "" ( "" , " -topcomboN " + str(value) )[ value is not None ] topcomboE Only show HSP combos within this factor of the best combo (-topcomboE) Float (defined $value) ? " -topcomboE $value" : "" ( "" , " -topcomboE " + str(value) )[ value is not None ] gi Display gi identifiers, when available (-gi) Boolean 0 ($value) ? " -gi" : "" ( "" , " -gi" )[ value ] noseqs Do not display sequence alignments (-noseqs) Boolean 0 ($value) ? " -noseqs" : "" ( "" , " -noseqs" )[ value ] tmp_outfile Blast report BlastTextReport Report "blast.txt" "blast.txt" htmlfile Blast html report BlastHtmlReport Report "blast.html" "blast.html" imgfile Picture Binary "*.png" "*.gif" "*.png" "*.gif" xmloutput Blast xml report BlastXmlReport Report "blast.xml" "blast.xml" Programs-5.1.1/nw_cat.xml0000644000175000001560000000406611767601016014162 0ustar bneronsis nw_cat GNU 7.4 nw_cat concatenate phylogenetic tree files phylogeny:others cat input First treeset enter a phylogenetic tree Tree NEWICK " $value" " " + str(value) 1 input_2 Second treeset type any kind of text example text Tree NEWICK (defined $value) ? " $value" : "" ("" , " " + str(value))[value is not None] 2 output Output treeset Tree NEWICK 1 "nw_cat.out" "nw_cat.out" Programs-5.1.1/extractfeat.xml0000644000175000001560000004377012072525233015222 0ustar bneronsis extractfeat EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net extractfeat Extract features from sequence(s) http://bioweb2.pasteur.fr/docs/EMBOSS/extractfeat.html http://emboss.sourceforge.net/docs/themes sequence:edit:feature_table extractfeat e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_additional Additional section e_before Amount of sequence before feature to extract Integer 0 ("", " -before=" + str(value))[value is not None and value!=vdef] 2 If this value is greater than 0 then that number of bases or residues before the feature are included in the extracted sequence. This allows you to get the context of the feature. If this value is negative then the start of the extracted sequence will be this number of bases/residues before the end of the feature. So a value of '10' will start the extraction 10 bases/residues before the start of the sequence, and a value of '-10' will start the extraction 10 bases/residues before the end of the feature. The output sequence will be padded with 'N' or 'X' characters if the sequence starts after the required start of the extraction. e_after Amount of sequence after feature to extract Integer 0 ("", " -after=" + str(value))[value is not None and value!=vdef] 3 If this value is greater than 0 then that number of bases or residues after the feature are included in the extracted sequence. This allows you to get the context of the feature. If this value is negative then the end of the extracted sequence will be this number of bases/residues after the start of the feature. So a value of '10' will end the extraction 10 bases/residues after the end of the sequence, and a value of '-10' will end the extraction 10 bases/residues after the start of the feature. 
The output sequence will be padded with 'N' or 'X' characters if the sequence ends before the required end of the extraction. e_source Source of feature to display String ("", " -source=" + str(value))[value is not None] 4 By default any feature source in the feature table is shown. You can set this to match any feature source you wish to show. The source name is usually either the name of the program that detected the feature or it is the feature table (eg: EMBL) that the feature came from. The source may be wildcarded by using '*'. If you wish to show more than one source, separate their names with the character '|', eg: gene* | embl e_type Type of feature to extract String ("", " -type=" + str(value))[value is not None] 5 By default every feature in the feature table is extracted. You can set this to be any feature type you wish to extract. See http://www.ebi.ac.uk/embl/WebFeat/ for a list of the EMBL feature types and see the Uniprot user manual in http://www.uniprot.org/manual/sequence_annotation for a list of the Uniprot feature types. The type may be wildcarded by using '*'. If you wish to extract more than one type, separate their names with the character '|', eg: *UTR | intron e_sense Sense of feature to extract Integer 0 ("", " -sense=" + str(value))[value is not None and value!=vdef] 6 By default any feature type in the feature table is extracted. You can set this to match any feature sense you wish. 0 - any sense, 1 - forward sense, -1 - reverse sense e_minscore Minimum score of feature to extract Float 0.0 ("", " -minscore=" + str(value))[value is not None and value!=vdef] 7 Minimum score of feature to extract (see also maxscore) e_maxscore Maximum score of feature to extract Float 0.0 ("", " -maxscore=" + str(value))[value is not None and value!=vdef] 8 Maximum score of feature to extract. 
If both minscore and maxscore are zero (the default), then any score is ignored e_tag Tag of feature to extract String ("", " -tag=" + str(value))[value is not None] 9 Tags are the types of extra values that a feature may have. For example in the EMBL feature table, a 'CDS' type of feature may have the tags '/codon', '/codon_start', '/db_xref', '/EC_number', '/evidence', '/exception', '/function', '/gene', '/label', '/map', '/note', '/number', '/partial', '/product', '/protein_id', '/pseudo', '/standard_name', '/translation', '/transl_except', '/transl_table', or '/usedin'. Some of these tags also have values, for example '/gene' can have the value of the gene name. By default any feature tag in the feature table is extracted. You can set this to match any feature tag you wish to show. The tag may be wildcarded by using '*'. If you wish to extract more than one tag, separate their names with the character '|', eg: gene | label e_value Value of feature tags to extract String ("", " -value=" + str(value))[value is not None] 10 Tag values are the values associated with a feature tag. Tags are the types of extra values that a feature may have. For example in the EMBL feature table, a 'CDS' type of feature may have the tags '/codon', '/codon_start', '/db_xref', '/EC_number', '/evidence', '/exception', '/function', '/gene', '/label', '/map', '/note', '/number', '/partial', '/product', '/protein_id', '/pseudo', '/standard_name', '/translation', '/transl_except', '/transl_table', or '/usedin'. Only some of these tags can have values, for example '/gene' can have the value of the gene name. By default any feature tag value in the feature table is shown. You can set this to match any feature tag value you wish to show. The tag value may be wildcarded by using '*'. If you wish to show more than one tag value, separate their names with a space or the character '|', eg: pax* | 10 e_output Output section e_join Output introns etc. 
as one sequence Boolean 0 ("", " -join")[ bool(value) ] 11 Some features, such as CDS (coding sequence) and mRNA, are composed of exons concatenated together. There may be other forms of 'joined' sequence, depending on the feature table. If this option is set TRUE, then any group of these features will be output as a single sequence. If the 'before' and 'after' qualifiers have been set, then only the sequence before the first feature and after the last feature is added. e_featinname Append type of feature to output sequence name Boolean 0 ("", " -featinname")[ bool(value) ] 12 To aid you in identifying the type of feature that has been output, the type of feature is added to the start of the description of the output sequence. Sometimes the description of a sequence is lost in subsequent processing of the sequences file, so it is useful for the type to be a part of the sequence ID name. If you set this to be TRUE then the name is added to the ID name of the output sequence. e_describe Feature tag names to add to the description String ("", " -describe=" + str(value))[value is not None] 13 To aid you in identifying some further properties of a feature that has been output, this lets you specify one or more tag names that should be added to the output sequence Description text, together with their values (if any). For example, if this is set to be 'gene', then if any output feature has the tag (for example) '/gene=BRCA1' associated with it, then the text '(gene=BRCA1)' will be added to the Description line. Tags are the types of extra values that a feature may have. For example in the EMBL feature table, a 'CDS' type of feature may have the tags '/codon', '/codon_start', '/db_xref', '/EC_number', '/evidence', '/exception', '/function', '/gene', '/label', '/map', '/note', '/number', '/partial', '/product', '/protein_id', '/pseudo', '/standard_name', '/translation', '/transl_except', '/transl_table', or '/usedin'.
Some of these tags also have values, for example '/gene' can have the value of the gene name. By default no feature tag is displayed. You can set this to match any feature tag you wish to show. The tag may be wildcarded by using '*'. If you wish to extract more than one tag, separate their names with the character '|', eg: gene | label e_outseq Name of the output sequence file (e_outseq) Filename extractfeat.e_outseq ("" , " -outseq=" + str(value))[value is not None] 14 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 15 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 16 Programs-5.1.1/prophet.xml0000644000175000001560000002276612072525233014373 0ustar bneronsis prophet EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net prophet Scan one or more sequences with a Gribskov or Henikoff profile http://bioweb2.pasteur.fr/docs/EMBOSS/prophet.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:profiles sequence:protein:profiles prophet e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_infile Profile or weight matrix file ProfileOrMatrix AbstractText ("", " -infile=" + str(value))[value is not None] 2 e_required Required section e_gapopen Gap opening coefficient (value from 0.0 to 100.0) Float 1.0 ("", " -gapopen=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 100.0 is required value <= 100.0 3 e_gapextend Gap extension coefficient (value from 0.0 to 100.0) Float 1.0 ("", " -gapextend=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 100.0 is required value <= 100.0 4 e_output Output section e_outfile Name of the output alignment file Filename prophet.align ("" , " -outfile=" + str(value))[value is not None] 5 e_aformat_outfile Choose the alignment output format Choice SIMPLE FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH ("", " -aformat=" + str(value))[value is not None and value!=vdef] 6 e_outfile_out outfile_out option Alignment e_aformat_outfile in ['FASTA', 'MSF'] e_outfile e_outfile_out2 outfile_out2 option Text e_aformat_outfile in ['PAIR', 'MARKX0', 'MARKX1', 'MARKX2', 'MARKX3', 'MARKX10', 'SRS', 'SRSPAIR', 'SCORE', 'UNKNOWN', 'MULTIPLE', 'SIMPLE', 'MATCH'] e_outfile auto Turn off any prompting String " -auto -stdout" 7 Programs-5.1.1/golden.xml0000644000175000001560000001377712127560631014166 0ustar bneronsis golden 
1.1a GOLDEN Fetch a database entry N. Joly ftp://ftp.pasteur.fr/pub/gensoft/projects/golden/ database:search:sequence golden db Database Choice null " $db:" " " + db + ":" 2 query Query (Entry name or Accession number) String "$value" str(value) 3 ac Search with Accession number only (-a) Boolean 0 ($value) ? " -a" : "" ( "" , " -a" )[ value ] 1 Id Search with entry name only (-i) Boolean 0 ($value) ? " -i" : "" ( "" , " -i" )[ value ] 1 nucleic_sequence_out Sequence $db =~ /^(embl|genbank|imgt|rdpii)$/ db in ( 'embl' , 'genbank' , 'imgt' ,'rdpii' ) DNA Sequence EMBL GENBANK EMBL GENBANK GENBANK "golden.out" "golden.out" protein_sequence_out Sequence $db =~ /^(genpept|uniprot)$/ db in ( 'genpept' , 'uniprot' ) Protein Sequence GENBANK SWISSPROT "golden.out" "golden.out" refseq_out Sequence $db eq 'refseq' db == 'refseq' DNA Protein Sequence GENBANK "golden.out" "golden.out" motif_out Motif $db eq 'prosite' db == 'prosite' Protein AbstractText Motif PROSITE "golden.out" "golden.out" enzyme_out Enzyme $db eq 'enzyme' db == 'enzyme' Protein AbstractText Enzyme ENZYME "golden.out" "golden.out" Programs-5.1.1/checktrans.xml0000644000175000001560000002532112072525233015025 0ustar bneronsis checktrans EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net checktrans Reports STOP codons and ORF statistics of a protein http://bioweb2.pasteur.fr/docs/EMBOSS/checktrans.html http://emboss.sourceforge.net/docs/themes sequence:protein:composition checktrans e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_orfml Minimum orf length to report (value greater than or equal to 1) Integer 100 ("", " -orfml=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 2 e_additional Additional section e_addlast Force the sequence to end with an asterisk Boolean 1 (" -noaddlast", "")[ bool(value) ] 3 An asterisk in the protein sequence indicates the position of a STOP codon. Checktrans assumes that all ORFs end in a STOP codon. Forcing the sequence to end with an asterisk, if there is not one there already, makes checktrans treat the end as a potential ORF. If an asterisk is added, it is not included in the reported count of STOPs. 
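The command-line fragments in this description use a tuple-indexing idiom: a two-element tuple is indexed by a boolean, so False selects the first element and True the second. The sketch below shows how the checktrans fragments for -sequence, -orfml and -addlast might combine into one command line. The function name build_checktrans_args and the way the fragments are concatenated are assumptions made for illustration, not the behaviour of any particular interface.

```python
def build_checktrans_args(sequence, orfml=100, addlast=True, orfml_default=100):
    """Hypothetical helper: assemble a checktrans command line from option values."""
    args = "checktrans"
    # (a, b)[cond] yields a when cond is False and b when cond is True.
    args += ("", " -sequence=" + str(sequence))[sequence is not None]
    # -orfml is only emitted when it differs from its default (vdef in the description).
    args += ("", " -orfml=" + str(orfml))[orfml is not None and orfml != orfml_default]
    # addlast defaults to true, so only the negated flag is ever written.
    args += (" -noaddlast", "")[bool(addlast)]
    args += " -auto -stdout"
    return args


if __name__ == "__main__":
    print(build_checktrans_args("seq.fasta"))
    print(build_checktrans_args("seq.fasta", orfml=50, addlast=False))
```

Note that both tuple elements are evaluated before the index is applied, which is harmless here because str(None) does not raise.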
e_output Output section e_outfile Name of the output file (e_outfile) Filename checktrans.e_outfile ("" , " -outfile=" + str(value))[value is not None] 4 e_outfile_out outfile_out option ChecktransReport Report e_outfile e_outseq Name of the output sequence file (e_outseq) Filename checktrans.e_outseq ("" , " -outseq=" + str(value))[value is not None] 5 Sequence file to hold output ORF sequences e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 6 e_outseq_out outseq_out option Sequence e_outseq e_outfeat Name of the output feature file (e_outfeat) Protein Filename checktrans.e_outfeat ("" , " -outfeat=" + str(value))[value is not None] 7 File for output features e_offormat_outfeat Choose the feature output format Protein Choice GFF GFF EMBL SWISSPROT NBRF CODATA ("", " -offormat=" + str(value))[value is not None and value!=vdef] 8 e_outfeat_out outfeat_out option Protein Feature AbstractText e_outfeat auto Turn off any prompting String " -auto -stdout" 9 Programs-5.1.1/pars.xml0000644000175000001560000012431011745213176013651 0ustar bneronsis pars pars Discrete character parsimony http://bioweb2.pasteur.fr/docs/phylip/doc/pars.html PARS is a general parsimony program which carries out the Wagner parsimony method with multiple states. Wagner parsimony allows changes among all states. The criterion is to find the tree which requires the minimum number of changes. phylogeny:parsimony pars String "pars < pars.params" "pars < pars.params" 0 infile Input File PhylipDiscreteCharMatrix AbstractText $infile ne "infile" infile != "infile" "ln -s $infile infile && " "ln -s " + str( infile ) +" infile && " -10 5 6 Alpha 110110 Beta 110000 Gamma 100110 Delta 001001 Epsilon 001110 Warning: if you want to perform a bootstrap (seqboot method) before your pars analysis, your data must be in sequential format.
input_format Input File Format Choice If you intend to run pars alone, your data must be in Phylip format, either interleaved or sequential. But if you want to perform a bootstrap (seqboot) before the pars analysis, the data must be in Phylip sequential format. ( defined $seqboot_or_jumble and ( $seqboot_or_jumble >= 0 and $seqboot_or_jumble < 5 ) )? ($value eq 'sequential') : 1 (value == 'sequential') if ( ( seqboot_or_jumble is not None ) and ( int( seqboot_or_jumble ) >= 0 and int( seqboot_or_jumble ) < 5) ) else True If seqboot is selected, the discrete character matrix must be in sequential format. sequential sequential "I\\n" "I\n" interleaved "" "" 90 pars.params pars_opt Pars options search_opt Search option (S) Choice 0 0 "" "" 1 "S\\nY\\n" "S\nY\n" 2 "S\\nN\\n" "S\nN\n" 1 PARS is a general parsimony program which carries out the Wagner parsimony method with multiple states. Wagner parsimony allows changes among all states. The criterion is to find the tree which requires the minimum number of changes. The Wagner method was originated by Eck and Dayhoff (1966) and by Kluge and Farris (1969). Here are its assumptions: 1. Ancestral states are unknown. 2. Different characters evolve independently. 3. Different lineages evolve independently. 4. Changes to all other states are equally probable (Wagner). 5. These changes are a priori improbable over the evolutionary time spans involved in the differentiation of the group in question. 6. Other kinds of evolutionary event such as retention of polymorphism are far less probable than these state changes. 7. Rates of evolution in different lineages are sufficiently low that two changes in a long segment of the tree are far less probable than one change in a short segment. PARS can handle both bifurcating and multifurcating trees.
In doing its search for most parsimonious trees, it adds species not only by creating new forks in the middle of existing branches, but it also tries putting them at the end of new branches which are added to existing forks. Thus it searches among both bifurcating and multifurcating trees. If a branch in a tree does not have any characters which might change in that branch in the most parsimonious tree, it does not save that tree. Thus in any tree that results, a branch exists only if some character has a most parsimonious reconstruction that would involve change in that branch. pars.params save_trees Number of trees to save? (V) Integer 100 (defined $value and $value != $vdef) ? "V\\n$value\\n" : "" ("", "V\n"+str(value)+"\n")[value is not None and value != vdef] 1 pars.params weight_opt Weight options weights Weighted sites (W) Boolean 0 ($value) ? "W\\n" : "" ("" , "W\n")[ value ] 1 The weights follow the format described in the main documentation file, with integer weights from 0 to 35 allowed by using the characters 0, 1, 2, ..., 9 and A, B, ... Z. pars.params weight_file Weight file PhylipWeight AbstractText $weights weights "ln -s $weight_file weights && " "ln -s " + str( weight_file ) + " weights && " -9 user_tree_opt User tree options user_tree Use user tree (default: no, search for best tree)? (U) Boolean 0 ($value) ? "U\\n" : "" ("", "U\n")[ value ] You cannot randomize (jumble) your dataset and give a user tree at the same time not ( $user_tree and $jumble ) not ( user_tree and jumble ) 1 To give your tree to the program, you must normally put it in the alignment file, after the sequences, preceded by a line indicating how many trees you give. Here, this will be automatically appended: just give a treefile and the number of trees in it. pars.params tree_file User Tree file Tree NEWICK $user_tree user_tree (defined $value) ? 
"cat $tree_file >> intree && " : "" ("", " cat "+ str( tree_file ) + " >> intree && " )[value is not None] -1 Give a tree whenever the infile does not already contain the tree. tree_nb How many tree(s) in the User Tree file Integer defined $tree_file tree_file is not None 1 "echo $value >> intree && " "echo " + str( value ) + " >> intree && " -2 Give this information whenever the infile does not already contain the tree. jumble_bootstrap Bootstrap and Jumble options By selecting this option, the bootstrap will be performed on your sequence file. So you don't need to perform a separated seqboot before. Don't give an already bootstrapped file to the program, this won't work! seqboot_or_jumble I want to Choice null null 0 1 2 3 4 5 The resampling methods available are: The bootstrap. Bootstrapping was invented by Bradley Efron in 1979, and its use in phylogeny estimation was introduced by me (Felsenstein, 1985b; see also Penny and Hendy, 1985). It involves creating a new data set by sampling N characters randomly with replacement, so that the resulting data set has the same size as the original, but some characters have been left out and others are duplicated. The random variation of the results from analyzing these bootstrapped data sets can be shown statistically to be typical of the variation that you would get from collecting new data sets. The method assumes that the characters evolve independently, an assumption that may not be realistic for many kinds of data. Delete-half-jackknifing. This alternative to the bootstrap involves sampling a random half of the characters, and including them in the data but dropping the others. The resulting data sets are half the size of the original, and no characters are duplicated. The random variation from doing this should be very similar to that obtained from the bootstrap. The method is advocated by Wu (1986). 
It was mentioned by me in my bootstrapping paper (Felsenstein, 1985b), and has been available for many years in this program as an option. Note that, for the present, block-jackknifing is not available, because I cannot figure out how to do it straightforwardly when the block size is not a divisor of the number of characters. Permuting species within characters. This method of resampling (well, OK, it may not be best to call it resampling) was introduced by Archie (1989) and Faith (1990; see also Faith and Cranston, 1991). It involves permuting the columns of the data matrix separately. This produces data matrices that have the same number and kinds of characters but no taxonomic structure. It is used for different purposes than the bootstrap, as it tests not the variation around an estimated tree but the hypothesis that there is no taxonomic structure in the data: if a statistic such as number of steps is significantly smaller in the actual data than it is in replicates that are permuted, then we can argue that there is some taxonomic structure in the data (though perhaps it might be just the presence of a pair of sibling species). Permuting characters. This simply permutes the order of the characters, the same reordering being applied to all species. For many methods of tree inference this will make no difference to the outcome (unless one has rates of evolution correlated among adjacent sites). It is included as a possible step in carrying out a permutation test of homogeneity of characters (such as the Incongruence Length Difference test). Permuting characters separately for each species. This is a method introduced by Steel, Lockhart, and Penny (1993) to permute data so as to destroy all phylogenetic structure, while keeping the base composition of each species the same as before. It shuffles the character order separately for each species.
Jumble. In the tree construction programs, the exact details of the search of different trees depend on the order of input of species. In these programs the J option enables you to tell the program to use a random number generator to choose the input order of species. seed Random number seed (must be odd) Integer defined $seqboot_or_jumble seqboot_or_jumble is not None Random number seed must be odd $value > 0 and (($value % 2) != 0) value > 0 and (( value % 2 ) != 0 ) The seqboot and jumble seed option should be an integer between 1 and 32767, and should be of the form 4n+1, which means that it must give a remainder of 1 when divided by 4. This can be judged by looking at the last two digits of the number. Each different seed leads to a different sequence of addition of species. By simply changing the random number seed and re-running the programs one can look for other, and better, trees. If the seed entered is not odd, the program will not proceed. seqboot Integer ( defined $seqboot_or_jumble ) and ( $seqboot_or_jumble >= 0 and $ seqboot_or_jumble < 5 ) ( seqboot_or_jumble is not None ) and ( int( seqboot_or_jumble ) >= 0 and int( seqboot_or_jumble ) < 5) ($value) ?
"seqboot <seqboot.params && mv outfile seqboot.outfile && rm infile && ln -s seqboot.outfile infile && " : "" "seqboot <seqboot.params && mv outfile seqboot.outfile && rm infile && ln -s seqboot.outfile infile && " -5 seqboot_method Integer ( defined $seqboot_or_jumble ) and ( $seqboot_or_jumble >= 0 and $seqboot_or_jumble < 5 ) ( seqboot_or_jumble is not None ) and ( int( seqboot_or_jumble ) >= 0 and int( seqboot_or_jumble ) < 5) qw ( D\\n D\\nJ\\n D\\nJ\\nJ\\n D\\nJ\\nJ\\nJ\\n D\\nJ\\nJ\\nJ\\nJ\\n )[$seqboot_or_jumble] ( 'D\n' , 'D\nJ\n' , 'D\nJ\nJ\n' , 'D\nJ\nJ\nJ\n' , 'D\nJ\nJ\nJ\nJ\n' , )[ int( seqboot_or_jumble ) ] 1 seqboot.params seqboot_seed Integer ( defined $seqboot_or_jumble ) and ( $seqboot_or_jumble >= 0 and $ seqboot_or_jumble < 5 ) ( seqboot_or_jumble is not None ) and ( int( seqboot_or_jumble >= 0 ) and int( seqboot_or_jumble ) < 5) "$seed\\n" str( seed ) + "\n" 1000 seqboot.params seqboot_replicates How many replicates Integer ( defined $seqboot_or_jumble ) and ( $seqboot_or_jumble >= 0 and $ seqboot_or_jumble < 5 ) ( seqboot_or_jumble is not None ) and ( int( seqboot_or_jumble ) >= 0 and int( seqboot_or_jumble ) < 5) 100 "R\\n$value\\n" "R\n" + str( value ) + "\n" This server allows no more than 1000 replicates $value <= 1000 value <= 1000 Bad data sets number: it must be greater than 1 $value > 1 value > 1 20 This option is mandatory if you select a seqboot method. This value indicate how many set of data you will generate. This option could generate huge data and should be used with discernment. 
If you provide 10 "sequences" of 1000 characters each (a file of ~10 KB) and select 1000 replicates, you will generate 10,000 sequences of 1000 characters each (a file of ~10 MB, which could make the results difficult to view or download). seqboot.params bootconfirm String ( defined $seqboot_or_jumble ) and ( $seqboot_or_jumble >= 0 and $ seqboot_or_jumble < 5 ) ( seqboot_or_jumble is not None ) and ( int( seqboot_or_jumble ) >= 0 and int( seqboot_or_jumble ) < 5) "y\\n" "y\n" 100 seqboot.params bootterminal_type String ( defined $seqboot_or_jumble ) and ( $seqboot_or_jumble >= 0 and $ seqboot_or_jumble < 5 ) ( seqboot_or_jumble is not None ) and ( int( seqboot_or_jumble ) >= 0 and int( seqboot_or_jumble ) < 5) "0\\n" "0\n" -1 seqboot.params jumble Integer ( defined $seqboot_or_jumble ) and ( $seqboot_or_jumble eq "5" ) ( seqboot_or_jumble is not None ) and ( seqboot_or_jumble == "5" ) "J\\n$seed\\n$jumble_times\\n" 'J\n' + str( seed ) + "\n" + str( jumble_times ) +"\n" 10 pars.params jumble_times Number of times to jumble Integer defined $seqboot_or_jumble seqboot_or_jumble is not None the product of "number of times to jumble" and replicates ( if defined ) must be less than 100000 ( (seqboot_or_jumble is not None ) and ( int( seqboot_or_jumble ) >= 0 and int( seqboot_or_jumble ) < 5)) ? ($jumble_times * $seqboot_replicates) <= 100000 : ( $value < 1000000 ) (jumble_times * seqboot_replicates) <= 100000 if ( ( seqboot_or_jumble is not None ) and ( int( seqboot_or_jumble ) >= 0 and int( seqboot_or_jumble ) < 5) ) else (value < 1000000 ) the minimum times to jumble is 1 $jumble_times >= 1 jumble_times >= 1 The Seqboot or Jumble options also cause the program to ask you how many times you want to restart the jumble process.
If you answer 10, the program will try ten different orders of species in constructing the trees, and the results printed out will reflect this entire search process (that is, the best trees found among all 10 runs will be printed out, not the best trees from each individual run). multiple_dataset String defined $seqboot_or_jumble and $seqboot_or_jumble ne "5" seqboot_or_jumble is not None and seqboot_or_jumble != "5" "M\nD\n" + str( seqboot_replicates ) + "\n" + str( seed ) +"\n" + str( jumble_times ) +"\n" "M\nD\n" + str( seqboot_replicates ) + "\n" + str( seed ) +"\n" + str( jumble_times ) +"\n" 20 pars.params consense Compute a consensus tree ( seqboot ) Boolean ( defined $seqboot_or_jumble and $seqboot_or_jumble ne "5" ) and $print_treefile ( seqboot_or_jumble is not None ) and ( seqboot_or_jumble != "5" ) 0 ($value) ? " && cp infile pars.infile && cp pars.outtree intree && consense <consense.params && mv outtree consense.outtree && mv outfile consense.outfile" : "" ( "" , " && cp infile pars.infile && cp pars.outtree intree && consense <consense.params && mv outtree consense.outtree && mv outfile consense.outfile" )[ value ] 30 This option makes sense only if you have multiple data sets ( seqboot ) output Output options print_tree Print out tree (3) Boolean 1 ($value) ? "" : "3\\n" ("3\n", "")[ value ] 1 Tells the program to print a semi-graphical picture of the tree in the outfile. pars.params print_step Print out steps in each character (4) Boolean 0 ($value) ? "4\\n" : "" ("", "4\n")[ value ] 1 pars.params print_states Print states at all nodes of tree (5) Boolean 0 ($value) ? "5\\n" : "" ("", "5\n")[ value ] 1 pars.params print_treefile Write out trees onto tree file (6) Boolean 1 ($value) ? "" : "6\\n" ("6\n", "")[ value ] 1 Tells the program to save the tree in a treefile (a standard representation of trees where the tree is specified by nested pairs of parentheses, enclosing names and separated by commas).
pars.params printdata Print out the data at start of run (1) Boolean 0 ($value) ? "1\\n" : "" ("", "1\n")[ value ] 1 pars.params parsimony_opt Parsimony options use_threshold Use Threshold parsimony (T) Boolean 0 ($value) ? "T\\n$threshold\\n" : "" ("", "T\n" + str(threshold) + "\n")[ value ] 3 pars.params threshold Threshold parsimony value Integer $use_threshold use_threshold "" "" You must enter a numeric value, greater than 1 $threshold > 1 threshold > 1 2 pars.params other_options Other options outgroup Outgroup root species (O) Integer 1 (defined $value and $value != $vdef) ? "o\\n$value\\n" : "" ("", "o\n" + str(value) + "\n")[value is not None and value!=vdef] Please enter a value greater than 0 $value > 0 value > 0 1 The O (Outgroup) option specifies which species is to have the root of the tree be on the line leading to it. For example, if the outgroup is a species "Mouse" then the root of the tree will be placed in the middle of the branch which is connected to this species, with Mouse branching off on one side of the root and the lineage leading to the rest of the tree on the other. This option is toggled on by choosing the number of the outgroup (the species being taken in the numerical order that they occur in the input file). Outgroup-rooting will not be attempted if it is a user-defined tree, despite your invoking the option. When it is used, the tree as printed out is still listed as being an unrooted tree, though the outgroup is connected to the bottommost node so that it is easy to visually convert the tree into rooted form.
pars.params outfile Pars output file Text " && mv outfile pars.outfile" " && mv outfile pars.outfile" "pars.outfile" "pars.outfile" treefile Pars output tree Tree NEWICK $print_treefile print_treefile " && mv outtree pars.outtree" " && mv outtree pars.outtree" "pars.outtree" "pars.outtree" seqboot_out seqboot outfile SetOfPhylipDiscreteCharMatrix AbstractText ( defined $seqboot_or_jumble ) and ( $seqboot_or_jumble >= 0 and $ seqboot_or_jumble < 5 ) ( seqboot_or_jumble is not None ) and ( int( seqboot_or_jumble ) >= 0 and int( seqboot_or_jumble ) < 5) 40 "seqboot.outfile" "seqboot.outfile" confirm String "y\\n" "y\n" 1000 pars.params terminal_type String "0\\n" "0\n" -1 pars.params consense_confirm String $consense consense "Y\\n" "Y\n" 1000 consense.params consense_terminal_type String $consense consense "T\\n" "T\n" -2 consense.params consense_outfile Consense output file Text $consense consense "consense.outfile" "consense.outfile" consense_treefile Consense output tree Tree NEWICK $consense consense "consense.outtree" "consense.outtree" Programs-5.1.1/transeq.xml0000644000175000001560000003250612072525233014360 0ustar bneronsis transeq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net transeq Translate nucleic acid sequences http://bioweb2.pasteur.fr/docs/EMBOSS/transeq.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:translation transeq e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_additional Additional section e_frame Translation frames (value from 1 to 6) Choice 1 1 2 3 F -1 -2 -3 R 6 ("", " -frame=" + str(value))[value is not None and value!=vdef] 2 e_table Genetic codes Choice 0 0 1 2 3 4 5 6 9 10 11 12 13 14 15 16 21 22 23 ("", " -table=" + str(value))[value is not None and value!=vdef] 3 e_regions Regions to translate (eg: 4-57,78-94) String ("", " -regions=" + str(value))[value is not None] 4 Regions to translate. If this is left blank, then the complete sequence is translated. A set of regions is specified by a set of pairs of positions. The positions are integers. They are separated by any non-digit, non-alpha character. Examples of region specifications are: 24-45, 56-78 1:45, 67=99;765..888 1,5,8,10,23,45,57,99 Note: you should not try to use this option with any other frame than the default, -frame=1 e_trim Trim trailing x's and *'s Boolean 0 ("", " -trim")[ bool(value) ] 5 This removes all 'X' and '*' characters from the right end of the translation. The trimming process starts at the end and continues until the next character is not a 'X' or a '*' e_clean Change all *'s to x's Boolean 0 ("", " -clean")[ bool(value) ] 6 This changes all STOP codon positions from the '*' character to 'X' (an unknown residue). This is useful because some programs will not accept protein sequences with '*' characters in them. 
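The effect of the -trim and -clean options described above can be illustrated with a short Python sketch. This is not transeq's own implementation, only a minimal model of the documented behaviour, and the function names trim_translation and clean_translation are hypothetical.

```python
def trim_translation(protein):
    """Sketch of -trim: drop every trailing 'X' or '*' from the right end."""
    # rstrip removes characters from the end until it meets one outside the set.
    return protein.rstrip("X*")


def clean_translation(protein):
    """Sketch of -clean: replace each STOP marker '*' with 'X' (unknown residue)."""
    return protein.replace("*", "X")


if __name__ == "__main__":
    print(trim_translation("MKV*LLXX*"))
    print(clean_translation("MKV*LL*"))
```

In this sketch an internal '*' survives trimming, because trimming stops at the first character from the right that is neither 'X' nor '*'.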
e_advanced Advanced section e_alternative Define frame '-1' as starting in the last codon Boolean 0 ("", " -alternative")[ bool(value) ] 7 The default definition of frame '-1' is the reverse-complement of the set of codons used in frame 1. (Frame -2 is the set of codons used by frame 2, similarly frames -3 and 3). This is a common standard, used by the Staden package and other programs. If you prefer to define frame '-1' as using the set of codons starting with the last codon of the sequence, then set this to be true. e_output Output section e_outseq Name of the output sequence file (e_outseq) Protein Filename outseq.pep ("" , " -outseq=" + str(value))[value is not None] 8 e_osformat_outseq Choose the sequence output format Protein Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 9 e_outseq_out outseq_out option Protein Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 10 Programs-5.1.1/repeatoire.xml0000644000175000001560000002670211767572177015066 0ustar bneronsis repeatoire 1.0 repeatoire Locating DNA repeats inside of sequenced genome Todd Treangen, Aaron Darling and Eduardo PC Rocha Todd J. Treangen, Aaron E. Darling, Guillaume Achaz, Mark A. Ragan, Xavier Messeguer, Eduardo P.C. Rocha, "A Novel Heuristic for Local Multiple Alignment of Interspersed DNA Repeats," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 2, pp. 180-189 http://bioweb2.pasteur.fr/docs/repeatoire/Repeatoire_UserGuide.pdf http://wwwabi.snv.jussieu.fr/public/Repeatoire/ http://wwwabi.snv.jussieu.fr/public/Repeatoire/ sequence:nucleic:repeats repeatoire infile Input sequence DNA Sequence FASTA " --sequence=$value" " --sequence=" + str(value) Fasta sequence input file 1 main Main options 1 seedSize Seed weight Integer 17 (defined $value and $value != $vdef) ? 
" --z=$value" : "" ( "" , " --z=" + str(value))[ value is not None and value != vdef ] Forms the foundation of the repeats families. Smaller values are more sensitive but allow for more noise (less specific), whereas larger values are less sensitive but more specific. onlydirect Only process seed matches on same strand Boolean 0 (defined $value) ? " --onlydirect=1" : "" ( "" , " --onlydirect=1")[ value ] Often one wants to only analyze repeats in the same orientation. Else, program will include inverted repeats in the output. extend Perform gapped extension Boolean 1 (defined $value and $value != $vdef) ? "" : " --extend=0" ( " --extend=0" , "" )[ value ] If gapped extension is disabled, output will be the local multiple alignments of the gapped chains. large-repeats Optimize for large repeats Boolean 1 (defined $value) ? "" : " --large-repeats=0" ( " --large-repeats=0" , "" )[ value ] minreplen Minimum final repeat length Integer 100 (defined $value and $value != $vdef) ? " --minreplen=$value" : "" ( "" , " --minreplen=" + str(value))[ value is not None and value != vdef ] To increase specificity when wanting to focus on larger repeats, this option will only return repeat families containing repeats > value maxmulti Maximum repeat multiplicity Integer 500 (defined $value and $value != $vdef) ? " --maxmulti=$value" : "" ( "" , " --maxmulti=" + str(value))[ value is not None and value != vdef ] Option to increase specificity. If one knows a priori that high copy repeats can be excluded from the analysis, this option can be used to discard all repeat families with more repeat copies than this specified value. minmulti Minimum repeat multiplicity Integer 2 (defined $value and $value != $vdef) ? " --minmulti=$value" : "" ( "" , " --minmulti=" + str(value))[ value is not None and value != vdef ] Similar to maxmulti, this option can be used to discard all repeat families that have fewer copies than this value. solid Use solid/exact seeds Boolean 0 (defined $value) ? 
" --solid=1" : "" ( "" , " --solid=1")[ value ] By default, the program uses palindromic spaces seed patterns to find seed matches inside of the input sequence. However, this can be turned off, which will use solid seed patterns (no mismatches in the seed matches allowed). expert Expert options 1 redundant Allow redundant alignments Boolean 0 (defined $value) ? " --allow-redundant=1" : "" ( "" , " --allow-redundant=1" )[ value ] two_hits Require two hits for gapped extension Boolean 0 (defined $value) ? " --two-hits=1" : "" ( "" , " --two-hits=1" )[ value ] chain Chain seeds Boolean 1 (defined $value) ? "" : " --chain=0" ( " --chain=0" , "" )[ value ] xml XML format output Boolean 1 (defined $value) ? " --xml=repeatoire.xml" : "" ( "" , " --xml=repeatoire.xml" )[ value ] gle_output General results Report "reps.out" "reps.out" stat_output Statistical results Report "stats.highest" "stats.highest" xml_output Xml output Report $xml xml "repeatoire.xml" "repeatoire.xml" Programs-5.1.1/showfeat.xml0000644000175000001560000005472712072525233014534 0ustar bneronsis showfeat EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net showfeat Display features of a sequence in pretty format http://bioweb2.pasteur.fr/docs/EMBOSS/showfeat.html http://emboss.sourceforge.net/docs/themes display:feature_table showfeat e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_additional Additional section e_sourcematch Source of feature to display String ("", " -sourcematch=" + str(value))[value is not None] 2 By default any feature source in the feature table is shown. 
You can set this to match any feature source you wish to show. The source name is usually either the name of the program that detected the feature or it is the feature table (eg: EMBL) that the feature came from. The source may be wildcarded by using '*'. If you wish to show more than one source, separate their names with the character '|', eg: gene* | embl e_typematch Type of feature to display String ("", " -typematch=" + str(value))[value is not None] 3 By default any feature type in the feature table is shown. You can set this to match any feature type you wish to show. See http://www.ebi.ac.uk/embl/WebFeat/ for a list of the EMBL feature types and see Appendix A of the Swissprot user manual in http://www.expasy.org/sprot/userman.html for a list of the Swissprot feature types. The type may be wildcarded by using '*'. If you wish to show more than one type, separate their names with the character '|', eg: *UTR | intron e_tagmatch Tag of feature to display String ("", " -tagmatch=" + str(value))[value is not None] 4 Tags are the types of extra values that a feature may have. For example in the EMBL feature table, a 'CDS' type of feature may have the tags '/codon', '/codon_start', '/db_xref', '/EC_number', '/evidence', '/exception', '/function', '/gene', '/label', '/map', '/note', '/number', '/partial', '/product', '/protein_id', '/pseudo', '/standard_name', '/translation', '/transl_except', '/transl_table', or '/usedin'. Some of these tags also have values, for example '/gene' can have the value of the gene name. By default any feature tag in the feature table is shown. You can set this to match any feature tag you wish to show. The tag may be wildcarded by using '*'. If you wish to show more than one tag, separate their names with the character '|', eg: gene | label e_valuematch Value of feature tags to display String ("", " -valuematch=" + str(value))[value is not None] 5 Tag values are the values associated with a feature tag. 
Tags are the types of extra values that a feature may have. For example in the EMBL feature table, a 'CDS' type of feature may have the tags '/codon', '/codon_start', '/db_xref', '/EC_number', '/evidence', '/exception', '/function', '/gene', '/label', '/map', '/note', '/number', '/partial', '/product', '/protein_id', '/pseudo', '/standard_name', '/translation', '/transl_except', '/transl_table', or '/usedin'. Only some of these tags can have values, for example '/gene' can have the value of the gene name. By default any feature tag value in the feature table is shown. You can set this to match any feature tag value you wish to show. The tag value may be wildcarded by using '*'. If you wish to show more than one tag value, separate their names with the character '|', eg: pax* | 10 e_sort Sorting features Choice start source start type nosort ("", " -sort=" + str(value))[value is not None and value!=vdef] 6 e_joinfeatures Join coding regions together Boolean 0 ("", " -joinfeatures")[ bool(value) ] 7 e_annotation Regions to mark (eg: 4-57 promoter region 78-94 first exon) String ("", " -annotation=" + str(value))[value is not None] 8 Regions to annotate by marking. If this is left blank, then no annotation is added. A set of regions is specified by a set of pairs of positions followed by optional text. The positions are integers. They are followed by any text (but not digits when on the command-line). Examples of region specifications are: 24-45 new domain 56-78 match to Mouse 1-100 First part 120-156 oligo A file of ranges to annotate (one range per line) can be specified as '@filename'. e_advanced Advanced section e_html Use html formatting Boolean 0 ("", " -html")[ bool(value) ] 9 e_id Show sequence id Boolean 1 (" -noid", "")[ bool(value) ] 10 Set this to be false if you do not wish to display the ID name of the sequence. 
e_description Show description Boolean 1 (" -nodescription", "")[ bool(value) ] 11 Set this to be false if you do not wish to display the description of the sequence. e_scale Show scale line Boolean 1 (" -noscale", "")[ bool(value) ] 12 Set this to be false if you do not wish to display the scale line. e_width Width of graphics lines (value greater than or equal to 0) Integer 60 ("", " -width=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 13 You can expand (or contract) the width of the ASCII-character graphics display of the positions of the features using this value. For example, a width of 80 characters would cover a standard page width and a width a 10 characters would be nearly unreadable. If the width is set to less than 4, the graphics lines and the scale line will not be displayed. e_collapse Display features with the same type on one line Boolean 0 ("", " -collapse")[ bool(value) ] 14 If this is set, then features from the same source and of the same type and sense are all printed on the same line. For instance if there are several features from the EMBL feature table (ie. the same source) which are all of type 'exon' in the same sense, then they will all be displayed on the same line. This makes it hard to distinguish overlapping features. If this is set to false then each feature is displayed on a separate line making it easier to distinguish where features start and end. e_forward Display forward sense features Boolean 1 (" -noforward", "")[ bool(value) ] 15 Set this to be false if you do not wish to display forward sense features. e_reverse Display reverse sense features Boolean 1 (" -noreverse", "")[ bool(value) ] 16 Set this to be false if you do not wish to display reverse sense features. e_unknown Display unknown sense features Boolean 1 (" -nounknown", "")[ bool(value) ] 17 Set this to be false if you do not wish to display unknown sense features. (ie. 
features with no directionality - all protein features are of this type and some nucleic features (for example, CG-rich regions)). e_strand Display strand of features Boolean 0 ("", " -strand")[ bool(value) ] 18 Set this if you wish to display the strand of the features. Protein features are always directionless (indicated by '0'), forward is indicated by '+' and reverse is '-'. e_origin Display source of features Boolean 0 ("", " -origin")[ bool(value) ] 19 Set this if you wish to display the origin of the features. The source name is usually either the name of the program that detected the feature or it is the name of the feature table (eg: EMBL) that the feature came from. e_position Display position of features Boolean 0 ("", " -position")[ bool(value) ] 20 Set this if you wish to display the start and end position of the features. If several features are being displayed on the same line, then the start and end positions will be joined by a comma, for example: '189-189,225-225'. e_type Display type of features Boolean 1 (" -notype", "")[ bool(value) ] 21 Set this to be false if you do not wish to display the type of the features. e_tags Display tags of features Boolean 0 ("", " -tags")[ bool(value) ] 22 Set this to be false if you do not wish to display the tags and values of the features. e_values Display tag values of features Boolean 1 (" -novalues", "")[ bool(value) ] 23 Set this to be false if you do not wish to display the tag values of the features. If this is set to be false, only the tag names will be displayed. If the tags are not displayed, then the values will not be displayed. The value of the 'translation' tag is never displayed as it is often extremely long. e_stricttags Only display the matching tags Boolean 0 ("", " -stricttags")[ bool(value) ] 24 By default if any tag/value pair in a feature matches the specified tag and value, then all the tags/value pairs of that feature will be displayed. 
If this is set to be true, then only those tag/value pairs in a feature that match the specified tag and value will be displayed. e_output Output section e_outfile Name of the output file (e_outfile) Filename outfile.showfeat ("" , " -outfile=" + str(value))[value is not None] 25 e_outfile_out outfile_out option ShowfeatReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 26 Programs-5.1.1/rbvotree.xml0000644000175000001560000000657511767572177014565 0ustar bneronsis rbvotree 1.0 rbvotree Report bootstrap values found with consensus program on related tree found with phylogeny algorithm without bootstrap analysis. C. Maufrais phylogeny:tree_analyser rbvotree njtree Tree File (Newick standard form)(-n) Tree NEWICK " -n $value" " -n " + str(value) 1 constree Consensus tree file with bootstrap values (Newick standard form)(-c) Tree NEWICK " -c $value" " -c " + str(value) 2 output Output options outname Name of output tree file (-o) Filename (defined $value)? " -o $value" : "" ("", " -o " + str(value)) [ value is not None] 3 outfile_name Output tree file Tree NEWICK defined $outname outname is not None "$outname" str(outname) outfile Output tree file Tree NEWICK not defined $outname outname is None "rbvotree.out" "rbvotree.out" Programs-5.1.1/wobble.xml0000644000175000001560000002266012072525233014155 0ustar bneronsis wobble EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net wobble Plot third base position variability in a nucleotide sequence http://bioweb2.pasteur.fr/docs/EMBOSS/wobble.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:gene_finding wobble e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_additional Additional section e_window Window size in codons (value greater than or equal to 1) Integer 30 ("", " -window=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 2 e_advanced Advanced section e_bases Bases used String GC ("", " -bases=" + str(value))[value is not None and value!=vdef] 3 e_output Output section e_graph Choose the e_graph output format Choice png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 4 xy_goutfile Name of the output graph Filename wobble_xygraph ("" , " -goutfile=" + str(value))[value is not None] 5 xy_outgraph_png Graph file Picture Binary e_graph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_graph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_graph == "ps" or e_graph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_graph == "meta" "*.meta" xy_outgraph_data Graph file Text e_graph == "data" "*.dat" e_outfile Name of the output file (e_outfile) Filename wobble.e_outfile ("" , " -outfile=" + str(value))[value is not None] 6 e_outfile_out outfile_out option WobbleReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 7 Programs-5.1.1/banana.xml0000644000175000001560000002157112072525233014123 0ustar bneronsis banana EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. 
EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net banana Plot bending and curvature data for B-DNA http://bioweb2.pasteur.fr/docs/EMBOSS/banana.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:composition banana e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_anglesfile Dna base trimer roll angles data file BaseTrimerRollAngles AbstractText ("", " -anglesfile=" + str(value))[value is not None ] 2 e_output Output section e_graph Choose the e_graph output format Choice png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 3 e_goutfile Name of the output graph Filename banana_graph ("" , " -goutfile=" + str(value))[value is not None] 4 outgraph_png Graph file Picture Binary e_graph == "png" "*.png" outgraph_gif Graph file Picture Binary e_graph == "gif" "*.gif" outgraph_ps Graph file PostScript Binary e_graph == "ps" or e_graph == "cps" "*.ps" outgraph_meta Graph file Picture Binary e_graph == "meta" "*.meta" outgraph_data Graph file Text e_graph == "data" "*.dat" e_residuesperline Number of residues to be displayed on each line Integer 50 ("", " -residuesperline=" + str(value))[value is not None and value!=vdef] 5 e_outfile Name of the output file (e_outfile) Filename banana.e_outfile ("" , " -outfile=" + str(value))[value is not None] 6 e_outfile_out outfile_out option BananaReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 7 Programs-5.1.1/gruppi.xml0000644000175000001560000000436211767572177014233 0ustar bneronsis gruppi 1.0 gruppi clusters of binding sites Marco Pontoglio ftp://ftp.pasteur.fr/pub/GenSoft/unix/nucleic_acid/ sequence:nucleic:pattern sequence:nucleic:clusters gruppi 
String "gruppi " "gruppi " sequences Sequences DNA Sequence NBRF " $sequences" " " + str(sequences) 1 output Gruppi output GruppiReport Report "gruppi.out" "gruppi.out" Programs-5.1.1/concatfasta.xml0000644000175000001560000000447711441651470015201 0ustar bneronsis concatfasta 1.0 concatfasta Concatenation of sequences from two fasta files with exactly the same sequence identifiers Maufrais Corinne sequence:formatter concatfasta input Input section firstFile Fasta file Sequence FASTA 1,1 ("", " -f " + str(value))[value is not None] 1 scdFile Fasta file Sequence FASTA 1,1 ("", " -s " + str(value))[value is not None] 2 outfile Sequence(s) file Sequence FASTA 1,n "concatfasta.out" "concatfasta.out" Programs-5.1.1/rnaeval.xml0000644000175000001560000002336711767656243014360 0ustar bneronsis rnaeval RNAeval Calculate energy of RNA sequences on given secondary structure Hofacker, Stadler I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker, P. Schuster (1994) Fast Folding and Comparison of RNA Secondary Structures. Monatshefte f. Chemie 125: 167-188 http://www.tbi.univie.ac.at/RNA/RNAeval.html RNAeval evaluates the free energy of an RNA molecule in fixed secondary structure. sequence:nucleic:2D_structure structure:2D_structure rnaeval String "RNAeval" "RNAeval" seqstruct Sequences/Structures File RNAStructure AbstractText " < $value" " < " + str(value) Sequences and structures are read alternately from stdin. The energy in Kcal/Mol is written to stdout. The program will continue to read new sequences and structures until a line consisting of the single character "@" or an end of file condition is encountered. If the input sequence or structure contains the separator character "&" the program calculates the energy of the co-folding of two RNA strands, where the "&" marks the boundary between the two strands. ACGAUCAGAGAUCAGAGCAUACGACAGCAG ..((((...))))...((........)).. UUUUUUUAAUAUAAA&AGACAAAAAGUCGGG .(.(.(()).).)..&.(((.....)))... 
@ 1000 others_options Other options 2 temperature Rescale energy parameters to a temperature of temp C. (-T) Float 37.0 (defined $value and $value != $vdef)? " -T $value" : "" ( "" , " -T " + str(value) )[ value is not None and value != vdef] tetraloops Do not include special stabilizing energies for certain tetraloops (-4) Boolean 0 ($value)? " -4" : "" ( "" , " -4" )[ value ] dangling How to treat dangling end energies for bases adjacent to helices in free ends and multiloops (-d) Choice -d1 -d1 -d -d2 (defined $value and $value ne $vdef)? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef] How to treat 'dangling end' energies for bases adjacent to helices in free ends and multiloops: Normally only unpaired bases can participate in at most one dangling end. -d Do not give stabilizing energies to unpaired bases adjacent to helices in multiloops and free ends ("dangling ends"). Same as -d0, opposite of -d1 (the default). -d2 Treat dangling ends as in the partition function algorithm, i.e. bases adjacent to helices in multiloops and free ends give a stabilizing energy contribution, regardless of whether they're paired or unpaired. -d3 Allow coaxial stacking of adjacent helices in multi-loops. logML Let multiloop energies depend logarithmically on the size (-logML) Boolean 0 ($value)? " -logML" : "" ( "" , " -logML" )[ value ] Let multiloop energies depend logarithmically on the size, instead of the usual linear energy function. circular -circ Assume a circular (rather than linear) RNA molecule. Boolean 0 ($value)? " -circ" : "" ( "" , " -circ" )[ value ] parameter Energy parameter file (-P) EnergyParameterFile AbstractText (defined $value)? " -P $value" : "" ( "" , " -P " + str(value) )[ value is not None ]

Read energy parameters from paramfile instead of using the default parameter set. (See the documentation for details on the file format.)

output_options Output options verbose Print out energy contribution of each loop in the structure. Boolean 0 ($value)? " -v ": "" ( "" , " -v ")[ value ]
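The Python command expressions in these parameter definitions all use the same idiom: a two-element tuple indexed by a boolean, so that `False` selects the empty string and `True` selects the command-line flag. The sketch below illustrates that idiom in isolation. The function names `temperature_arg` and `verbose_arg` are hypothetical, and the sketch assumes the framework supplies `value` (the user's input) and `vdef` (the declared default).

```python
def temperature_arg(value, vdef=37.0):
    """Mimic the -T expression: emit the flag only for a non-default value."""
    # False indexes element 0 (the empty string), True indexes element 1 (the flag).
    # Both tuple elements are built eagerly, so str(value) runs even when unused.
    return ("", " -T " + str(value))[value is not None and value != vdef]


def verbose_arg(value):
    """Mimic the -v expression for a Boolean parameter."""
    # bool() guards against value being None, which cannot be used as an index.
    return ("", " -v ")[bool(value)]


if __name__ == "__main__":
    # Assemble a command line the way the individual fragments would be concatenated.
    print("RNAeval" + temperature_arg(25.0) + verbose_arg(True))
```

Because the flag is omitted whenever the value equals the default, the generated command line stays minimal and the program's own defaults apply.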
Programs-5.1.1/edialign.xml0000644000175000001560000003246312072525233014461 0ustar bneronsis edialign EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net edialign Local multiple alignment of sequences http://bioweb2.pasteur.fr/docs/EMBOSS/edialign.html http://emboss.sourceforge.net/docs/themes alignment:multiple edialign e_input Input section e_sequences sequences option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 2,n ("", " -sequences=" + str(value))[value is not None] 1 e_additional Additional section e_nucmode Nucleic acid sequence alignment mode Choice n n nt ma ("", " -nucmode=" + str(value))[value is not None and value!=vdef] 2 Nucleic acid sequence alignment mode (simple, translated or mixed) e_revcomp Also consider the reverse complement Boolean 0 ("", " -revcomp")[ bool(value) ] 3 e_overlapw Use overlap weights Choice 1 1 2 3 ("", " -overlapw=" + str(value))[value is not None and value!=vdef] 4 By default overlap weights are used when Nseq =<35 but you can set this to 'yes' or 'no' e_linkage Clustering method to construct sequence tree Choice UPGMA UPGMA max min ("", " -linkage=" + str(value))[value is not None and value!=vdef] 5 Clustering method to construct sequence tree (UPGMA, minimum linkage or maximum linkage) e_maxfragl Maximum fragment length (value greater than or equal to 0) Integer 40 ("", " -maxfragl=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 6 e_fragmat Consider only n-fragment pairs that start with two matches Boolean 0 ("", " -fragmat")[ bool(value) ] 7 e_fragsim Consider only p-fragment pairs if first amino acid or codon pair has similarity score of at least n (value greater than or equal to 0) Integer 4 
("", " -fragsim=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 8 e_itscore Use iterative score Boolean 0 ("", " -itscore")[ bool(value) ] 9 e_threshold Threshold for considering diagonal for alignment (value greater than or equal to 0.0) Float 0.0 ("", " -threshold=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 10 e_output Output section e_mask Replace unaligned characters by stars '*' rather then putting them in lowercase Boolean 0 ("", " -mask")[ bool(value) ] 11 e_dostars Activate writing of stars instead of numbers Boolean 0 ("", " -dostars")[ bool(value) ] 12 e_starnum Put up to n stars '*' instead of digits 0-9 to indicate level of conservation (value greater than or equal to 0) Integer 4 ("", " -starnum=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 13 e_outfile Name of the output file (e_outfile) Filename edialign.e_outfile ("" , " -outfile=" + str(value))[value is not None] 14 e_outfile_out outfile_out option EdialignReport Report e_outfile e_outseq Name of the output sequence file (e_outseq) Filename edialign.e_outseq ("" , " -outseq=" + str(value))[value is not None] 15 e_outseq_out outseq_out option Text e_outseq auto Turn off any prompting String " -auto -stdout" 16 Programs-5.1.1/showalign.xml0000644000175000001560000006775011672346320014712 0ustar bneronsis showalign EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net showalign Display a multiple sequence alignment in pretty format http://bioweb2.pasteur.fr/docs/EMBOSS/showalign.html http://emboss.sourceforge.net/docs/themes alignment:multiple:display showalign e_input Input section e_sequence sequence option Alignment FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH 1,n ("", " -sequence=" + str(value))[value is not None] 1 The sequence alignment to be displayed. e_matrix Similarity scoring matrix file Choice mobyle_null mobyle_null EBLOSUM30 EBLOSUM35 EBLOSUM40 EBLOSUM45 EBLOSUM50 EBLOSUM55 EBLOSUM60 EBLOSUM62 EBLOSUM62-12 EBLOSUM65 EBLOSUM70 EBLOSUM75 EBLOSUM80 EBLOSUM85 EBLOSUM90 EBLOSUMN EDNAFULL EDNAMAT EDNASIMPLE EPAM10 EPAM100 EPAM110 EPAM120 EPAM130 EPAM140 EPAM150 EPAM160 EPAM170 EPAM180 EPAM190 EPAM20 EPAM200 EPAM210 EPAM220 EPAM230 EPAM240 EPAM250 EPAM260 EPAM270 EPAM280 EPAM290 EPAM30 EPAM300 EPAM310 EPAM320 EPAM330 EPAM340 EPAM350 EPAM360 EPAM370 EPAM380 EPAM390 EPAM40 EPAM400 EPAM410 EPAM420 EPAM430 EPAM440 EPAM450 EPAM460 EPAM470 EPAM480 EPAM490 EPAM50 EPAM500 EPAM60 EPAM70 EPAM80 EPAM90 SSSUB ("", " -matrix=" + str(value))[value is not None and value!=vdef] 2 This is the scoring matrix file used when comparing sequences. By default it is the file 'EBLOSUM62' (for proteins) or the file 'EDNAFULL' (for nucleic sequences). These files are found in the 'data' directory of the EMBOSS installation. e_additional Additional section e_refseq The number or the name of the reference sequence String 0 ("", " -refseq=" + str(value))[value is not None and value!=vdef] 3 If you give the number in the alignment or the name of a sequence, it will be taken to be the reference sequence. The reference sequence is always shown in full and is the one against which all the other sequences are compared. 
If this is set to 0 then the consensus sequence will be used as the reference sequence. By default the consensus sequence is used as the reference sequence. e_bottom Display the reference sequence at the bottom Boolean 1 (" -nobottom", "")[ bool(value) ] 4 If this is true then the reference sequence is displayed at the bottom of the alignment instead of the top. e_show What to show Choice N A I N S D ("", " -show=" + str(value))[value is not None and value!=vdef] 5 e_order Output order of the sequences Choice I I A S ("", " -order=" + str(value))[value is not None and value!=vdef] 6 e_similarcase Show similar residues in lower-case Boolean 1 (" -nosimilarcase", "")[ bool(value) ] 7 If this is set True, then when -show is set to 'Similarities' or 'Non-identities' and a residue is similar but not identical to the reference sequence residue, it will be changed to lower-case. If -show is set to 'All' then non-identical, non-similar residues will be changed to lower-case. If this is False then no change to the case of the residues is made on the basis of their similarity to the reference sequence. e_consensus Display the consensus line Boolean 1 (" -noconsensus", "")[ bool(value) ] 8 If this is true then the consensus line is displayed. e_advanced Advanced section e_uppercase Regions to put in uppercase (eg: 4-57,78-94) String ("", " -uppercase=" + str(value))[value is not None] 9 Regions to put in uppercase. If this is left blank, then the sequence case is left alone. A set of regions is specified by a set of pairs of positions. The positions are integers. They are separated by any non-digit, non-alpha character. Examples of region specifications are: 24-45, 56-78 1:45, 67=99;765..888 1,5,8,10,23,45,57,99 e_number Number the sequences Boolean 1 (" -nonumber", "")[ bool(value) ] 10 If this option is true then a line giving the positions in the alignment is displayed every 10 characters above the alignment. 
e_ruler Display ruler Boolean 1 (" -noruler", "")[ bool(value) ] 11 If this option is true then a ruler line marking every 5th and 10th character in the alignment is displayed. e_width Width of sequence to display (value greater than or equal to 1) Integer 60 ("", " -width=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 12 e_margin Length of margin for sequence names (value greater than or equal to -1) Integer -1 ("", " -margin=" + str(value))[value is not None and value!=vdef] Value greater than or equal to -1 is required value >= -1 13 This sets the length of the left-hand margin for sequence names. If the margin is set at 0 then no margin and no names are displayed. If the margin is set to a value that is less than the length of a sequence name then the sequence name is displayed truncated to the length of the margin. If the margin is set to -1 then the minimum margin width that will allow all the sequence names to be displayed in full plus a space at the end of the name will automatically be selected. e_html Use html formatting Boolean 0 ("", " -html")[ bool(value) ] 14 e_highlight Regions to colour in html (eg: 4-57 red 78-94 green) String ("", " -highlight=" + str(value))[value is not None] 15 Regions to colour if formatting for HTML. If this is left blank, then the sequence is left alone. A set of regions is specified by a set of pairs of positions. The positions are integers. They are followed by any valid HTML font colour. Examples of region specifications are: 24-45 blue 56-78 orange 1-100 green 120-156 red A file of ranges to colour (one range per line) can be specified as '@filename'. 
e_plurality Plurality check % for consensus (value from 0.0 to 100.0) Float 50.0 ("", " -plurality=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 100.0 is required value <= 100.0 16 Set a cut-off for the % of positive scoring matches below which there is no consensus. The default plurality is taken as 50% of the total weight of all the sequences in the alignment. e_setcase Threshold above which the consensus is given in uppercase Float ("", " -setcase=" + str(value))[value is not None] 17 Sets the threshold for the scores of the positive matches above which the consensus is in upper-case and below which the consensus is in lower-case. By default this is set to be half of the (weight-adjusted) number of sequences in the alignment. e_identity Required % of identities at a position for consensus (value from 0.0 to 100.0) Float 0.0 ("", " -identity=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 100.0 is required value <= 100.0 18 Provides the facility of setting the required number of identities at a position for it to give a consensus. Therefore, if this is set to 100% only columns of identities contribute to the consensus. e_gaps Use gap characters in consensus Boolean 1 (" -nogaps", "")[ bool(value) ] 19 If this option is true then gap characters can appear in the consensus. The alternative is 'N' for nucleotide, or 'X' for protein e_output Output section e_outfile Name of the output file (e_outfile) Filename showalign.e_outfile ("" , " -outfile=" + str(value))[value is not None] 20 e_outfile_out outfile_out option ShowalignReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 21 Programs-5.1.1/featcopy.xml0000644000175000001560000001113412072525233014507 0ustar bneronsis featcopy EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. 
Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net featcopy Reads and writes a feature table http://bioweb2.pasteur.fr/docs/EMBOSS/featcopy.html http://emboss.sourceforge.net/docs/themes sequence:edit featcopy e_input Input section e_features features option Features AbstractText ("", " -features=" + str(value))[value is not None] 1 e_output Output section e_outfeat Name of the output feature file (e_outfeat) Filename featcopy.e_outfeat ("" , " -outfeat=" + str(value))[value is not None] 2 e_offormat_outfeat Choose the feature output format Choice GFF GFF EMBL SWISSPROT NBRF CODATA ("", " -offormat=" + str(value))[value is not None and value!=vdef] 3 e_outfeat_out outfeat_out option Feature AbstractText e_outfeat auto Turn off any prompting String " -auto -stdout" 4 Programs-5.1.1/lvb.xml0000644000175000001560000001547311526743220013473 0ustar bneronsis lvb 2.3 LVB Reconstructing Evolution With Parsimony And Simulated Annealing D. Barker Barker, D. 2004. LVB: Parsimony and simulated annealing in the search for phylogenetic trees. Bioinformatics, 20, 274-275. http://biology.st-andrews.ac.uk/cegg/lvb.htm http://biology.st-andrews.ac.uk/cegg/lvb.htm http://biology.st-andrews.ac.uk/cegg/lvb.htm phylogeny:parsimony lvb String "lvb <lvb.params" "lvb <lvb.params" 0 infile Alignment File Alignment PHYLIPI $infile ne "infile" infile != "infile" "ln -s $value infile; " "ln -s " + str(value) + " infile; " -1 format Format is INTERLEAVED or SEQUENTIAL Choice I I S "$value\\n" str(value) + "\n" 1 lvb.params gaps_treatment Treatment of gaps represented by '-' Choice U U F "$value\\n" str(value) + "\n" 2 A gap represented by the letter 'O' in the data matrix is always treated as a character state in its own right (fifth state). 
lvb can treat gaps represented by '-' in either of the following ways: Fifth state '-' is treated as equivalent to 'O'. Unknown '-' is treated as equivalent to '?', i.e., as an ambiguous site that may contain 'A' or 'C' or 'G' or 'T' or 'O'. 'Fifth state' may give excessive weight to multi-site gaps, since each affected base will be counted as one event. lvb.params seed Seed for the random number generator Integer (defined $value)? "$value\\n" : "\\n" ( "\n" , str(value) + "\n" )[ value is not None ] 3 When prompted for the random number seed, enter an integer in the range 0 to 900000000 inclusive. The default value is taken from the system clock and hence will vary from one analysis to the next, changing every second. The default is usually appropriate. lvb.params bootstrap How many bootstrap replicates required (bootstrap) Integer 0 "$value\\n" str(value) + "\n" This server allows no more than 1000 replicates $bootstrap <= 1000 bootstrap <= 1000 5 Number of bootstrap replicates required as an integer in the range 1 to 1000 inclusive, or 0 for no bootstrapping (this server allows no more than 1000 replicates) lvb.params res Tree file Tree NEWICK "outtree" "outtree" other_results Results files Text "stat*" "sum" "log" "data" "ini*" "stat*" "sum" "log" "data" "ini*" Programs-5.1.1/fastaRename.xml0000644000175000001560000000626311767601016015136 0ustar bneronsis fastaRename 1.0 fasta header shortener helps out with the 10-character limit of the PHYLIP-PHYML formats Bertrand Néron
Because the PHYLIP format is incompatible with the naming rules of phyml and morePhyml, long identifiers cause phyml and morePhyml to fail. We propose the following workaround:
  1. Use fastaRename to generate an alignment with short IDs and an ID mapping file.
  2. Perform your analysis on the alignment with short IDs.
  3. Replace the short IDs in your tree (in NEWICK format) using nw_rename and the ID mapping file generated in step 1.
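The renaming step of this workaround can be sketched in Python as follows. This is only an illustration of the idea, not the fastaRename implementation: the function name `shorten_fasta_ids`, the `S00001`-style short IDs, and the in-memory mapping layout are all assumptions made for the example.

```python
def shorten_fasta_ids(lines, prefix="S"):
    """Replace each FASTA header with a short ID (hypothetical scheme).

    Returns (renamed_lines, mapping), where mapping is a list of
    (short_id, original_header) pairs in order of appearance.
    """
    renamed, mapping = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Sequential IDs such as S00001 stay within the 10-character limit.
            short_id = f"{prefix}{len(mapping) + 1:05d}"
            mapping.append((short_id, line[1:].strip()))
            renamed.append(">" + short_id)
        else:
            # Sequence lines are copied unchanged.
            renamed.append(line)
    return renamed, mapping
```

The mapping pairs could then be written one per line to a text file and used to restore the original names in the tree after the analysis; the exact file layout expected by nw_rename should be checked against its own documentation.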
sequence:formatter fastaRename input Input sequence file >VERY_VERY_LONG_ID1 ---AAAAGAAAATAGTNNTTCTGGTTGATCCTGCCAGAGGCCATTGCTATCAGGGTNTGACTAAGCCATGCGAGTCGAGAGGTGT-------AAGACCTCGGC ATACTGCTCAGTAACAC >>VERY_VERY_LONG_ID2 --------------------------AACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGCGAAAGGGG--GCTTCGGCGGGGGGAGTAGAGTGGC GCACGGGTGAGTAACGC >>VERY_VERY_LONG_ID3 TTATGGAGAGTTTGATCCTGGCTCAGAGTGAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAACGATGAAGCTTTTAGCTTGCTAGAAGTGGATTAGTGGC GCACGGGTGAGTAATGC Alignment FASTA " $value" " " + str(value) 1 output_fasta Output file with renamed headers type any kind of text example text Alignment FASTA "*.rename.fasta" "*.rename.fasta" output_map Output file header map ID_Mapping AbstractText "*.map" "*.map"
Programs-5.1.1/prettyseq.xml0000644000175000001560000002315112072525233014737 0ustar bneronsis prettyseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net prettyseq Write a nucleotide sequence and its translation to file http://bioweb2.pasteur.fr/docs/EMBOSS/prettyseq.html http://emboss.sourceforge.net/docs/themes display:nucleic:translation prettyseq e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_range Range(s) to translate String ("", " -range=" + str(value))[value is not None] 2 Whole sequence e_additional Additional section e_table Genetic codes Choice 0 0 1 2 3 4 5 6 9 10 11 12 13 14 15 16 21 22 23 ("", " -table=" + str(value))[value is not None and value!=vdef] 3 e_ruler Add a ruler Boolean 1 (" -noruler", "")[ bool(value) ] 4 e_plabel Number translations Boolean 1 (" -noplabel", "")[ bool(value) ] 5 e_nlabel Number dna sequence Boolean 1 (" -nonlabel", "")[ bool(value) ] 6 e_advanced Advanced section e_width Width of screen (value greater than or equal to 10) Integer 60 ("", " -width=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 10 is required value >= 10 7 e_output Output section e_outfile Name of the output file (e_outfile) Filename prettyseq.e_outfile ("" , " -outfile=" + str(value))[value is not None] 8 e_outfile_out outfile_out option PrettyseqReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 9 Programs-5.1.1/showseq.xml0000644000175000001560000012541312072525233014374 0ustar bneronsis showseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. 
and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net showseq Displays sequences with features in pretty format http://bioweb2.pasteur.fr/docs/EMBOSS/showseq.html http://emboss.sourceforge.net/docs/themes display:nucleic:restriction display:nucleic:translation showseq e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_mfile Restriction enzyme methylation data file (optional) RestrictionEnzymeMethylationData AbstractText ("", " -mfile=" + str(value))[value is not None ] 2 e_required Required section e_format Things to display Choice 2 0 1 2 3 4 5 6 7 8 ("", " -format=" + str(value))[value is not None and value!=vdef] 3 e_things Specify your own things to display (value from 1 to 100) String e_format=="0" B,N,T,S,A,F ("", " -things=" + str(value))[value is not None and value!=vdef] 4 Specify a list of one or more code characters in the order in which you wish things to be displayed one above the other down the page. S: Sequence B: Blank line 1: Frame1 translation 2: Frame2 translation 3: Frame3 translation -1: Compframe1 translation -2: Compframe2 translation -3: Compframe3 translation T: Ticks line N: Number ticks line C: Complement sequence F: Features R: Restriction enzyme cut sites in forward sense -R: Restriction enzyme cut sites in reverse sense A: Annotation For example if you wish to see things displayed in the order: sequence, complement sequence, ticks line, frame 1 translation, blank line; then you should enter 'S,C,T,1,B'. e_additional Additional section e_translate Regions to translate (eg: 4-57,78-94) String ("", " -translate=" + str(value))[value is not None] 5 Regions to translate (if translating). If this is left blank the complete sequence is translated. 
A set of regions is specified by a set of pairs of positions. The positions are integers. They are separated by any non-digit, non-alpha character. Examples of region specifications are: 24-45, 56-78 1:45, 67=99;765..888 e_revtranslate Regions to translate in reverse direction (eg: 78-94,4-57) String ("", " -revtranslate=" + str(value))[value is not None] 6 Regions to translate (if translating). If this is left blank the complete sequence is translated. A set of regions is specified by a set of pairs of positions. The positions are integers. They are separated by any non-digit, non-alpha character. Examples of region specifications are: 78-56, 45-24, 888..765, 99=67; 45:1 e_uppercase Regions to put in uppercase (eg: 4-57,78-94) String ("", " -uppercase=" + str(value))[value is not None] 7 Regions to put in uppercase. If this is left blank, then the sequence case is left alone. A set of regions is specified by a set of pairs of positions. The positions are integers. They are separated by any non-digit, non-alpha character. Examples of region specifications are: 24-45, 56-78 1:45, 67=99;765..888 1,5,8,10,23,45,57,99 e_highlight Regions to colour in html (eg: 4-57 red 78-94 green) String ("", " -highlight=" + str(value))[value is not None] 8 Regions to colour if formatting for HTML. If this is left blank, then the sequence is left alone. A set of regions is specified by a set of pairs of positions. The positions are integers. They are followed by any valid HTML font colour. Examples of region specifications are: 24-45 blue 56-78 orange 1-100 green 120-156 red A file of ranges to colour (one range per line) can be specified as '@filename'. e_annotation Regions to mark (eg: 4-57 promoter region 78-94 first exon) String ("", " -annotation=" + str(value))[value is not None] 9 Regions to annotate by marking. If this is left blank, then no annotation is added. A set of regions is specified by a set of pairs of positions followed by optional text. The positions are integers. 
They are followed by any text (but not digits when on the command-line). Examples of region specifications are: 24-45 new domain 56-78 match to Mouse 1-100 First part 120-156 oligo A file of ranges to annotate (one range per line) can be specified as '@filename'. e_enzymes Comma separated restriction enzyme list String all ("", " -enzymes=" + str(value))[value is not None and value!=vdef] 10 The name 'all' reads in all enzyme names from the REBASE database. You can specify enzymes by giving their names with commas between them, such as: 'HincII,hinfI,ppiI,hindiii'. The case of the names is not important. You can specify a file of enzyme names to read in by giving the name of the file holding the enzyme names with a '@' character in front of it, for example, '@enz.list'. Blank lines and lines starting with a hash character or '!' are ignored and all other lines are concatenated together with a comma character ',' and then treated as the list of enzymes to search for. An example of a file of enzyme names is: ! my enzymes HincII, ppiII ! other enzymes hindiii HinfI PpiI e_table Genetic codes Choice 0 0 1 2 3 4 5 6 9 10 11 12 13 14 15 16 21 22 23 ("", " -table=" + str(value))[value is not None and value!=vdef] 11 e_featuresection Feature display options e_sourcematch Source of feature to display String ("", " -sourcematch=" + str(value))[value is not None] 12 By default any feature source in the feature table is shown. You can set this to match any feature source you wish to show. The source name is usually either the name of the program that detected the feature or it is the feature table (eg: EMBL) that the feature came from. The source may be wildcarded by using '*'. If you wish to show more than one source, separate their names with the character '|', eg: gene* | embl e_typematch Type of feature to display String ("", " -typematch=" + str(value))[value is not None] 13 By default any feature type in the feature table is shown. 
You can set this to match any feature type you wish to show. See http://www.ebi.ac.uk/embl/WebFeat/ for a list of the EMBL feature types and see Appendix A of the Swissprot user manual in http://www.expasy.org/sprot/userman.html for a list of the Swissprot feature types. The type may be wildcarded by using '*'. If you wish to show more than one type, separate their names with the character '|', eg: *UTR | intron e_sensematch Sense of feature to display (value from -1 to 1) Integer 0 ("", " -sensematch=" + str(value))[value is not None and value!=vdef] Value greater than or equal to -1 is required value >= -1 Value less than or equal to 1 is required value <= 1 14 By default any feature type in the feature table is shown. You can set this to match any feature sense you wish to show. 0 - any sense, 1 - forward sense, -1 - reverse sense e_minscore Minimum score of feature to display Float 0.0 ("", " -minscore=" + str(value))[value is not None and value!=vdef] 15 Minimum score of feature to display (see also maxscore) e_maxscore Maximum score of feature to display Float 0.0 ("", " -maxscore=" + str(value))[value is not None and value!=vdef] 16 Maximum score of feature to display. If both minscore and maxscore are zero (the default), then any score is ignored e_tagmatch Tag of feature to display String ("", " -tagmatch=" + str(value))[value is not None] 17 Tags are the types of extra values that a feature may have. For example in the EMBL feature table, a 'CDS' type of feature may have the tags '/codon', '/codon_start', '/db_xref', '/EC_number', '/evidence', '/exception', '/function', '/gene', '/label', '/map', '/note', '/number', '/partial', '/product', '/protein_id', '/pseudo', '/standard_name', '/translation', '/transl_except', '/transl_table', or '/usedin'. Some of these tags also have values, for example '/gene' can have the value of the gene name. By default any feature tag in the feature table is shown. You can set this to match any feature tag you wish to show. 
The tag may be wildcarded by using '*'. If you wish to show more than one tag, separate their names with the character '|', eg: gene | label e_valuematch Value of feature tags to display String ("", " -valuematch=" + str(value))[value is not None] 18 Tag values are the values associated with a feature tag. Tags are the types of extra values that a feature may have. For example in the EMBL feature table, a 'CDS' type of feature may have the tags '/codon', '/codon_start', '/db_xref', '/EC_number', '/evidence', '/exception', '/function', '/gene', '/label', '/map', '/note', '/number', '/partial', '/product', '/protein_id', '/pseudo', '/standard_name', '/translation', '/transl_except', '/transl_table', or '/usedin'. Only some of these tags can have values, for example '/gene' can have the value of the gene name. By default any feature tag value in the feature table is shown. You can set this to match any feature tag value you wish to show. The tag value may be wildcarded by using '*'. If you wish to show more than one tag value, separate their names with the character '|', eg: pax* | 10 e_stricttags Only display the matching tags Boolean 0 ("", " -stricttags")[ bool(value) ] 19 By default if any tag/value pair in a feature matches the specified tag and value, then all the tags/value pairs of that feature will be displayed. If this is set to be true, then only those tag/value pairs in a feature that match the specified tag and value will be displayed. e_advanced Advanced section e_remapsection Restriction map options e_flatreformat Display re sites in flat format Boolean 0 ("", " -flatreformat")[ bool(value) ] 20 This changes the output format to one where the recognition site is indicated by a row of '===' characters and the cut site is pointed to by a '>' character in the forward sense, or a '<' in the reverse sense strand. 
e_mincuts Minimum cuts per re (value from 1 to 1000) Integer 1 ("", " -mincuts=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 Value less than or equal to 1000 is required value <= 1000 21 This sets the minimum number of cuts for any restriction enzyme that will be considered. Any enzymes that cut fewer times than this will be ignored. e_maxcuts Maximum cuts per re Integer 2000000000 ("", " -maxcuts=" + str(value))[value is not None and value!=vdef] 22 This sets the maximum number of cuts for any restriction enzyme that will be considered. Any enzymes that cut more times than this will be ignored. e_sitelen Minimum recognition site length (value from 2 to 20) Integer 4 ("", " -sitelen=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 2 is required value >= 2 Value less than or equal to 20 is required value <= 20 23 This sets the minimum length of the restriction enzyme recognition site. Any enzymes with sites shorter than this will be ignored. e_single Force single re site only cuts Boolean 0 ("", " -single")[ bool(value) ] 24 If this is set then this forces the values of the mincuts and maxcuts qualifiers to both be 1. Any other value you may have set them to will be ignored. e_blunt Allow blunt end re cutters Boolean 1 (" -noblunt", "")[ bool(value) ] 25 This allows those enzymes which cut at the same position on the forward and reverse strands to be considered. e_sticky Allow sticky end re cutters Boolean 1 (" -nosticky", "")[ bool(value) ] 26 This allows those enzymes which cut at different positions on the forward and reverse strands, leaving an overhang, to be considered. 
e_ambiguity Allow ambiguous re matches Boolean 1 (" -noambiguity", "")[ bool(value) ] 27 This allows those enzymes which have one or more 'N' ambiguity codes in their pattern to be considered e_plasmid Allow circular dna Boolean 0 ("", " -plasmid")[ bool(value) ] 28 If this is set then this allows searches for restriction enzyme recognition site and cut positions that span the end of the sequence to be considered. e_methylation Use methylation data Boolean 0 ("", " -methylation")[ bool(value) ] 29 If this is set then RE recognition sites will not match methylated bases. e_commercial Only use restriction enzymes with suppliers Boolean 1 (" -nocommercial", "")[ bool(value) ] 30 If this is set, then only those enzymes with a commercial supplier will be searched for. This qualifier is ignored if you have specified an explicit list of enzymes to search for, rather than searching through 'all' the enzymes in the REBASE database. It is assumed that, if you are asking for an explicit enzyme, then you probably know where to get it from and so all enzymes names that you have asked to be searched for, and which cut, will be reported whether or not they have a commercial supplier. e_limit Limits re hits to one isoschizomer Boolean 1 (" -nolimit", "")[ bool(value) ] 31 This limits the reporting of enzymes to just one enzyme from each group of isoschizomers. The enzyme chosen to represent an isoschizomer group is the prototype indicated in the data file 'embossre.equ', which is created by the program 'rebaseextract'. If you prefer different prototypes to be used, make a copy of embossre.equ in your home directory and edit it. If this value is set to be false then all of the input enzymes will be reported. You might like to set this to false if you are supplying an explicit set of enzymes rather than searching 'all' of them. 
e_orfminsize Minimum size of orfs (value greater than or equal to 0) Integer 0 ("", " -orfminsize=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 32 This sets the minimum size of Open Reading Frames (ORFs) to display in the translations. All other translation regions are masked by changing the amino acids to '-' characters. e_threeletter Display protein sequences in three-letter code Boolean 0 ("", " -threeletter")[ bool(value) ] 33 e_number Number the sequences Boolean 0 ("", " -number")[ bool(value) ] 34 e_width Width of sequence to display (value greater than or equal to 1) Integer 60 ("", " -width=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 35 e_length Line length of page (0 for indefinite) (value greater than or equal to 0) Integer 0 ("", " -length=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 36 e_margin Margin around sequence for numbering (value greater than or equal to 0) Integer 10 ("", " -margin=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 37 e_name Show sequence id Boolean 1 (" -noname", "")[ bool(value) ] 38 Set this to be false if you do not wish to display the ID name of the sequence e_description Show description Boolean 1 (" -nodescription", "")[ bool(value) ] 39 Set this to be false if you do not wish to display the description of the sequence e_offset Offset to start numbering the sequence from Integer 1 ("", " -offset=" + str(value))[value is not None and value!=vdef] 40 e_html Use html formatting Boolean 0 ("", " -html")[ bool(value) ] 41 e_output Output section e_outfile Name of the output file (e_outfile) Filename showseq.e_outfile ("" , " -outfile=" + str(value))[value is not None] 42 e_outfile_out outfile_out option ShowseqReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 43 
Programs-5.1.1/sixpack.xml0000644000175000001560000004772212072525233014353 0ustar bneronsis sixpack EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net sixpack Display a DNA sequence with 6-frame translation and ORFs http://bioweb2.pasteur.fr/docs/EMBOSS/sixpack.html http://emboss.sourceforge.net/docs/themes display:nucleic:gene_finding display:nucleic:translation sixpack e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_additional Additional section e_table Genetic codes Choice 0 0 1 2 3 4 5 6 9 10 11 12 13 14 15 16 21 22 23 ("", " -table=" + str(value))[value is not None and value!=vdef] 2 Genetics code used for the translation e_firstorf Orf at the beginning of the sequence Boolean 1 (" -nofirstorf", "")[ bool(value) ] 3 Count the beginning of a sequence as a possible ORF, even if it's inferior to the minimal ORF size. e_lastorf Orf at the end of the sequence Boolean 1 (" -nolastorf", "")[ bool(value) ] 4 Count the end of a sequence as a possible ORF, even if it's not finishing with a STOP, or inferior to the minimal ORF size. e_mstart Orf start with an m Boolean 0 ("", " -mstart")[ bool(value) ] 5 Displays only ORFs starting with an M. 
e_output Output section e_outfile Name of the output file (e_outfile) Filename sixpack.e_outfile ("" , " -outfile=" + str(value))[value is not None] 6 e_outfile_out outfile_out option SixpackReport Report e_outfile e_outseq Name of the output sequence file (e_outseq) Protein Filename sixpack.e_outseq ("" , " -outseq=" + str(value))[value is not None] 7 ORF sequence output e_osformat_outseq Choose the sequence output format Protein Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 8 e_outseq_out outseq_out option Protein Sequence e_outseq e_reverse Display translation of reverse sense Boolean 1 (" -noreverse", "")[ bool(value) ] 9 Display also the translation of the DNA sequence in the 3 reverse frames e_orfminsize Minimum size of orfs (value greater than or equal to 1) Integer 1 ("", " -orfminsize=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 10 Minimum size of Open Reading Frames (ORFs) to display in the translations. e_uppercase Regions to put in uppercase (eg: 4-57,78-94) String ("", " -uppercase=" + str(value))[value is not None] 11 Regions to put in uppercase. If this is left blank, then the sequence case is left alone. A set of regions is specified by a set of pairs of positions. The positions are integers. They are separated by any non-digit, non-alpha character. Examples of region specifications are: 24-45, 56-78 1:45, 67=99;765..888 1,5,8,10,23,45,57,99 e_highlight Regions to colour in html (eg: 4-57 red 78-94 green) String ("", " -highlight=" + str(value))[value is not None] 12 Regions to colour if formatting for HTML. If this is left blank, then the sequence is left alone. A set of regions is specified by a set of pairs of positions. The positions are integers. They are followed by any valid HTML font colour. 
Examples of region specifications are: 24-45 blue 56-78 orange 1-100 green 120-156 red A file of ranges to colour (one range per line) can be specified as '@filename'. e_number Number the sequences Boolean 1 (" -nonumber", "")[ bool(value) ] 13 Number the sequence at the beginning and the end of each line. e_width Width of sequence to display (value greater than or equal to 1) Integer 60 ("", " -width=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 14 Number of nucleotides displayed on each line e_length Line length of page (0 for indefinite) (value greater than or equal to 0) Integer 0 ("", " -length=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 15 e_margin Margin around sequence for numbering. (value greater than or equal to 0) Integer 10 ("", " -margin=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 16 e_name Display sequence id Boolean 1 (" -noname", "")[ bool(value) ] 17 Set this to be false if you do not wish to display the ID name of the sequence. e_description Display description Boolean 1 (" -nodescription", "")[ bool(value) ] 18 Set this to be false if you do not wish to display the description of the sequence. e_offset Offset to start numbering the sequence from Integer 1 ("", " -offset=" + str(value))[value is not None and value!=vdef] 19 Number from which you want the DNA sequence to be numbered. 
e_html Use html formatting Boolean 0 ("", " -html")[ bool(value) ] 20 auto Turn off any prompting String " -auto -stdout" 21 Programs-5.1.1/hmmstat.xml0000644000175000001560000000222511767572177014376 0ustar bneronsis hmmstat HMMSTAT Display summary statistics for a profile file hmm:statistic hmmstat Name Name of the HMM profile HmmProfile AbstractText " $value" " "+str(value) 2 Programs-5.1.1/matcher.xml0000644000175000001560000005007012072525233014322 0ustar bneronsis matcher EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net matcher Waterman-Eggert local alignment of two sequences http://bioweb2.pasteur.fr/docs/EMBOSS/matcher.html http://emboss.sourceforge.net/docs/themes alignment:pairwise:local matcher e_input Input section e_asequence asequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -asequence=" + str(value))[value is not None] 1 e_bsequence bsequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -bsequence=" + str(value))[value is not None] 2 e_datafile Matrix file Choice mobyle_null mobyle_null EBLOSUM30 EBLOSUM35 EBLOSUM40 EBLOSUM45 EBLOSUM50 EBLOSUM55 EBLOSUM60 EBLOSUM62 EBLOSUM62-12 EBLOSUM65 EBLOSUM70 EBLOSUM75 EBLOSUM80 EBLOSUM85 EBLOSUM90 EBLOSUMN EDNAFULL EDNAMAT EDNASIMPLE EPAM10 EPAM100 EPAM110 EPAM120 EPAM130 EPAM140 EPAM150 EPAM160 EPAM170 EPAM180 EPAM190 EPAM20 EPAM200 EPAM210 EPAM220 EPAM230 EPAM240 EPAM250 EPAM260 EPAM270 EPAM280 EPAM290 EPAM30 EPAM300 EPAM310 EPAM320 EPAM330 EPAM340 EPAM350 EPAM360 EPAM370 EPAM380 EPAM390 EPAM40 EPAM400 EPAM410 EPAM420 EPAM430 EPAM440 EPAM450 EPAM460 EPAM470 EPAM480 EPAM490 EPAM50 EPAM500 EPAM60 EPAM70 EPAM80 EPAM90 SSSUB ("", " -datafile=" + str(value))[value is not None and 
value!=vdef] 3 This is the scoring matrix file used when comparing sequences. By default it is the file 'EBLOSUM62' (for proteins) or the file 'EDNAFULL' (for nucleic sequences). These files are found in the 'data' directory of the EMBOSS installation. e_additional Additional section e_alternatives Number of alternative matches (value greater than or equal to 1) Integer 1 ("", " -alternatives=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 4 This sets the number of alternative matches output. By default only the highest scoring alignment is shown. A value of 2 gives you other reasonable alignments. In some cases, for example multidomain proteins of cDNA and genomic DNA comparisons, there may be other interesting and significant alignments. e_gapopen Gap penalty (Positive integer) Integer ("", " -gapopen=" + str(value))[value is not None] Value greater than or equal to 0 is required value >= 0 5 The gap penalty is the score taken away when a gap is created. The best value depends on the choice of comparison matrix. The default value of 14 assumes you are using the EBLOSUM62 matrix for protein sequences, or a value of 16 and the EDNAFULL matrix for nucleotide sequences. e_gapextend Gap length penalty (Positive integer) Integer ("", " -gapextend=" + str(value))[value is not None] Value greater than or equal to 0 is required value >= 0 6 The gap length, or gap extension, penalty is added to the standard gap penalty for each base or residue in the gap. This is how long gaps are penalized. Usually you will expect a few long gaps rather than many short gaps, so the gap extension penalty should be lower than the gap penalty. An exception is where one or both sequences are single reads with possible sequencing errors in which case you would expect many single base gaps. You can get this result by setting the gap penalty to zero (or very low) and using the gap extension penalty to control gap scoring. 
e_output Output section e_outfile Name of the output alignment file Filename matcher.align ("" , " -outfile=" + str(value))[value is not None] 7 e_aformat_outfile Choose the alignment output format Choice MARKX0 FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH ("", " -aformat=" + str(value))[value is not None and value!=vdef] 8 e_outfile_out outfile_out option Alignment e_aformat_outfile in ['FASTA', 'MSF'] e_outfile e_outfile_out2 outfile_out2 option Text e_aformat_outfile in ['PAIR', 'MARKX0', 'MARKX1', 'MARKX2', 'MARKX3', 'MARKX10', 'SRS', 'SRSPAIR', 'SCORE', 'UNKNOWN', 'MULTIPLE', 'SIMPLE', 'MATCH'] e_outfile auto Turn off any prompting String " -auto -stdout" 9 Programs-5.1.1/patmatdb.xml0000644000175000001560000001746312072525233014504 0ustar bneronsis patmatdb EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net patmatdb Searches protein sequences with a sequence motif http://bioweb2.pasteur.fr/docs/EMBOSS/patmatdb.html http://emboss.sourceforge.net/docs/themes sequence:protein:motifs patmatdb e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_motif Protein motif to search for String ("", " -motif=" + str(value))[value is not None] 2 Patterns for patmatdb are based on the format of pattern used in the PROSITE database. For example: '[DE](2)HS{P}X(2)PX(2,4)C' means two Asps or Glus in any order followed by His, Ser, any residue other than Pro, then two of any residue followed by Pro followed by two to four of any residue followed by Cys. The search is case-independent, so 'AAA' matches 'aaa'. 
e_output Output section e_outfile Name of the report file Filename patmatdb.report ("" , " -outfile=" + str(value))[value is not None] 3 e_rformat_outfile Choose the report output format Choice DBMOTIF DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 4 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 5 Programs-5.1.1/featreport.xml0000644000175000001560000001632012072525233015052 0ustar bneronsis featreport EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net featreport Reads and writes a feature table http://bioweb2.pasteur.fr/docs/EMBOSS/featreport.html http://emboss.sourceforge.net/docs/themes sequence:edit featreport e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_features features option Features AbstractText ("", " -features=" + str(value))[value is not None] 2 e_output Output section e_outfile Name of the report file Filename featreport.report ("" , " -outfile=" + str(value))[value is not None] 3 e_rformat_outfile Choose the report output format Choice GFF DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 4 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 5 Programs-5.1.1/hmmalign.xml0000644000175000001560000002661011767572177014521 0ustar bneronsis hmmalign HMMALIGN Align sequences to a profile HMM hmm:alignment:multiple alignment:multiple:hmm hmmcmd String "hmmalign" "hmmalign" 0 infile Input file file Input sequences file 3 Input sequence file may be in any unaligned or aligned file format. If it is in a multiple alignment format (e.g. Stockholm, MSF, ClustalW), the existing alignment is ignored (i.e., the sequences are read as if they were unaligned). 
seqfile Sequences File Sequence FASTA EMBL Genbank Uniprot 2,n " $seqfile" " " + str(seqfile) hmmfile Profile HMM file HmmProfile AbstractText " $value" " "+str(value) 2 expert_options Expert Options 1 biotype Assert sequences and hmm files are both of the same type Choice null null --amino --dna --rna (defined $value and $value ne $vdef)? " $value" : "" ("", " " + str(value))[ value is not None and value != vdef] The alphabet type (amino, DNA, or RNA) is autodetected by default, by looking at the composition of the msafile. Autodetection is normally quite reliable, but occasionally alphabet type may be ambiguous and autodetection can fail (for instance, on tiny toy alignments of just a few residues). To avoid this, or to increase robustness in automated analysis pipelines, you may specify the alphabet type of msafile with these options. Protein: Specify that all sequences in seqfile are proteins. By default, alphabet type is autodetected from looking at the residue composition. DNA: Specify that all sequences in seqfile are DNAs. RNA: Specify that all sequences in seqfile are RNAs. common_options Common Options 1 allcol Include all consensus columns in ali, even if all gaps Boolean 0 ($value) ? " --allcol" : "" ( "" , " --allcol " )[ value ] Include columns in the output alignment for every match (consensus) state in the hmmfile, even if it means having all-gap columns. This is useful in analysis pipelines that need to be able to maintain a predetermined profile HMM architecture (with an unchanging number of consensus columns) through an hmmalign step. mapali Include alignment file Alignment STOCKHOLM (defined $value) ? " --mapali $value" : "" ( "" , " --mapali " + str(value) )[ value is not None ] Merge the existing alignment in file into the result, where the file is exactly the same alignment that was used to build the model in hmmfile. This is done using a 'map' of alignment columns to consensus profile positions that is stored in the hmmfile.
The multiple alignment in the file will be exactly reproduced in its consensus columns (as defined by the profile), but the displayed alignment in insert columns may be altered, because insertions relative to a profile are considered by convention to be unaligned data. output_options Output Options 1 trim Trim terminal tails of nonaligned residues from alignment Boolean 0 ($value) ? " --trim" : "" ( "", " --trim" )[ value ] Trim nonhomologous residues (assigned to N and C states in the optimal alignments) from the resulting multiple alignment output. outfile_name Outfile name (-o) Filename (defined $value)? " -o $value" : "" ("", " -o " + str(value))[value is not None] outputfile Alignment file Alignment defined($outfile_name) outfile_name is not None "$outfile_name" str(outfile_name) outfile Alignment file Alignment not defined($outfile_name) outfile_name is None "hmmalign.out" "hmmalign.out" outputformat Output format Choice STOCKHOLM STOCKHOLM Pfam A2M PSIBLAST (defined $value and $value ne $vdef) ? " --outformat $value" : "" ( "" , " --outformat " + str( value) )[ value is not None and value != vdef ] Specify that the msafile is in the selected format. Currently the accepted multiple alignment sequence file formats only include Stockholm and SELEX. Default is to autodetect the format of the file. Programs-5.1.1/twofeat.xml0000644000175000001560000007570112072525233014360 0ustar bneronsis twofeat EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net twofeat Finds neighbouring pairs of features in sequence(s) http://bioweb2.pasteur.fr/docs/EMBOSS/twofeat.html http://emboss.sourceforge.net/docs/themes sequence:edit:feature_table twofeat e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_afeaturesection First feature options e_asource Source of first feature String ("", " -asource=" + str(value))[value is not None] 2 By default any feature source in the feature table is allowed. You can set this to match any feature source you wish to allow. The source name is usually either the name of the program that detected the feature or it is the feature table (eg: EMBL) that the feature came from. The source may be wildcarded by using '*'. If you wish to allow more than one source, separate their names with the character '|', eg: gene* | embl e_atype Type of first feature String ("", " -atype=" + str(value))[value is not None] 3 By default every feature in the feature table is allowed. You can set this to be any feature type you wish to allow. See http://www.ebi.ac.uk/embl/WebFeat/ for a list of the EMBL feature types and see Appendix A of the Swissprot user manual in http://www.expasy.org/sprot/userman.html for a list of the Swissprot feature types. The type may be wildcarded by using '*'. If you wish to allow more than one type, separate their names with the character '|', eg: *UTR | intron e_asense Sense of first feature Choice 0 0 + - ("", " -asense=" + str(value))[value is not None and value!=vdef] 4 By default any feature sense is allowed. You can set this to match the required sense. e_aminscore Minimum score of first feature Float 0.0 ("", " -aminscore=" + str(value))[value is not None and value!=vdef] 5 If this is greater than or equal to the maximum score, then any score is allowed. 
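The Python expressions in these descriptions pick one of two strings by indexing a two-element tuple with a boolean. A minimal sketch of that idiom follows; the helper name `option` is hypothetical and is not part of the original descriptions.

```python
def option(flag, value, vdef=None):
    """Return " -flag=value", or "" when value is unset or equals the default.

    Mirrors the pattern
        ("", " -flag=" + str(value))[value is not None and value != vdef]
    `option` is a hypothetical helper used only for illustration.
    """
    # Both tuple elements are built eagerly; the boolean then selects one:
    # False -> index 0 (empty string), True -> index 1 (the option text).
    return ("", " -" + flag + "=" + str(value))[value is not None and value != vdef]
```

With this sketch, `option("aminscore", 0.0, 0.0)` yields an empty string because the value equals its default, so nothing is added to the command line.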
e_amaxscore Maximum score of first feature Float 0.0 ("", " -amaxscore=" + str(value))[value is not None and value!=vdef] 6 If this is less than or equal to the minimum score, then any score is permitted. e_atag Tag of first feature String ("", " -atag=" + str(value))[value is not None] 7 Tags are the types of extra values that a feature may have. For example in the EMBL feature table, a 'CDS' type of feature may have the tags '/codon', '/codon_start', '/db_xref', '/EC_number', '/evidence', '/exception', '/function', '/gene', '/label', '/map', '/note', '/number', '/partial', '/product', '/protein_id', '/pseudo', '/standard_name', '/translation', '/transl_except', '/transl_table', or '/usedin'. Some of these tags also have values, for example '/gene' can have the value of the gene name. By default any feature tag in the feature table is allowed. You can set this to match any feature tag you wish to allow. The tag may be wildcarded by using '*'. If you wish to allow more than one tag, separate their names with the character '|', eg: gene | label e_avalue Value of first feature's tags String ("", " -avalue=" + str(value))[value is not None] 8 Tag values are the values associated with a feature tag. Tags are the types of extra values that a feature may have. For example in the EMBL feature table, a 'CDS' type of feature may have the tags '/codon', '/codon_start', '/db_xref', '/EC_number', '/evidence', '/exception', '/function', '/gene', '/label', '/map', '/note', '/number', '/partial', '/product', '/protein_id', '/pseudo', '/standard_name', '/translation', '/transl_except', '/transl_table', or '/usedin'. Only some of these tags can have values, for example '/gene' can have the value of the gene name. By default any feature tag value in the feature table is allowed. You can set this to match any feature tag value you wish to allow. The tag value may be wildcarded by using '*'. 
If you wish to allow more than one tag value, separate their names with the character '|', eg: pax* | 10 e_bfeaturesection Second feature options e_bsource Source of second feature String ("", " -bsource=" + str(value))[value is not None] 9 By default any feature source in the feature table is allowed. You can set this to match any feature source you wish to allow. The source name is usually either the name of the program that detected the feature or it is the feature table (eg: EMBL) that the feature came from. The source may be wildcarded by using '*'. If you wish to allow more than one source, separate their names with the character '|', eg: gene* | embl e_btype Type of second feature String ("", " -btype=" + str(value))[value is not None] 10 By default every feature in the feature table is allowed. You can set this to be any feature type you wish to allow. See http://www.ebi.ac.uk/embl/WebFeat/ for a list of the EMBL feature types and see Appendix A of the Swissprot user manual in http://www.expasy.org/sprot/userman.html for a list of the Swissprot feature types. The type may be wildcarded by using '*'. If you wish to allow more than one type, separate their names with the character '|', eg: *UTR | intron e_bsense Sense of second feature Choice 0 0 + - ("", " -bsense=" + str(value))[value is not None and value!=vdef] 11 By default any feature sense is allowed. You can set this to match the required sense. e_bminscore Minimum score of second feature Float 0.0 ("", " -bminscore=" + str(value))[value is not None and value!=vdef] 12 If this is greater than or equal to the maximum score, then any score is allowed. e_bmaxscore Maximum score of second feature Float 0.0 ("", " -bmaxscore=" + str(value))[value is not None and value!=vdef] 13 If this is less than or equal to the minimum score, then any score is permitted. e_btag Tag of second feature String ("", " -btag=" + str(value))[value is not None] 14 Tags are the types of extra values that a feature may have.
For example in the EMBL feature table, a 'CDS' type of feature may have the tags '/codon', '/codon_start', '/db_xref', '/EC_number', '/evidence', '/exception', '/function', '/gene', '/label', '/map', '/note', '/number', '/partial', '/product', '/protein_id', '/pseudo', '/standard_name', '/translation', '/transl_except', '/transl_table', or '/usedin'. Some of these tags also have values, for example '/gene' can have the value of the gene name. By default any feature tag in the feature table is allowed. You can set this to match any feature tag you wish to allow. The tag may be wildcarded by using '*'. If you wish to allow more than one tag, separate their names with the character '|', eg: gene | label e_bvalue Value of second feature's tags String ("", " -bvalue=" + str(value))[value is not None] 15 Tag values are the values associated with a feature tag. Tags are the types of extra values that a feature may have. For example in the EMBL feature table, a 'CDS' type of feature may have the tags '/codon', '/codon_start', '/db_xref', '/EC_number', '/evidence', '/exception', '/function', '/gene', '/label', '/map', '/note', '/number', '/partial', '/product', '/protein_id', '/pseudo', '/standard_name', '/translation', '/transl_except', '/transl_table', or '/usedin'. Only some of these tags can have values, for example '/gene' can have the value of the gene name. By default any feature tag value in the feature table is allowed. You can set this to match any feature tag value you wish to allow. The tag value may be wildcarded by using '*'. If you wish to allow more than one tag value, separate their names with the character '|', eg: pax* | 10 e_featurerelationsection Feature relation options e_overlap Type of overlap required Choice A A O NO NW AW BW ("", " -overlap=" + str(value))[value is not None and value!=vdef] 16 This allows you to specify the allowed overlaps of the features A and B. 
You can allow any or no overlaps, specify that they must or must not overlap, or that one must or must not be wholly enclosed within another feature. e_minrange The minimum distance between the features Integer 0 ("", " -minrange=" + str(value))[value is not None and value!=vdef] 17 If this is greater than or equal to 'maxrange', then no min or max range is specified. e_maxrange The maximum distance between the features Integer 0 ("", " -maxrange=" + str(value))[value is not None and value!=vdef] 18 If this is less than or equal to 'minrange', then no min or max range is specified. e_rangetype Positions from which to measure the distance Choice N N L R F ("", " -rangetype=" + str(value))[value is not None and value!=vdef] 19 This allows you to specify the positions from which the allowed minimum or maximum distance between the features is measured. e_sense Sense of the features Choice A A S O ("", " -sense=" + str(value))[value is not None and value!=vdef] 20 This allows you to specify the required sense that the two features must be on. This is ignored (always 'Any') when looking at protein sequence features. e_order Order of the features Choice A A AB BA ("", " -order=" + str(value))[value is not None and value!=vdef] 21 This allows you to specify the required order of the two features. The order is measured from the start positions of the features. This criterion is always applied, regardless of the overlap type specified. e_output Output section e_twoout Do you want the two features written out individually Boolean 0 ("", " -twoout")[ bool(value) ] 22 If you set this to be true, then the two features themselves will be written out. If it is left as false, then a single feature will be written out covering the two features you found.
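The minrange and maxrange descriptions above imply that a distance window only applies when minrange is strictly less than maxrange. The sketch below is one reading of that help text, not the twofeat implementation. The function names are hypothetical, and the inclusive bounds are an assumption.

```python
def range_is_active(minrange, maxrange):
    """Return True when a distance window is in effect.

    Reading of the help text: if minrange >= maxrange (which includes the
    defaults 0 and 0), no minimum or maximum distance is applied.
    """
    return minrange < maxrange


def distance_allowed(distance, minrange=0, maxrange=0):
    """Check a feature-to-feature distance against the window.

    Inclusive bounds are an assumption made for this illustration.
    """
    if not range_is_active(minrange, maxrange):
        return True  # no window specified, so any distance passes
    return minrange <= distance <= maxrange
```

Under this reading, leaving both options at their default of 0 places no constraint on the distance between the two features.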
e_typeout Name of the output new feature String not e_twoout misc_feature ("", " -typeout=" + str(value))[value is not None and value!=vdef] 23 If you have specified that the pairs of features that are found should be reported as one feature in the output, then you can specify the 'type' name of the new feature here. By default every feature in the feature table is allowed. See http://www.ebi.ac.uk/embl/WebFeat/ for a list of the EMBL feature types and see Appendix A of the Swissprot user manual in http://www.expasy.org/sprot/userman.html for a list of the Swissprot feature types. If you specify an invalid feature type name, then the default name 'misc_feature' is used. e_outfile Name of the report file Filename twofeat.report ("" , " -outfile=" + str(value))[value is not None] 24 e_rformat_outfile Choose the report output format Choice TABLE DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 25 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 26 Programs-5.1.1/genscan.xml0000644000175000001560000002305211441651470014317 0ustar bneronsis genscan 1.0 GENSCAN Gene Identification C. Burge Burge, C., Karlin, S. (1997) Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78-94. Burge, C., Karlin, S. (1997) Gene structure, exon prediction, and alternative splicing. (in preparation). Burge, C. (1997) Identification of genes in human genomic DNA. PhD thesis, Stanford University, Stanford, CA. Burset, M., Guigo, R. (1996) Evaluation of gene structure prediction programs. Genomics 34, 353-367.
http://genes.mit.edu/GENSCANinfo.html sequence:nucleic:gene_finding genscan seq DNA Sequence File DNA Sequence FASTA " $value" " " + str(value) 2 organism Organism Choice null null Arabidopsis.smat HumanIso.smat Maize.smat " $value" " "+str(value) 1 Currently available parameter files are: HumanIso.smat for human/vertebrate sequences (also Drosophila); Arabidopsis.smat for Arabidopsis thaliana sequences; Maize.smat for Zea mays sequences. output Output parameters 3 verbose Verbose output (-v) Boolean 0 ($value) ? " -v" : "" ("" , " -v")[ value ] 3 Add some extra explanatory information to the text output. This information may be helpful the first few times you run the program but will soon become tiresome (that's why it's optional). cds Print predicted coding sequences (-cds) Boolean 0 ($value) ? " -cds" : "" ("" , " -cds")[ value ] 3 subopt Identify suboptimal exons (-subopt)? Boolean 0 ($value and defined $cutoff ) ? " -subopt $cutoff" : "" ("" , " -subopt " + str( cutoff ))[ value and cutoff is not None] 3 The default output of the program is the optimal 'parse' of the sequence, i.e. the highest probability gene structure(s) which is present: the exons in this optimal parse are referred to as 'optimal exons' and are always printed out by GENSCAN. Suboptimal exons, on the other hand, are defined as potential exons which have probability above a certain threshold but which are not contained in the optimal parse of the sequence. Suboptimal exons have a variety of potential uses. First, suboptimal exons sometimes correspond to real exons which were missed for whatever reason by the optimal parse of the sequence. Second, regions of a prediction which contain multiple overlapping and/or incompatible optimal and suboptimal exons may in some cases indicate alternatively spliced regions of a gene (Burge & Karlin, in preparation).
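The -subopt expression above depends on two parameters at once: the Boolean switch and a separate cutoff value. A small sketch of that two-parameter condition follows. The function name `subopt_option` is hypothetical, and the explicit `bool()` is an added safeguard that is not in the original expression.

```python
def subopt_option(subopt, cutoff):
    """Return " -subopt <cutoff>" only when the switch is on and a cutoff is set.

    Mirrors ("", " -subopt " + str(cutoff))[value and cutoff is not None].
    bool() is added here so that a None switch cannot end up as the tuple
    index, which would raise a TypeError.
    """
    return ("", " -subopt " + str(cutoff))[bool(subopt) and cutoff is not None]
```

If the switch is enabled but no cutoff is given, the sketch emits nothing, which matches the original expression for Boolean inputs.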
cutoff Cutoff value for suboptimal exons Float $subopt subopt Choose a cutoff value to identify suboptimal exons ($subopt and defined $value) or ( not $subopt) (subopt and value is not None) or ( not subopt) 3 The cutoff is the probability cutoff used to determine which potential exons qualify as suboptimal exons. This argument should be a number between 0.01 and 0.99. For most applications, a cutoff value of about 0.10 is recommended. Setting the value much lower than 0.10 will often lead to an explosion in the number of suboptimal exons, most of which will probably not be useful. On the other hand, if the value is set much higher than 0.10, then potentially interesting suboptimal exons may be missed. ps Create PostScript output (-ps) Boolean 0 ($value) ? (defined $scale) ? " -ps $psfname $scale" : " -ps $psfname" : "" ( "" , (" -ps " + str( psfname ), " -ps " + str( psfname ) + " " + str( scale ))[scale is not None] )[ value ] 3 scale Scale for PostScript output Integer $ps ps psfname Filename for PostScript output Filename $ps ps genscan.ps psFile PostScript output PostScript Binary $ps ps $psfname str( psfname ) Programs-5.1.1/xxr.xml0000644000175000001560000001772011767572177013550 0ustar bneronsis xxr 3.02 xxr Integrons Analysis and Cassette Identification P. Bouige Rowe-Magnus D.A., Guerout A.M., Biskri L., Bouige P., Mazel D. Comparative analysis of superintegrons: Engineering extensive genetic diversity in the Vibrionaceae. Genome Res. 2003;13:428-442. This software is able to extract putative cassette structures that fulfill the criteria established from analysis of previously known cassettes from integrons and superintegrons. sequence:nucleic:prediction sequence:nucleic:gene_finding xxr String "xxr <xxr.params" "xxr <xxr.params" input Input sequence DNA Sequence FASTA "$value\\n.\\n" str( value )+"\n.\n" xxr.params input_opt Options outsuffix Extension to add to files String (defined $value) ? 
"$value\\n" : "\\n" ( "\n" , str( value ) + "\n" )[value is not None] xxr.params minsize Minimal core size Integer 4 ($value != $vdef) ? "$value\\n" : "\\n" ( "\n" , str( value ) + "\n" )[value != vdef] xxr.params maxsize Maximal core size Integer 10 ($value != $vdef) ? "$value\\n" : "\\n" ( "\n" , str( value ) + "\n" )[value != vdef] xxr.params maxxxr Maximal XXR size Integer 200 ($value != $vdef) ? "$value\\n" : "\\n" ( "\n" , str( value ) + "\n" )[value != vdef] xxr.params maxgene Maximal gene size Integer 2800 ($value != $vdef) ? "$value\\n" : "\\n" ( "\n" , str( value ) + "\n" )[value != vdef] xxr.params cs Core Site (CS) - Variable site part upstream GTT String GC ($value ne $vdef) ? "$value\\n" : "\\n" ( "\n" , str( value ) + "\n" )[value != vdef] By default, Core Site is GCGTT. xxr.params ICS Inverted Core Site (ICS) - Variable site part downstream AAC String AAA ($value ne $vdef) ? "$value\\n" : "\\n" ( "\n" , str( value ) + "\n" )[value != vdef] By default, Inverted Core Site is AACAAA. xxr.params res XXR results report Text "Resultat_*" "Resultat_*" sk7 Cassette gene files Text "1_*_XXR_*" "2_*_XXR_*" "3_*_XXR_*" "1_*_XXR_*" "2_*_XXR_*" "3_*_XXR_*" If they exist, only the first 3 cassette gene files are displayed, but you will find all of them in the job archive. xxrfasta XXR fasta Sequence FASTA "XXR.fasta_*" "XXR.fasta_*" hk7 Cassette gene files Filename "*_XXR_*" "*_XXR_*" Programs-5.1.1/cirdna.xml0000644000175000001560000005340311672346320014145 0ustar bneronsis cirdna EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A.
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net cirdna Draws circular maps of DNA constructs http://bioweb2.pasteur.fr/docs/EMBOSS/cirdna.html http://emboss.sourceforge.net/docs/themes display cirdna e_input Input section e_infile Commands to the cirdna drawing program file cirdnaMappingCommands AbstractText ("", " -infile=" + str(value))[value is not None ] 1 e_additional Additional section e_maxgroups Maximum number of groups (value greater than or equal to 1) Integer 20 ("", " -maxgroups=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 2 e_maxlabels Maximum number of labels (value greater than or equal to 1) Integer 10000 ("", " -maxlabels=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 3 e_output Output section e_ruler Draw a ruler Boolean 1 (" -noruler", "")[ bool(value) ] 4 e_blocktype Type of blocks Choice Filled Open Filled Outline ("", " -blocktype=" + str(value))[value is not None and value!=vdef] 5 e_originangle Position in degrees of the molecule's origin on the circle (value from 0 to 360) Float 90 ("", " -originangle=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 Value less than or equal to 360 is required value <= 360 6 e_posticks Ticks inside or outside the circle Choice 2 1 2 ("", " -posticks=" + str(value))[value is not None and value!=vdef] 7 e_posblocks Text inside or outside the blocks Choice 1 1 2 ("", " -posblocks=" + str(value))[value is not None and value!=vdef] 8 e_intersymbol Horizontal junctions between blocks Boolean 1 (" -nointersymbol", "")[ bool(value) ] 9 e_intercolour Colour of junctions between blocks (enter a colour number) (value from 0 to 15) Integer 1 ("", " -intercolour=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 Value less than or 
equal to 15 is required value <= 15 10 e_interticks Horizontal junctions between ticks Boolean 0 ("", " -interticks")[ bool(value) ] 11 e_gapsize Interval between ticks in the ruler (value greater than or equal to 0) Integer 500 ("", " -gapsize=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 12 e_ticklines Vertical lines at the ruler's ticks Boolean 0 ("", " -ticklines")[ bool(value) ] 13 e_textheight Text scale factor (value greater than or equal to 0.0) Float 1.0 ("", " -textheight=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 14 Height of text. Enter a number <1.0 or >1.0 to decrease or increase the size, respectively e_textlength Length of text multiplier (value greater than or equal to 0.0) Float 1.0 ("", " -textlength=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 15 Length of text. Enter a number <1.0 or >1.0 to decrease or increase the size, respectively e_tickheight Height of ticks multiplier (value greater than or equal to 0.0) Float 1.0 ("", " -tickheight=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 16 Height of ticks. Enter a number <1.0 or >1.0 to decrease or increase the size, respectively e_blockheight Height of blocks multiplier (value greater than or equal to 0.0) Float 1.0 ("", " -blockheight=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 17 Height of blocks. Enter a number <1.0 or >1.0 to decrease or increase the size, respectively e_rangeheight Height of range ends multiplier (value greater than or equal to 0.0) Float 1.0 ("", " -rangeheight=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 18 Height of range ends. 
Enter a number <1.0 or >1.0 to decrease or increase the size, respectively e_gapgroup Space between groups multiplier (value greater than or equal to 0.0) Float 1.0 ("", " -gapgroup=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 19 Space between groups. Enter a number <1.0 or >1.0 to decrease or increase the size, respectively e_postext Space between text and ticks, blocks, and ranges multiplier (value greater than or equal to 0.0) Float 1.0 ("", " -postext=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 20 Space between text and ticks, blocks, and ranges. Enter a number <1.0 or >1.0 to decrease or increase the size, respectively e_graphout Choose the e_graphout output format Choice png png gif cps ps meta data (" -graphout=" + str(vdef), " -graphout=" + str(value))[value is not None and value!=vdef] 21 e_goutfile Name of the output graph Filename cirdna_graph ("" , " -goutfile=" + str(value))[value is not None] 22 outgraph_png Graph file Picture Binary e_graphout == "png" "*.png" outgraph_gif Graph file Picture Binary e_graphout == "gif" "*.gif" outgraph_ps Graph file PostScript Binary e_graphout == "ps" or e_graphout == "cps" "*.ps" outgraph_meta Graph file Picture Binary e_graphout == "meta" "*.meta" outgraph_data Graph file Text e_graphout == "data" "*.dat" auto Turn off any prompting String " -auto -stdout" 23 Programs-5.1.1/abiview.xml0000644000175000001560000003171012072525233014325 0ustar bneronsis abiview EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net abiview Display the trace in an ABI sequencer file http://bioweb2.pasteur.fr/docs/EMBOSS/abiview.html http://emboss.sourceforge.net/docs/themes display abiview e_input Input section e_infile Abi sequencing trace file Binary ("", " -infile=" + str(value))[value is not None] 1 e_output Output section e_outseq Name of the output sequence file (e_outseq) DNA Filename abiview.e_outseq ("" , " -outseq=" + str(value))[value is not None] 2 e_osformat_outseq Choose the sequence output format DNA Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 3 e_outseq_out outseq_out option DNA Sequence e_outseq e_graph Choose the e_graph output format Choice png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 4 xy_goutfile Name of the output graph Filename abiview_xygraph ("" , " -goutfile=" + str(value))[value is not None] 5 xy_outgraph_png Graph file Picture Binary e_graph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_graph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_graph == "ps" or e_graph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_graph == "meta" "*.meta" xy_outgraph_data Graph file Text e_graph == "data" "*.dat" e_startbase First base to report or display (value greater than or equal to 0) Integer 0 ("", " -startbase=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 6 e_endbase Last base to report or display Integer 0 ("", " -endbase=" + str(value))[value is not None and value!=vdef] 7 Last sequence base to report or display. If the default is set to zero then the value of this is taken as the maximum number of bases. 
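The e_endbase description above treats 0 as a sentinel meaning "up to the last base". A minimal sketch of that convention follows; the function name and the `n_bases` parameter are hypothetical, and this is not abiview's own code.

```python
def effective_endbase(endbase, n_bases):
    """Resolve the last base to display.

    Reading of the help text: an endbase of 0 (the default) stands for the
    total number of bases in the trace; any other value is used as given.
    """
    return n_bases if endbase == 0 else endbase
```

For a trace of 700 bases, the default of 0 would resolve to 700 under this reading, while an explicit 250 would be kept as 250.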
e_separate Separate the trace graphs for the 4 bases Boolean 0 ("", " -separate")[ bool(value) ] 8 e_yticks Display y-axis ticks Boolean 0 ("", " -yticks")[ bool(value) ] 9 e_sequence Display the sequence on the graph Boolean 1 (" -nosequence", "")[ bool(value) ] 10 e_window Sequence display window size Integer 40 ("", " -window=" + str(value))[value is not None and value!=vdef] 11 e_bases Base graphs to be displayed String GATC ("", " -bases=" + str(value))[value is not None and value!=vdef] 12 auto Turn off any prompting String " -auto -stdout" 12 Programs-5.1.1/wise2.xml0000644000175000001560000010402111767572177013747 0ustar bneronsis wise2 2.2.0 WISE2 Comparisons of protein/DNA sequences E. Birney http://www.ebi.ac.uk/Tools/Wise2/doc_wise2.html ftp://ftp.ebi.ac.uk/pub/software/unix/wise2/ alignment:pairwise wise2 Wise program Choice Null Null genewise estwise "$value" str(value) 1 protein_file Protein file protein Protein sequence File Protein Sequence FASTA not $hmmer hmmer is None " $value" " " +str(value) You must give a protein sequence file in FASTA format not ($hmmer) not (hmmer) 2 You must give a protein sequence file in FASTA format. hmmer or Protein HMM File Protein HmmTextProfile AbstractText not defined $protein protein is None " $value" " " + str(value) You must give an HMMER file not ($protein) not (protein) 2 You must give an HMMER file. hmmer_command HMM command (-hmmer) String defined $hmmer hmmer is not None " -hmmer" " -hmmer" 4 dna DNA sequence File DNA Sequence FASTA " $value" " " + str(value) 3 quiet Silent mode (-silent -quiet) String " -silent -quiet" " -silent -quiet" 100 dna_options DNA sequence Options 5 dna_start Start position in dna (-u) Integer (defined $value) ? " -u $value" : "" ( "" , " -u " + str(value) )[ value is not None ] dna_end End position in dna (-v) Integer (defined $value) ? 
" -v $value" : "" ( "" , " -v " + str(value) )[ value is not None ] strand Strand comparison Choice -tfor -tfor -trev -both (defined $value and $value ne $vdef) ? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef] tabs Report positions as absolute to truncated/reverse sequence (-tabs) Boolean 0 ($value) ? " -tabs" : "" ( "" , " -tabs" )[ value ] protein_options Protein comparison Options not $hmmer hmmer is None 6 protein_start Start position in protein (-s) Integer (defined $value) ? " -s $value" : "" ( "" , " -s " + str(value) )[ value is not None ] protein_end End position in protein (-t) Integer (defined $value) ? " -t $value" : "" ( "" , " -t " + str(value) )[ value is not None ] gap Gap penalty (-g) Integer 12 (defined $value and $value != $vdef) ? " -g $value" : "" ( "" , " -g " + str(value) )[ value is not None and value != vdef] ext Gap extension penalty (-e) Integer 2 (defined $value and $value != $vdef) ? " -e $value" : "" ( "" , " -e " + str(value) )[ value is not None and value != vdef] matrix Comparison matrix (-m) Choice BLOSUM62.bla BLOSUM30.bla BLOSUM45.bla BLOSUM62.bla BLOSUM80.bla gon120.bla gon160.bla gon200.bla gon250.bla gon350.bla idenity.bla (defined $value and $value ne $vdef) ? " -m $value" : "" ( "" , " -m " + str(value) )[ value is not None and value != vdef] gene_model_options Model Options 7 init Type of match (-init) Choice default default global local wing endbias (defined $value and $value ne $vdef) ? " -init $value" : "" ( "" , " -init " + str(value) )[ value is not None and value != vdef] subs Substitution error rate (-subs) Float 1e-5 (defined $value and $value != $vdef) ? " -subs $value" : "" ( "" , " -subs " + str(value) )[ value is not None and value != vdef] indel Insertion/deletion error rate (-indel) Float 1e-5 (defined $value and $value != $vdef) ?
" -indel $value" : "" ( "" , " -indel " + str(value) )[ value is not None and value != vdef] null Random Model as synchronous or flat (-null) Choice syn syn flat (defined $value and $value ne $vdef) ? " -null $value" : "" ( "" , " -null " + str(value) )[ value is not None and value != vdef] alln Probability of matching a NNN codon (-alln) Float 1.0 (defined $value and $value != $vdef) ? " -alln $value" : "" ( "" , " -alln " + str(value) )[ value is not None and value != vdef] wise2_model_opt Genewise special option $wise2 eq "genewise" wise2 == "genewise" gene Gene parameter file (-gene) Choice human.gf human.gf pb.gf pombe.gf worm.gf (defined $value and $value ne $vdef) ? " -gene $value" : "" ( "" , " -gene " + str(value) )[ value is not None and value != vdef] cfreq Using codon bias or not (-cfreq)? Choice flat model flat (defined $value and $value ne $vdef) ? " -cfreq $value" : "" ( "" , " -cfreq " + str(value) )[ value is not None and value != vdef] splice Using splice model or GT/AG (-splice)? Choice model model flat (defined $value and $value ne $vdef) ? " -splice $value" : "" ( "" , " -splice " + str(value) )[ value is not None and value != vdef] intron Use tied model for introns (-intron) Choice tied model tied (defined $value and $value ne $vdef) ? " -intron $value" : "" ( "" , " -intron " + str(value) )[ value is not None and value != vdef] insert Protein insert model (-insert) Choice flat model flat (defined $value and $value ne $vdef) ? " -insert $value" : "" ( "" , " -insert " + str(value) )[ value is not None and value != vdef] output_options Output Options 9 pretty Show pretty ascii output (-pretty) Boolean 1 ($value) ? " -pretty" : "" ( "" , " -pretty" )[ value ] para Show parameters (-para) Boolean 1 ($value) ? " -para" : "" ( "" , " -para" )[ value ] sum Show summary output (-sum) Boolean 0 ($value) ? " -sum" : "" ( "" , " -sum" )[ value ] pep Show protein translation, splicing frameshifts (-pep) Boolean 0 ($value) ? 
" -pep" : "" ( "" , " -pep" )[ value ] alb Show logical AlnBlock alignment (-alb) Boolean 0 ($value) ? " -alb" : "" ( "" , " -alb" )[ value ] pal Show raw matrix alignment (-pal) Boolean 0 ($value) ? " -pal" : "" ( "" , " -pal" )[ value ] block Length of main block in pretty output (-block) Integer 50 (defined $value and $value != $vdef) ? " -block $value" : "" ( "" , " -block " + str(value) )[ value is not None and value != vdef] divide Divide string for multiple outputs (-divide) String (defined $value) ? " -divide \"$value\"" : "" ( "" , " -divide " + str(value) )[ value is not None ] wise2_out_opt Genewise special option $wise2 eq "genewise" wise2 == "genewise" pseudo Mark genes with frameshifts as pseudogenes (-pseudo) Boolean 0 ($value) ? " -pseudo" : "" ( "" , " -pseudo" )[ value ] genes Show gene structure (-genes) Boolean 0 ($value) ? " -genes" : "" ( "" , " -genes" )[ value ] genesf Show gene structure with supporting evidence (-genesf) Boolean 0 ($value) ? " -genesf" : "" ( "" , " -genesf" )[ value ] embl Show EMBL feature format with CDS key (-embl) Boolean 0 ($value) ? " -embl" : "" ( "" , " -embl" )[ value ] diana Show EMBL feature format with misc_feature key for diana (-diana) Boolean 0 ($value) ? " -diana" : "" ( "" , " -diana" )[ value ] cdna Show cDNA (-cdna) Boolean 0 ($value) ? " -cdna" : "" ( "" , " -cdna" )[ value ] trans Show protein translation, breaking at frameshifts (-trans) Boolean 0 ($value) ? " -trans" : "" ( "" , " -trans" )[ value ] ace Ace file gene structure (-ace) Boolean 0 ($value) ? " -ace" : "" ( "" , " -ace" )[ value ] gff Gene Feature Format file (-gff) Boolean 0 ($value) ? " -gff" : "" ( "" , " -gff" )[ value ] gener Raw gene structure (-gener) Boolean 0 ($value) ? 
" -gener" : "" ( "" , " -gener" )[ value ] New_gene_options New gene model statistics for genewise $wise2 eq "genewise" wise2 == "genewise" 10 splice_max_collar Maximum Bits value for a splice site (-splice_max_collar) Float 5.0 (defined $value and $value != $vdef) ? " -splice_max_collar $value" : "" ( "" , " -splice_max_collar " + str(value) )[ value is not None and value != vdef] splice_min_collar Minimum Bits value for a splice site (-splice_min_collar) Float -5.0 (defined $value and $value != $vdef) ? " -splice_min_collar $value" : "" ( "" , " -splice_min_collar " + str(value) )[ value is not None and value != vdef] splice_score_offset Score offset for splice sites (-splice_score_offset) Float 4.5 (defined $value and $value != $vdef) ? " -splice_score_offset $value" : "" ( "" , " -splice_score_offset " + str(value) )[ value is not None and value != vdef] standard_options Standard Options 11 erroroffstd No warning messages (-erroroffstd) Boolean 0 ($value) ? " -erroroffstd" : "" ( "" , " -erroroffstd" )[ value ] Programs-5.1.1/jaspscan.xml0000644000175000001560000002746412072525233014514 0ustar bneronsis jaspscan EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net jaspscan Scans DNA sequences for transcription factors http://bioweb2.pasteur.fr/docs/EMBOSS/jaspscan.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:transcription jaspscan e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_menu Jaspar matrix set Choice C C F P N O S B L H ("", " -menu=" + str(value))[value is not None and value!=vdef] 2 e_matrices Comma separated matrix list String all ("", " -matrices=" + str(value))[value is not None and value!=vdef] 3 The name 'all' reads in all matrix files from the selected JASPAR matrix set. You can specify individual matrices by giving their names with commas between them, such as: 'ma0001.1,ma0015*'. The case of the names is not important. You can specify a file of matrix names to read in by giving the name of the file holding the matrix names with a '@' character in front of it, for example, '@matrix.list'. Blank lines and lines starting with a hash character or '!' are ignored and all other lines are concatenated together with a comma character ',' and then treated as the list of matrices to search for. An example of a file of matrix names is: ! my matrices ma0001.1, ma0002.1 ! other matrices ma0010.1 ma0032* ma0053.1 e_required Required section e_threshold Threshold percentage Float 80.0 ("", " -threshold=" + str(value))[value is not None and value!=vdef] 4 If the matrix score is greater than or equal to this percentage then a hit will be reported e_additional Additional section e_exclude Comma separated matrix list for exclusion String ("", " -exclude=" + str(value))[value is not None] 5 The names of any matrices to exclude from the 'matrices' list. Matrices are specified in the same way as for the selection list.
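The rules given above for a file of matrix names (the '@matrix.list' form) can be sketched in Python. This is only an illustration of the stated behaviour, not EMBOSS code: `read_matrix_list` is a hypothetical helper, and the line breaks in the example file are an assumption, since the original layout of the example is not preserved here.

```python
def read_matrix_list(text):
    """Collapse the contents of a matrix-name file into one comma separated list."""
    kept = []
    for line in text.splitlines():
        line = line.strip()
        # Blank lines and lines starting with '#' or '!' are ignored.
        if not line or line.startswith(("#", "!")):
            continue
        kept.append(line)
    # All remaining lines are concatenated with a comma between them.
    return ",".join(kept)


# Assumed layout of the example file quoted in the help text.
example = """! my matrices
ma0001.1, ma0002.1
! other matrices
ma0010.1
ma0032*
ma0053.1
"""

matrix_list = read_matrix_list(example)
```

The same parsing would apply to a file passed to the exclusion option, since the help text says exclusions are specified the same way.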
e_both Scan both strands Boolean 0 ("", " -both")[ bool(value) ] 6 If set then both the forward and reverse strands are searched e_output Output section e_outfile Name of the report file Filename jaspscan.report ("" , " -outfile=" + str(value))[value is not None] 7 e_rformat_outfile Choose the report output format Choice SEQTABLE DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 8 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 9 Programs-5.1.1/BMGE.xml0000644000175000001560000010065311767572177013437 0ustar bneronsis BMGE Version 1.0 BMGE Block Mapping and Gathering using Entropy Alexis Criscuolo and Simonetta Gribaldo Criscuolo A, Gribaldo S (2010) BMGE (Block Mapping and Gathering with Entropy): selection of phylogenetic informative regions from multiple sequence alignments. BMC Evolutionary Biology 10:210. ftp://ftp.pasteur.fr/pub/gensoft/projects/BMGE/ http://bioweb2.pasteur.fr/docs/BMGE/BMGE_doc.pdf alignment:multiple:information BMGE input Input 2 infile Alignment (-i) Protein DNA Alignment FASTA PHYLIPS " -i $value" " -i "+str(value) 1 BMGE uses FASTA or PHYLIP sequential format for input. These are plain text files. There is no limit on the length of the alignment. There is also no limit on the length of the label of a sequence (i.e. its FASTA annotation line), although a too long label (e.g. more than 100 letters) will be truncated if the output format is PHYLIP sequential. input_type Type of sequence (-t) Choice null null AA DNA CODON (defined $value and $value ne $vdef) ? 
" -t $value" : "" ("", " -t "+str(value))[value is not None and value!=vdef] 2 Both standard single-letter amino acid and nucleotide alphabets are used by BMGE. When using amino acid sequences, degenerated character states B and Z are understood by BMGE; similarly, degenerated nucleotide characters are also understood. The character state X is understood to be any of the 4 or 20 character states when using as input nucleotide or amino acid sequences, respectively. Dashes (i.e. '-') are understood as gaps, whereas dots (i.e. '.'), as any other single letter that are not inside standard alphabets, are considered as unknown character state (i.e. '?'). Nucleotide sequences can be set as codon ones. In this case, each successive nucleotide character triplet is considered as one codon character. options Control options matrixaa Similarity Matrices for amino acid and codon sequences (-m) Choice $input_type eq 'AA' or $input_type eq 'CODON' input_type in ['AA', 'CODON'] BLOSUM62 BLOSUM62 ID BLOSUM30 BLOSUM35 BLOSUM40 BLOSUM45 BLOSUM50 BLOSUM55 BLOSUM60 BLOSUM65 BLOSUM70 BLOSUM75 BLOSUM80 BLOSUM85 BLOSUM90 BLOSUM95 (defined $value and $value ne $vdef) ? " -m $value" : "" ("", " -m "+str(value))[value is not None and value!=vdef] 3 For each character, BMGE computes a score mainly determined by the entropy induced by the respective proportion of each residue. To estimate realistic scores that take into account biologically relevant substitution processes, BMGE weights the entropy estimation with substitution matrices. These option can be used with the 15 estimated BLOSUM matrices. BMGE uses by default the popular BLOSUM62 matrix. The character trimming is progressively more stringent as the BLOSUM index increases (e.g. BLOSUM95); reciprocally, the trimming is progressively more relaxed as the BLOSUM index is lower (e.g. BLOSUM30). In practice, it is recommended to use BLOSUM95 with closely related sequences, and BLOSUM30 with distantly related sequences. 
If input sequences are set as codons, BMGE performs a conversion into amino acid sequences (following the universal genetic code) and uses BLOSUM matrices to estimate the entropy-like score for each codon character. So, with option -t set as CODON, one can modify the option -m only with BLOSUM matrices. It is also possible to use the identity matrix with any sequence type. matrixan Similarity Matrices for nucleotide sequences (-m) Choice $input_type eq 'DNA' input_type in ['DNA'] DNAPAM100 DNAPAM100 ID DNAPAM1 DNAPAM5 DNAPAM10 DNAPAM20 DNAPAM30 DNAPAM40 DNAPAM50 DNAPAM60 DNAPAM70 DNAPAM80 DNAPAM90 DNAPAM110 DNAPAM120 DNAPAM130 DNAPAM140 DNAPAM150 DNAPAM160 DNAPAM170 DNAPAM180 DNAPAM190 DNAPAM200 DNAPAM210 DNAPAM220 DNAPAM230 DNAPAM240 DNAPAM250 DNAPAM260 DNAPAM270 DNAPAM280 DNAPAM290 DNAPAM300 DNAPAM310 DNAPAM320 DNAPAM330 DNAPAM340 DNAPAM350 DNAPAM360 DNAPAM370 DNAPAM380 DNAPAM390 DNAPAM400 DNAPAM410 DNAPAM420 DNAPAM430 DNAPAM440 DNAPAM450 DNAPAM460 DNAPAM470 DNAPAM480 DNAPAM490 DNAPAM500 (defined $value and $value ne $vdef) ? " -m $value" : "" ("", " -m "+str(value))[value is not None and value!=vdef] 3 For nucleotide input sequences, BMGE uses PAM matrices with a fixed transition/transversion ratio. BMGE can be used with all possible PAM matrices, from the most stringent (i.e. DNAPAM1) to highly relaxed ones (e.g. DNAPAM500). By default with nucleotide sequences, BMGE uses the PAM-100 matrix. It is also possible to use the identity matrix. transition Transition/transversion ratio for nucleotide sequences. Float $input_type eq 'DNA' and ($matrixan ne 'DNAPAM100' and $matrixan ne 'ID' ) input_type in ['DNA'] and (matrixan != 'DNAPAM100' and matrixan != 'ID' ) 2.0 (defined $value) ? ":$value " : "" ("", ":" +str(value) + " ")[value is not None] 4 It is possible to indicate a transition/transversion ratio to better define the PAM matrices with nucleotide sequences. By default, BMGE uses a transition/transversion ratio of 2.
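The nucleotide matrix and the transition/transversion ratio are emitted as two adjacent fragments (positions 3 and 4), so that the ratio ends up glued to the matrix name after a colon. The sketch below reuses the two Python expressions and the precondition from the definitions above. The helper names, the input file name `aln.fasta`, and the assumption that the fragments are simply concatenated in position order are illustrative only.

```python
def matrix_fragment(matrixan, vdef="DNAPAM100"):
    # Same expression as the matrixan definition: emitted only for a non-default matrix.
    return ("", " -m " + str(matrixan))[matrixan is not None and matrixan != vdef]


def transition_fragment(input_type, matrixan, ratio):
    # Precondition of the transition parameter, copied from the definition.
    applies = input_type in ['DNA'] and (matrixan != 'DNAPAM100' and matrixan != 'ID')
    if not applies:
        return ""
    # Same expression as the transition definition: ":<ratio> " appended after the matrix.
    return ("", ":" + str(ratio) + " ")[ratio is not None]


# Hypothetical command line for a DNA alignment with a non-default PAM matrix.
cmd = ("BMGE -i aln.fasta -t DNA"
       + matrix_fragment("DNAPAM50")
       + transition_fragment("DNA", "DNAPAM50", 2.0))
```

With the default DNAPAM100 matrix both fragments come out empty, which matches the precondition that hides the ratio for DNAPAM100 and ID.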
gap_rate_cutoff Gap Rate Cut-off (-g) Float 0.2 (defined $value and $value != $vdef) ? " -g $value" : "" ("", " -g "+str(value))[value is not None and value!=vdef] The value must be between 0 and 1 $value >= 0 and $value <= 1 value >= 0 and value <= 1 5 BMGE allows characters containing too many gaps to be removed with this option. By default, BMGE removes all characters with a gap frequency greater than 0.2. min_entropy Minimum entropy Score Cut-off (-h) Float 0.0 (($max_entropy != 0.5 or $value != $vdef) and ($max_entropy > $value)) ? " -h $value:$max_entropy" : "" ("", " -h %s:%s " % ( value, max_entropy) )[ (max_entropy != 0.5 or value !=vdef) and (max_entropy > value) ] The value must be between 0 and 1 $value >= 0 and $value <= 1 value >= 0 and value <= 1 6 Following the smoothing operation of the entropy-like score values across characters, BMGE selects characters associated with a score value greater than a fixed threshold. This cut-off is set to 0.0 by default. max_entropy Maximum entropy Score Cut-off (-h) Float 0.5 defined $min_entropy min_entropy is not None The value must be between 0 and 1 and greater than the minimum entropy score ($value >= 0 and $value <= 1) and ($value > $min_entropy) (value >= 0 and value <= 1) and (value > min_entropy) 6 Following the smoothing operation of the entropy-like score values across characters, BMGE selects characters associated with a score value below a fixed threshold. This cut-off is set to 0.5 by default. minimun_block_size Minimum Block Size (-b) Integer 5 (defined $value and $value != $vdef) ? " -b $value" : "" ("", " -b "+str(value) )[value is not None and value!=vdef] BMGE only selects regions of size greater than or equal to 5. Use this option to modify this minimum block size parameter. 7 output_option Output format options phylip Output in phylip sequential format (-op) Boolean defined $infile infile is not None 1 ( $value ) ? 
" -op $infile.phy" : "" ("", " -op "+ infile.split('.')[0] + ".phy ")[ value ] 9 phylipout Output in phylip sequential format Alignment PHYLIPS $phylip phylip "$infile.phy" infile.split('.')[0] + ".phy" phylip_oppp Output in phylip sequential format. Special formatting (-oppp) Boolean defined $infile infile is not None 0 ( $value ) ? " -oppp $infile.phyp" : "" ("", " -oppp "+ infile.split('.')[0] + ".phyp ")[ value] 9 If input sequences are in FASTA format with NCBI-formatted annotation lines, e.g. >field1|field2|field3|field4| field5 [field6] the option -oppp allows naming sequences by field6_____field4 ; knowing that field4 is generally an accession number, and field6 a taxon name, this option leads to PHYLIP files where each sequence is labelled as a taxon name and an accession number. phylipout_oppp Output in phylip sequential format Alignment PHYLIPS $phylip_oppp phylip_oppp "$infile.phyp" infile.split('.')[0] + ".phyp" nexus Output in nexus format (-on) Boolean defined $infile infile is not None 0 ( $value ) ? " -on $infile.nex" : "" ("", " -on "+ infile.split('.')[0] + ".nex ")[ value ] 9 nexusout Output in nexus format Alignment NEXUS $nexus nexus "$infile.nex" infile.split('.')[0] + ".nex" nexus_onnn Output in nexus format. Special formatting (-onnn) Boolean defined $infile infile is not None 0 ( $value ) ? " -onnn $infile.nexn" : "" ("", " -onnn "+ infile.split('.')[0] + ".nexn ")[ value ] 9 If input sequences are in FASTA format with NCBI-formatted annotation lines, e.g. >field1|field2|field3|field4| field5 [field6] the option -onnn allows naming sequences by field6_____field4 ; knowing that field4 is generally an accession number and field6 a taxon name, this option leads to NEXUS files where each sequence is labelled as a taxon name and an accession number.
nexusout_onnn Output in nexus format Alignment NEXUS $nexus_onnn nexus_onnn "$infile.nexn" infile.split('.')[0] + ".nexn" fasta Output in fasta format (-of) Boolean defined $infile infile is not None 0 ( $value ) ? " -of $infile.fa" : "" ("", " -of "+ infile.split('.')[0] + ".fa ")[ value ] 10 fastaout Output in fasta format Alignment FASTA $fasta fasta "$infile.fa" infile.split('.')[0] + ".fa" html Output in html format (-oh) Boolean defined $infile infile is not None 0 ( $value ) ? " -oh $infile.html" : "" ("", " -oh "+ infile.split('.')[0] + ".html ")[ value ] 11 htmlout Output in html format Report $html html "$infile.html" infile.split('.')[0] + ".html" Programs-5.1.1/dottup.xml0000644000175000001560000003462012072525233014221 0ustar bneronsis dottup EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net dottup Displays a wordmatch dotplot of two sequences http://bioweb2.pasteur.fr/docs/EMBOSS/dottup.html http://emboss.sourceforge.net/docs/themes alignment:pairwise:dot_plots dottup e_input Input section e_asequence asequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -asequence=" + str(value))[value is not None] 1 e_bsequence bsequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -bsequence=" + str(value))[value is not None] 2 e_required Required section e_wordsize Word size (value greater than or equal to 2) Integer 10 ("", " -wordsize=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 2 is required value >= 2 3 e_output Output section e_stretch Stretch axes Boolean 0 ("", " -stretch")[ bool(value) ] 4 Use non-proportional axes e_graph Choose the e_graph output format Choice not e_stretch png png gif cps ps 
meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 5 e_goutfile Name of the output graph Filename not e_stretch dottup_graph ("" , " -goutfile=" + str(value))[value is not None] 6 outgraph_png Graph file Picture Binary not e_stretch and e_graph == "png" "*.png" outgraph_gif Graph file Picture Binary not e_stretch and e_graph == "gif" "*.gif" outgraph_ps Graph file PostScript Binary not e_stretch and e_graph == "ps" or e_graph == "cps" "*.ps" outgraph_meta Graph file Picture Binary not e_stretch and e_graph == "meta" "*.meta" outgraph_data Graph file Text not e_stretch and e_graph == "data" "*.dat" e_xygraph Choose the e_xygraph output format Choice e_stretch png png gif cps ps meta data (" -xygraph=" + str(vdef), " -xygraph=" + str(value))[value is not None and value!=vdef] 7 xy_goutfile Name of the output graph Filename e_stretch dottup_xygraph ("" , " -goutfile=" + str(value))[value is not None] 8 xy_outgraph_png Graph file Picture Binary e_stretch and e_xygraph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_stretch and e_xygraph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_stretch and e_xygraph == "ps" or e_xygraph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_stretch and e_xygraph == "meta" "*.meta" xy_outgraph_data Graph file Text e_stretch and e_xygraph == "data" "*.dat" e_boxit Draw a box around dotplot Boolean not e_stretch 1 (" -noboxit", "")[ bool(value) ] 9 auto Turn off any prompting String " -auto -stdout" 10 Programs-5.1.1/clustalw-profile.xml0000644000175000001560000011350012073003734016167 0ustar bneronsis clustalw-profile Clustalw: Profile alignments Merge two alignments by profile alignment alignment:multiple clustalw -profile profile Profile Alignments parameters 2 By PROFILE ALIGNMENT, we mean alignment using existing alignments. 
Profile alignments allow you to store alignments of your favorite sequences and add new sequences to them in small bunches at a time. (e.g. an alignment output file from CLUSTAL W). One or both sets of input sequences may include secondary structure assignments or gap penalty masks to guide the alignment. Merge 2 alignments by profile alignment profile1 Profile 1 Alignment CLUSTAL (defined $value) ? " -profile1=$value" : "" ( "" , " -profile1=" + str( value ) )[value is not None] profile2 Profile 2 Alignment CLUSTAL (defined $value) ? " -profile2=$value" : "" ( "" , " -profile2=" + str( value ) )[value is not None] usetree1 File for old guide tree for profile1 (-usetree1) Tree NEWICK (defined $value) ? " -usetree1=$value" : "" ( "" , " -usetree1=" + str( value ))[value is not None] usetree2 File for old guide tree for profile2 (-usetree2) Tree NEWICK (defined $value) ? " -usetree2=$value" : "" ( "" , " -usetree2=" + str( value ))[value is not None] general_settings General settings 3 typeseq Protein or DNA (-type) Choice auto auto protein dna (defined $value) ? " -type=$value" : "" ("", " -type="+str(value))[value is not None] quicktree Toggle Slow/Fast pairwise alignments (-quicktree) Choice slow slow fast ($value eq "fast") ? " -quicktree" : "" ( "" , " -quicktree")[ value == "fast"] slow: by dynamic programming (slow but accurate) fast: method of Wilbur and Lipman (extremely fast but approximate) fastpw Fast Pairwise Alignments parameters $quicktree eq "fast" quicktree == "fast" 2 These similarity scores are calculated from fast, approximate, global alignments, which are controlled by 4 parameters. 2 techniques are used to make these alignments very fast: 1) only exactly matching fragments (k-tuples) are considered; 2) only the 'best' diagonals (the ones with most k-tuple matches) are used. ktuple Word size (-ktuple) Integer 1 (defined $value and $value != $vdef) ? 
" -ktuple=$value" : "" ( "" , " -ktuple=" + str( value ) )[value is not None and value != vdef ] 2 K-TUPLE SIZE: This is the size of exactly matching fragment that is used. INCREASE for speed (max= 2 for proteins; 4 for DNA), DECREASE for sensitivity. For longer sequences (e.g. >1000 residues) you may need to increase the default. topdiags Number of best diagonals (-topdiags) Integer 5 (defined $value and $value != $vdef) ? " -topdiags=$value" : "" ( "" , " -topdiags=" + str( value ))[value is not None and value != vdef ] 2 The number of k-tuple matches on each diagonal (in an imaginary dot-matrix plot) is calculated. Only the best ones (with most matches) are used in the alignment. This parameter specifies how many. Decrease for speed; increase for sensitivity. window Window around best diags (-window) Integer 5 (defined $value and $value != $vdef) ? " -window=$value" : "" ( "" , " -window=" + str( value ) )[ value is not None and value != vdef ] 2 WINDOW SIZE: This is the number of diagonals around each of the 'best' diagonals that will be used. Decrease for speed; increase for sensitivity pairgap Gap penalty (-pairgap) Float 3 (defined $value and $value != $vdef) ? " -pairgap=$value" : "" ( "" , " -pairgap=" + str( value ))[ value is not None and value != vdef ] 2 This is a penalty for each gap in the fast alignments. It has little affect on the speed or sensitivity except for extreme values. score Percent or absolute score ? (-score) Choice percent percent absolute (defined $value and $value ne $vdef) ? " -score=$value" : "" ( "" , " -score=" +str( value ) )[value is not None and value !=vdef] 2 slowpw Slow Pairwise Alignments parameters $quicktree eq "slow" quicktree == "slow" 2 These parameters do not have any affect on the speed of the alignments. They are used to give initial alignments which are then rescored to give percent identity scores. These % scores are the ones which are displayed on the screen. The scores are converted to distances for the trees. 
pwgapopen Gap opening penalty (-pwgapopen) Float 10.00 (defined $value and $value != $vdef) ? " -pwgapopen=$value" : "" ( "" , " -pwgapopen=" + str( value ) )[ value is not None and value != vdef ] pwgapext Gap extension penalty (-pwgapext) Float 0.10 (defined $value and $value != $vdef) ? " -pwgapext=$value" : "" ( "" , " -pwgapext=" + str( value ) )[ value is not None and value != vdef ] slowpw_prot Protein parameters $typeseq eq "protein" typeseq == "protein" pwmatrix Protein weight matrix (-pwmatrix) Choice gonnet blosum gonnet pam id (defined $value and $value ne $vdef) ? " -pwmatrix=$value" : "" ( "" , " -pwmatrix=" + str(value) )[value is not None and value != vdef ] The scoring table which describes the similarity of each amino acid to each other. For DNA, an identity matrix is used. BLOSUM (Henikoff). These matrices appear to be the best available for carrying out data base similarity (homology searches). The matrices used are: Blosum80, 62, 40 and 30. The Gonnet Pam 250 matrix has been reported as the best single matrix for alignment, if you only choose one matrix. Our experience with profile database searches is that the Gonnet series is unambiguously superior to the Blosum series at high divergence. However, we did not get the series to perform systematically better than the Blosum series in Clustal W (communication of the authors). PAM (Dayhoff). These have been extremely widely used since the late '70s. We use the PAM 120, 160, 250 and 350 matrices. slowpw_dna DNA parameters $typeseq eq "dna" typeseq == "dna" pwdnamatrix DNA weight matrix (-pwdnamatrix) Choice iub iub clustalw (defined $value and $value ne $vdef) ? " -pwdnamatrix=$value" : "" ( "" , " -pwdnamatrix=" + str(value) )[ value is not None and value != vdef ] For DNA, a single matrix (not a series) is used. Two hard-coded matrices are available: 1) IUB. This is the default scoring matrix used by BESTFIT for the comparison of nucleic acid sequences. 
X's and N's are treated as matches to any IUB ambiguity symbol. All matches score 1.9; all mismatches for IUB symbols score 0. 2) CLUSTALW(1.6). The previous system used by ClustalW, in which matches score 1.0 and mismatches score 0. All matches for IUB symbols also score 0. structure Structure Alignments parameters 2 These options, when doing a profile alignment, allow you to set 2D structure parameters. If a solved structure is available, it can be used to guide the alignment by raising gap penalties within secondary structure elements, so that gaps will preferentially be inserted into unstructured surface loops. Alternatively, a user-specified gap penalty mask can be supplied directly. A gap penalty mask is a series of numbers between 1 and 9, one per position in the alignment. Each number specifies how much the gap opening penalty is to be raised at that position (raised by multiplying the basic gap opening penalty by the number) i.e. a mask figure of 1 at a position means no change in gap opening penalty; a figure of 4 means that the gap opening penalty is four times greater at that position, making gaps 4 times harder to open. Gap penalty masks is to be supplied with the input sequences. The masks work by raising gap penalties in specified regions (typically secondary structure elements) so that gaps are preferentially opened in the less well conserved regions (typically surface loops). CLUSTAL W can read the masks from SWISS-PROT, CLUSTAL or GDE format input files. For many 3-D protein structures, secondary structure information is recorded in the feature tables of SWISS-PROT database entries. You should always check that the assignments are correct - some are quite inaccurate. CLUSTAL W looks for SWISS-PROT HELIX and STRAND assignments e.g. FT HELIX 100 115 FT HELIX 100 115 The structure and penalty masks can also be read from CLUSTAL alignment format as comment lines beginning !SS_ or GM_ e.g. 
!SS_HBA_HUMA ..aaaAAAAAAAAAAaaa.aaaAAAAAAAAAAaaaaaaAaaa.........aaaAAAAAA !GM_HBA_HUMA 112224444444444222122244444444442222224222111111111222444444 HBA_HUMA VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK Note that the mask itself is a set of numbers between 1 and 9 each of which is assigned to the residue(s) in the same column below. In GDE flat file format, the masks are specified as text and the names must begin with SS_ or GM_. Either a structure or penalty mask or both may be used. If both are included in an alignment, the user will be asked which is to be used. nosecstr1 Do not use secondary structure-gap penalty mask for profile 1 (-nosecstr1) Boolean 0 ($value) ? " -nosecstr1" : "" ( "" , " -nosecstr1")[ value ] 2 This option controls whether the input secondary structure information or gap penalty masks will be used. nosecstr2 Do not use secondary structure-gap penalty mask for profile 2 (-nosecstr2) Boolean 0 ($value) ? " -nosecstr2" : "" ( "" , " -nosecstr2")[ value ] This option controls whether the input secondary structure information or gap penalty masks will be used. helixgap Helix gap penalty (-helixgap) Integer 4 (defined $value and $value != $vdef) ? " -helixgap=$value" : "" ( "" , " -helixgap=" + str( value ) )[ value is not None and value != vdef ] This option provides the value for raising the gap penalty at core Alpha Helical (A) residues. In CLUSTAL format, capital residues denote the A and B core structure notation. The basic gap penalties are multiplied by the amount specified. strandgap Strand gap penalty (-strandgap) Integer 4 (defined $value and $value != $vdef) ? " -strandgap=$value" : "" ( "" , " -strandgap=" + str( value ) )[ value is not None and value != vdef ] This option provides the value for raising the gap penalty at Beta Strand (B) residues. In CLUSTAL format, capital residues denote the A and B core structure notation. The basic gap penalties are multiplied by the amount specified. 
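The gap penalty mask described above can be read as a per-position multiplier on the basic gap opening penalty. The following sketch shows that arithmetic only. It is not CLUSTAL W code: `masked_gap_open` is a hypothetical helper, the base penalty of 10.0 is an illustrative value, and the mask is the first ten digits of the !GM_HBA_HUMA example.

```python
def masked_gap_open(mask, base_gap_open=10.0):
    """Return the gap opening penalty at each alignment position.

    Each mask digit (1 to 9) multiplies the basic gap opening penalty:
    1 leaves it unchanged, 4 makes gaps four times harder to open.
    """
    penalties = []
    for ch in mask:
        factor = int(ch)
        if not 1 <= factor <= 9:
            raise ValueError("mask figures must be between 1 and 9")
        penalties.append(base_gap_open * factor)
    return penalties


# First ten positions of the !GM_HBA_HUMA mask quoted above.
mask = "1122244444"
penalties = masked_gap_open(mask)
```

Positions masked with 1 keep the base penalty, so gaps are opened there first; the run of 4s covers the helix core, where gaps are least wanted.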
loopgap Loop gap penalty (-loopgap) Integer 1 (defined $value and $value != $vdef) ? " -loopgap=$value" : "" ( "" , " -loopgap=" + str( value ) )[ value is not None and value != vdef ] This option provides the value for the gap penalty in Loops. By default this penalty is not raised. In CLUSTAL format, loops are specified by . in the secondary structure notation. terminalgap Secondary structure terminal penalty (-terminalgap) Integer 2 (defined $value and $value != $vdef) ? " -terminalgap=$value" : "" ( "" , " -terminalgap=" + str( value ) )[ value is not None and value != vdef ] This option provides the value for setting the gap penalty at the ends of secondary structures. Ends of secondary structures are observed to grow and-or shrink in related structures. Therefore by default these are given intermediate values, lower than the core penalties. All secondary structure read in as lower case in CLUSTAL format gets the reduced terminal penalty. helixendin Helix terminal positions: number of residues inside helix to be treated as terminal (-helixendin) Integer 3 (defined $value and $value != $vdef) ? " -helixendin=$value" : "" ( "" , " -helixendin=" + str( value ) )[ value is not None and value != vdef ] This option (together with the -helixendout option) specifies the range of structure termini for the intermediate penalties. In the alignment output, these are indicated as lower case. For Alpha Helices, by default, the range spans the end helical turn. helixendout Helix terminal positions: number of residues outside helix to be treated as terminal (-helixendout) Integer 0 (defined $value and $value != $vdef) ? " -helixendout=$value" : "" ( "" , " -helixendout=" + str( value ) )[ value is not None and value != vdef ] This option (together with the -helixendin option) specifies the range of structure termini for the intermediate penalties. In the alignment output, these are indicated as lower case. For Alpha Helices, by default, the range spans the end helical turn.
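The Python expressions in these definitions, such as the -helixendout one above, choose between two strings by indexing a two-element tuple with a boolean (False selects element 0, True selects element 1). A minimal sketch of how such fragments might add up to a command line follows. `option_fragment` is a hypothetical wrapper around that idiom, and simple concatenation of the fragments is an assumption about how they are assembled.

```python
def option_fragment(flag, value, vdef):
    # ("", text)[condition]: the empty string when the value is unset or equal
    # to its default, otherwise the " -flag=value" fragment.
    return ("", " -" + flag + "=" + str(value))[value is not None and value != vdef]


args = "clustalw -profile"
args += option_fragment("helixgap", 6, 4)       # differs from the default: emitted
args += option_fragment("strandgap", 4, 4)      # equal to the default: omitted
args += option_fragment("helixendin", None, 3)  # unset: omitted
args += option_fragment("helixendout", 1, 0)    # differs from the default: emitted
```

Both tuple elements are evaluated before the selection, so the expression is only safe when building the second string cannot fail for an unset value; `str(None)` is harmless here.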
strandendin Strand terminal positions: number of residues inside strand to be treated as terminal (-strandendin) Integer 1 (defined $value and $value != $vdef) ? " -strandendin=$value" : "" ( "" , " -strandendin=" + str( value ) )[ value is not None and value != vdef ] This option (together with the -strandendout option) specify the range of structure termini for the intermediate penalties. In the alignment output, these are indicated as lower case. For Beta Strands, the default range spans the end residue and the adjacent loop residue, since sequence conservation often extends beyond the actual H-bonded Beta Strand. strandendout Strand terminal positions: number of residues outside strand to be treated as terminal (-strandendout) Integer 1 (defined $value and $value != $vdef) ? " -strandendout=$value" : "" ( "" , " -strandendout=" + str( value ) )[ value is not None and value != vdef ] This option (together with the -strandendin option) specify the range of structure termini for the intermediate penalties. In the alignment output, these are indicated as lower case. For Beta Strands, the default range spans the end residue and the adjacent loop residue, since sequence conservation often extends beyond the actual H-bonded Beta Strand. secstrout Output in alignment (-secstrout) Choice STRUCTURE STRUCTURE MASK BOTH NONE (defined $value and $value ne $vdef) ? " -secstrout=$value" : "" ( "" , " -secstrout=" + str( value ) )[ value is not None and value != vdef ] This option lets you choose whether or not to include the masks in the CLUSTAL W output alignments. Showing both is useful for understanding how the masks work. The secondary structure information is itself very useful in judging the alignment quality and in seeing how residue conservation patterns vary with secondary structure. outputparam Output parameters 5 outputformat Output format (-output) Choice null null GCG GDE PHYLIPI NEXUS CODATA FASTA (defined $value and $value ne $vdef) ? 
" -output=$value" : "" ( "" , " -output=" + str( value) )[ value is not None and value != vdef ] seqnos Output sequence numbers in the output file (for clustalw output only) (-seqnos) Boolean not defined $outputformat outputformat is None 0 (defined $value and $value != $vdef) ? " -seqnos=on" : "" ( "" , " -seqnos=on")[ value is not None and value != vdef] outorder Result order (-outorder) Choice aligned input aligned (defined $value and $value ne $vdef) ? " -outorder=$value" : "" ( "" , " -outorder=" + str(value))[ value is not None and value != vdef ] outfile Sequence alignment file name (-outfile) Filename (defined $value and $value ne $vdef ) ? " -outfile=$value" : "" ( "" , " -outfile=" + str( value))[ value is not None ] aligfile Alignment file Alignment $outputformat =~ /^(NEXUS|GCG|PHYLIPI|FASTA)$/ outputformat in [ "NEXUS", "GCG", "PHYLIPI","FASTA"] (defined $outfile)? "$outfile":"*.fasta *.nxs *.phy *.msf" { "OUTFILE":outfile, "FASTA":"*.fasta", "NEXUS": "*.nxs", "PHYLIPI": "*.phy" , 'GCG': '*.msf' }[( "OUTFILE", outputformat)[outfile is None]] clustalaligfile Alignment file Alignment CLUSTAL not defined $outputformat outputformat is None (defined $outfile)? "$outfile":"*.aln" ("*.aln", str(outfile))[outfile is not None] In the conservation line output in the clustal format alignment file, three characters are used: '*' indicates positions which have a single, fully conserved residue. ':' indicates that one of the following 'strong' groups is fully conserved (STA,NEQK,NHQK,NDEQ,QHRK,MILV,MILF,HY,FYW). '.' indicates that one of the following 'weaker' groups is fully conserved (CSA,ATV,SAG,STNK,STPA,SGND,SNDEQK,NDEQHK,NEQHRK,FVLIM,HFY). These are all the positively scoring groups that occur in the Gonnet Pam250 matrix. The strong and weak groups are defined as strong score >0.5 and weak score =<0.5 respectively. seqfile Sequences file Sequence NBRF GDE $outputformat =~ /^(GDE|PIR)$/ outputformat in [ 'GDE', 'PIR' ] (defined $outfile)? 
"$outfile":"*.gde *.pir" { "OUTFILE":outfile, 'GDE':'*.gde', 'PIR':'*.pir'}[( "OUTFILE", outputformat)[outfile is None]] dndfile Tree file Tree NEWICK "*.dnd" "*.dnd" Programs-5.1.1/newcpgseek.xml0000644000175000001560000001166212072525233015036 0ustar bneronsis newcpgseek EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net newcpgseek Identify and report CpG-rich regions in nucleotide sequence(s) http://bioweb2.pasteur.fr/docs/EMBOSS/newcpgseek.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:cpg_islands newcpgseek e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_score Cpg score (value from 1 to 200) Integer 17 ("", " -score=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 Value less than or equal to 200 is required value <= 200 2 e_output Output section e_outfile Name of the output file (e_outfile) Filename newcpgseek.e_outfile ("" , " -outfile=" + str(value))[value is not None] 3 e_outfile_out outfile_out option NewcpgseekReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 4 Programs-5.1.1/INSTALL0000644000175000001560000000206311672707410013211 0ustar bneronsisINSTALLATION INSTRUCTIONS FOR PROGRAMS DEFINITIONS IN MOBYLE ************************************************************ 1 - copy the xml programs definitions in MOBYLEHOME/Services/Programs 2 - copy the xml in Entities directory in MOBYLEHOME/Services/Programs/Entities 3 - copy the xml in Env directory in MOBYLEHOME/Local/Services/Programs/Env 4 - modify the content of each xml you have copied in 
MOBYLEHOME/Local/Services/Programs/Env to fit your installation 5 - deploy your services ( see configuration guide: 1.13 Services management ) If you experience trouble with any program definition, you can contact us by mail at the following address: mobyle-support@pasteur.fr. 6 - Mailing list: ================= There is a mailing list dedicated to Mobyle server administrators, called "mobyle-users". This list discusses new releases, related software announcements, administration and development issues, etc. This is a moderated and low traffic list. You can subscribe to Mobyle users at: http://sympa.pasteur.fr/wws/subrequest/mobyle-users Programs-5.1.1/backtranseq.xml0000644000175000001560000010654512072525233015206 0ustar bneronsis backtranseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net backtranseq Back-translate a protein sequence to a nucleotide sequence http://bioweb2.pasteur.fr/docs/EMBOSS/backtranseq.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:translation sequence:protein:composition backtranseq e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_cfile cfile option Choice Ehuman.cut Eacc.cut Eacica.cut Eadenovirus5.cut Eadenovirus7.cut Eagrtu.cut Eaidlav.cut Eanasp.cut Eani.cut Eani_h.cut Eanidmit.cut Earath.cut Easn.cut Eath.cut Eatu.cut Eavi.cut Eazovi.cut Ebacme.cut Ebacst.cut Ebacsu.cut Ebacsu_high.cut Ebja.cut Ebly.cut Ebme.cut Ebmo.cut Ebna.cut Ebommo.cut Ebov.cut Ebovin.cut Ebovsp.cut Ebpphx.cut Ebraja.cut Ebrana.cut Ebrare.cut Ebst.cut Ebsu.cut Ebsu_h.cut Ecac.cut Ecaeel.cut Ecal.cut Ecanal.cut Ecanfa.cut Ecaucr.cut Eccr.cut Ecel.cut 
Echi.cut Echick.cut Echicken.cut Echisp.cut Echk.cut Echlre.cut Echltr.cut Echmp.cut Echnt.cut Echos.cut Echzm.cut Echzmrubp.cut Ecloab.cut Ecpx.cut Ecre.cut Ecrigr.cut Ecrisp.cut Ectr.cut Ecyapa.cut Edayhoff.cut Eddi.cut Eddi_h.cut Edicdi.cut Edicdi_high.cut Edog.cut Edro.cut Edro_h.cut Edrome.cut Edrome_high.cut Edrosophila.cut Eeca.cut Eeco.cut Eeco_h.cut Eecoli.cut Eecoli_high.cut Eemeni.cut Eemeni_high.cut Eemeni_mit.cut Eerwct.cut Ef1.cut Efish.cut Efmdvpolyp.cut Ehaein.cut Ehalma.cut Ehalsa.cut Eham.cut Ehha.cut Ehin.cut Ehma.cut Ehorvu.cut Ehum.cut Ehuman.cut Ekla.cut Eklepn.cut Eklula.cut Ekpn.cut Elacdl.cut Ella.cut Elyces.cut Emac.cut Emacfa.cut Emaize.cut Emaize_chl.cut Emam_h.cut Emammal_high.cut Emanse.cut Emarpo_chl.cut Emedsa.cut Emetth.cut Emixlg.cut Emouse.cut Emsa.cut Emse.cut Emta.cut Emtu.cut Emus.cut Emussp.cut Emva.cut Emyctu.cut Emze.cut Emzecp.cut Encr.cut Eneigo.cut Eneu.cut Eneucr.cut Engo.cut Eoncmy.cut Eoncsp.cut Eorysa.cut Eorysa_chl.cut Epae.cut Epea.cut Epet.cut Epethy.cut Epfa.cut Ephavu.cut Ephix174.cut Ephv.cut Ephy.cut Epig.cut Eplafa.cut Epolyomaa2.cut Epombe.cut Epombecai.cut Epot.cut Eppu.cut Eprovu.cut Epse.cut Epseae.cut Epsepu.cut Epsesm.cut Epsy.cut Epvu.cut Erab.cut Erabbit.cut Erabit.cut Erabsp.cut Erat.cut Eratsp.cut Erca.cut Erhile.cut Erhime.cut Erhm.cut Erhoca.cut Erhosh.cut Eric.cut Erle.cut Erme.cut Ersp.cut Esalsa.cut Esalsp.cut Esalty.cut Esau.cut Eschma.cut Eschpo.cut Eschpo_cai.cut Eschpo_high.cut Esco.cut Eserma.cut Esgi.cut Esheep.cut Eshp.cut Eshpsp.cut Esli.cut Eslm.cut Esma.cut Esmi.cut Esmu.cut Esoltu.cut Esoy.cut Esoybn.cut Espi.cut Espiol.cut Espn.cut Espo.cut Espo_h.cut Espu.cut Esta.cut Estaau.cut Estrco.cut Estrmu.cut Estrpn.cut Estrpu.cut Esty.cut Esus.cut Esv40.cut Esyhsp.cut Esynco.cut Esyncy.cut Esynsp.cut Etbr.cut Etcr.cut Eter.cut Etetsp.cut Etetth.cut Etheth.cut Etob.cut Etobac.cut Etobac_chl.cut Etobcp.cut Etom.cut Etrb.cut Etrybr.cut Etrycr.cut Evco.cut Evibch.cut Ewheat.cut Ewht.cut 
Exel.cut Exenla.cut Exenopus.cut Eyeast.cut Eyeast_cai.cut Eyeast_high.cut Eyeast_mit.cut Eyeastcai.cut Eyen.cut Eyeren.cut Eyerpe.cut Eysc.cut Eysc_h.cut Eyscmt.cut Eysp.cut Ezebrafish.cut Ezma.cut ("", " -cfile=" + str(value))[value is not None and value!=vdef] 2 e_output Output section e_outfile Name of the output file (e_outfile) DNA Filename backtranseq.e_outfile ("" , " -outfile=" + str(value))[value is not None] 3 e_osformat_outfile Choose the sequence output format DNA Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 4 e_outfile_out outfile_out option DNA Sequence e_outfile auto Turn off any prompting String " -auto -stdout" 5 Programs-5.1.1/psiblast.xml0000644000175000001560000007547511767572177014563 0ustar bneronsis psiblast PSI-Blast Position Specific Iterative Blast Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaeffer,Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 25:3389-3402. database:search:homology database:search:pattern psiblast String "blastpgp" "blastpgp" 0 The blastpgp program can do an iterative search in which sequences found in one round of searching are used to build a score model for the next round of searching. In this usage, the program is called Position-Specific Iterated BLAST, or PSI-BLAST. As explained in the accompanying paper, the BLAST algorithm is not tied to a specific score matrix. Traditionally, it has been implemented using an AxA substitution matrix where A is the alphabet size. PSI-BLAST instead uses a QxA matrix, where Q is the length of the query sequence; at each position the cost of a letter depends on the position w.r.t. the query and the letter in the subject sequence. 
The position-specific matrix for round i+1 is built from a constrained multiple alignment among the query and the sequences found with sufficiently low e-value in round i. The top part of the output for each round distinguishes the sequences into: sequences found previously and used in the score model, and sequences not used in the score model. The output currently includes lots of diagnostics requested by users at NCBI. To skip quickly from the output of one round to the next, search for the string 'producing', which is part of the header for each round and likely does not appear elsewhere in the output. PSI-BLAST 'converges' and stops if all sequences found at round i+1 below the e-value threshold were already in the model at the beginning of the round. query Sequence File (-i) Sequence FASTA " -i $query" " -i " + str(value) 3 start_region Start of required region in query (-S) Integer 1 (defined $value and $value != $vdef) ? " -S $value" : "" ( "" , " -S " + str(value) )[ value is not None and value != vdef] end_region End of required region in query (-H) Integer -1 (defined $value and $value != $vdef) ? " -H $value" : "" ( "" , " -H " + str(value) )[ value is not None and value != vdef] Location on query sequence. -1 indicates end of query protein_db Protein database (-d ) Choice null null " -d $value" " -d " + str(value) 2 Choose a protein db for blastp or blastx. scoring Scoring option 4 open_a_gap Cost to open a gap (-G) Integer 11 (defined $value and $value != $vdef) ? " -G $value" : "" ( "" , " -G " + str(value) )[ value is not None and value != vdef] extend_a_gap Cost to extend a gap (-E) Integer 1 (defined $value and $value != $vdef) ? " -E $value" : "" ( "" , " -E " + str(value) )[ value is not None and value != vdef] Limited values for gap existence and extension are supported for these three programs. 
Some supported and suggested values are: Existence Extension 10 -- 1 10 -- 2 11 -- 1 8 -- 2 9 -- 2 (source: NCBI Blast page) matrix Similarity matrix (-M) Choice BLOSUM62 BLOSUM45 BLOSUM80 BLOSUM62 PAM30 PAM70 (defined $value and $value ne $vdef)? " -M $value" : "" ( "" , " -M " + str(value) )[ value is not None and value != vdef] filter_opt Filtering and masking options 5 This option also takes a string as an argument. One may use such a string to change the specific parameters of seg or invoke other filters. Please see the 'Filtering Strings' section (below) for details. filter Filter query sequence with SEG (-F) Boolean 0 ($value) ? " -F T" : "" ( "" , " -F T" )[ value ] lower_case Use lower case filtering (-U) Boolean 0 ($value) ? " -U T" : "" ("", " -U T")[value] This option specifies that any lower-case letters in the input FASTA file should be masked. selectivity_opt Selectivity options 5 Expect Expected value (-e) Float 10 (defined $value and $value != $vdef)? " -e $value":"" ("" , " -e " + str(value))[ value is not None and value != vdef] The statistical significance threshold for reporting matches against database sequences; the default value is 10, such that 10 matches are expected to be found merely by chance, according to the stochastic model of Karlin and Altschul (1990). If the statistical significance ascribed to a match is greater than the EXPECT threshold, the match will not be reported. Lower EXPECT thresholds are more stringent, leading to fewer chance matches being reported. Fractional values are acceptable. word_size Word Size (-W) Integer (defined $value) ? " -W $value" : "" ("" , " -W "+str(value))[value is not None] Valid wordsize range is 2 to 3 $value >= 2 and $value <=3 value >= 2 and value <=3 Use words of size N. Zero invokes default behavior Default value: 3 window Multiple hits window size (-A) Integer 40 (defined $value and $value != $vdef)? 
" -A $value" : "" ( "" , " -A " + str(value) )[ value is not None and value != vdef] When multiple hits method is used, this parameter defines the distance from last hit on the same diagonal to the new one. Zero means single hit algorithm. extend_hit Threshold for extending hits (-f) Integer 11 (defined $value and $value != $vdef)? " -f $value" : "" ( "" , " -f " + str(value) )[ value is not None and value !=vdef] Blast seeks first short word pairs whose aligned score reaches at least this value dropoff X dropoff value for gapped alignment (-X) Integer (defined $value)? " -X $value":"" ("" , " -X " + str(value))[ value is not None ] This is the value that control the path graph region explored by Blast during a gapped extension (Xg in the NAR paper). dropoff_z X dropoff value for final gapped alignment (-Z) Integer 25 (defined $value and $value != $vdef)? " -Z $value" : "" ( "" , " -Z " + str(value) )[ value is not None and value != vdef] This parameter controls the dropoff for the final reported alignment. See also the -X parameter. dropoff_y Dropoff for blast ungapped extensions in bits (-y) Float 7.0 (defined $value and $value != $vdef) ? " -y $value" : "" ( "" , " -y " + str(value) )[ value is not None and value != vdef] This parameter controls the dropoff at ungapped extension stage. See also the -X parameter. eff_len Effective length of the search space (-Y) Integer 0 (defined $value and $value != $vdef) ? " -Y $value" : "" ("" , " -Y "+str(value))[value is not None and value !=vdef] Use zero for the real size keep_hits Number of best hits from a region to keep (-K) Integer (defined $value) ? " -K $value" : "" ("" , " -K "+str(value))[value is not None] If this option is used, a value of 100 is recommended. mode Single-hit or multiple-hit mode (-P) Choice 0 0 1 ($value eq "0") ? " -P $value" : "" ("" , " -P "+str(value))[value != "0"] nb_bits Number of bits to trigger gapping (-N) Integer 22 (defined $value and $value != $vdef) ? 
" -N $value" : "" ("" , " -N "+str(value))[value is not None and value !=vdef] psi_spec_opt PSI-Blast specific selectivity options 5 max_passes Maximum number of passes to use in multipass version (-j) Integer 1 (defined $value and $value != $vdef) ? " -j $value" : "" ( "" , " -j " + str(value) )[ value is not None and value != vdef] expect_in_multipass e-value threshold for inclusion in multipass model (-h) Float 0.002 (defined $value and $value != $vdef) ? " -h $value" : "" ( "" , " -h " + str(value) )[ value is not None and value != vdef] pseudocounts Constant in pseudocounts for multipass version (-c) Integer 9 (defined $value and $value != $vdef)? " -c $value" : "" ( "" , " -c " + str(value) )[ value is not None and value != vdef] This constant is the weight given to a pre-calculated residue target frequency (versus the observed one) in a column of the position specific matrix. The larger its value, the greater the emphasis given to prior knowledge of residue relationships vis a vis observed residue frequencies (beta constant in NAR paper). affichage Report options 5 Descriptions Number of one-line descriptions to show? (-v) Integer 500 (defined $value and $value != $vdef) ? " -v $value" : "" ( "" , " -v " + str(value) )[ value is not None and value != vdef] Maximum number of database sequences for which one-line descriptions will be reported. Alignments Number of database sequences to show alignments? (-b) Integer 250 (defined $value and $value != $vdef) ? " -b $value" : "" ( "" , " -b " + str(value) )[ value is not None and value != vdef] Maximum number of database sequences for which high-scoring segment pairs will be reported (-b). view_alignments Alignment view options (-m) Choice 0 0 1 2 3 4 5 6 7 8 (defined $value and $value ne $vdef)? 
" -m $value" : "" ( "" , " -m " + str(value) )[ value is not None and value != vdef] txtoutput Text output file String $view_alignments ne "7" view_alignments != "7" " -o psiblast.txt" " -o psiblast.txt" 10 xmloutput XML output file String $view_alignments eq "7" view_alignments == "7" " -o psiblast.xml" " -o psiblast.xml" 10 htmloutput Html output Boolean $view_alignments !~ /^[78]$/ view_alignments not in [ "7" , "8" ] 1 ($value) ? " && html4blast -g -o psiblast.html psiblast.txt" : "" ("" , " && html4blast -g -o psiblast.html psiblast.txt")[value] 11 believe Believe the query defline (-J) Boolean 0 ($value)? " -J":"" ("" , " -J")[ value ] seqalign_file SeqAlign file (-J option must be true) (-O) Filename $believe believe (defined $value)? " -O $value" : "" ( "" , " -O " + str(value) )[ value is not None ] SeqAlign is in ASN.1 format, so that it can be read with NCBI tools (such as sequin). This allows one to view the results in different formats. txtfile Blast text report BlastTextReport Report $view_alignments ne "7" view_alignments != "7" "psiblast.txt" "psiblast.txt" xmlfile Blast xml report BlastXmlReport Report $view_alignments eq "7" view_alignments == "7" "psiblast.xml" "psiblast.xml" htmlfile Blast html report BlastHtmlReport Report $view_alignments !~ /^[78]$/ view_alignments not in ["7", "8"] "psiblast.html" "psiblast.html" imgfile Picture Binary $view_alignments !~ /^[78]$/ view_alignments not in ["7", "8"] "*.png" "*.gif" "*.png" "*.gif" save_matrix Save PSI-Blast Matrix to file (-C) Filename $max_passes > 1 max_passes > 1 (defined $value) ? " -C $save_matrix" : "" ( "" , " -C "+ str(value) )[ value is not None ] save_txt_matrix Save PSI-BLAST Matrix as text to file (-Q) Filename $max_passes > 1 max_passes > 1 (defined $value)? 
" -Q $save_txt_matrix" : "" ( "" , " -Q " + str(value) )[ value is not None ] Programs-5.1.1/rnadistance.xml0000644000175000001560000003130111767656243015206 0ustar bneronsis rnadistance RNAdistance Calculate distances between RNA secondary structures Hofacker, Fontana, Hofacker, Stadler Shapiro B A, (1988) An algorithm for comparing multiple RNA secondary structures, CABIOS 4, 381-393 Shapiro B A, Zhang K (1990) Comparing multiple RNA secondary structures using tree comparison, CABIOS 6, 309-318 Fontana W, Konings D A M, Stadler P F, Schuster P, (1993) Statistics of RNA secondary structures, Biopolymers 33, 1389-1404 I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker, P. Schuster (1994) Fast Folding and Comparison of RNA Secondary Structures. Monatshefte f. Chemie 125, 167-188 RNAdistance reads RNA secondary structures from stdin and calculates one or more measures for their dissimilarity, based on tree or string editing (alignment). In addition it calculates a "base pair distance" given by the number of base pairs present in one structure, but not the other. For structures of different length base pair distance is not recommended. sequence:nucleic:2D_structure structure:2D_structure rnadistance String "RNAdistance " "RNAdistance " struct RNA structures File RNAStructure AbstractText " < $value" " < " +str(value) 1000 The program accepts structures in bracket format, where matching brackets symbolize base pairs and unpaired bases are represented by a dot '.', or coarse grained representations where hairpins, interior loops, bulges, multiloops, stacks and external bases are represented by (H), (I), (B), (M), (S), and (E), respectively. These can be optionally weighted. Full structures can be represented in the same fashion using the identifiers (U) and (P) for unpaired and paired bases, respectively. Examples: .((..(((...)))..((..)))). 
 full structure (usual format); (U)((U2)((U3)P3)(U2)((U2)P2)P2) HIT structure; ((H)(H)M) or ((((H)S)((H)S)M)S) coarse grained structure; (((((H3)S3)((H2)S2)M4)S2)E2) weighted coarse grained. others_options Other options 2 distance Representation for distance calculation (-D) Choice f f F h H w W c C P (defined $value and $value ne $vdef)? " -D$value" : "" ( "" , " -D" +str(value) )[ value is not None and value != vdef] Use the full, HIT, weighted coarse, or coarse representation to calculate the distance. Capital letters indicate string alignment otherwise tree editing is used. Any combination of distances can be specified. -DP selects the base pair distance. The default is "f". compare Which comparisons (-X) Choice p p m f c (defined $value and $value ne $vdef)? " -X$value" : "" ( "" , " -X" +str(value) )[ value is not None and value != vdef] p: compare the structures pairwise (p), that is first with 2nd, third with 4th etc. This is the default. m: calculate the distance matrix between all structures. The output is formatted as a lower triangle matrix. f: compare each structure to the first one. c: compare continuously, that is i-th with (i+1)th structure. matrix_options Analyse the distance matrix $compare eq "m" compare == "m" 2000 Only when comparison between all structures is requested (-Xm). This uses AnalyseDists distributed with the Vienna package. do_analyse Do this analysis (AnalyseDists program)? Boolean 0 ($value)? " | AnalyseDists" : "" ("", " | AnalyseDists")[value] AnalyseDists reads a distance matrix (given as lower triangle matrix) from stdin and writes a split decomposition and a cluster analysis of this distance matrix to stdout. Method AnalyseDist methods to be used (-X) Choice Null Null s w n (defined $value and $value ne $vdef)? 
" -X$value" : "" ("", " -X"+str(value))[ value is not None and value != vdef] 2001 psfiles Postcript output file PostScript Binary $compare eq "m" and $Method ne "s" compare == "m" and Method != "s" "*.ps" "*.ps" shapiro Use the Bruce Shapiro's cost matrix for comparing coarse structures (-S) Boolean 0 ($value)? " -S" : "" ( "" , " -S" )[ value ] alignment_name Structure alignment file (-B) Filename (defined $value)? " -B $value" : "" ( "" , " -B " +str(value) )[ value is not None] Print an 'alignment' with gaps of the structures, to show matching substructures. The aligned structures are written to file, if specified. Otherwise output is written to stdout, unless the -Xm option is set in which case 'backtrack.file' is used. ali_outfile Alignment output file RnadistanceReport Report $alignment_name alignment_name is not None $alignment_name str(alignment_name) Programs-5.1.1/signalp.xml0000644000175000001560000004666511767577366014403 0ustar bneronsis signalp 4.0 signalp predict signal peptides in proteins http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?signalp SignalP 4.0: discriminating signal peptides from transmembrane regions Thomas Nordahl Petersen, Søren Brunak, Gunnar von Heijne & Henrik Nielsen Nature Methods, 8:785-786, 2011 Improved prediction of signal peptides: SignalP 3.0. Jannick Dyrløv Bendtsen, Henrik Nielsen, Gunnar von Heijne and Søren Brunak. J. Mol. Biol., 340:783-795, 2004. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Henrik Nielsen, Jacob Engelbrecht, Søren Brunak and Gunnar von Heijne. Protein Engineering, 10:1-6, 1997. Prediction of signal peptides and signal anchors by a hidden Markov model. Henrik Nielsen and Anders Krogh. Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology (ISMB 6), AAAI Press, Menlo Park, California, pp. 122-130, 1998. 
http://www.cbs.dtu.dk/services/SignalP/ signalp predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks. sequence:protein:motifs sequence:protein:pattern signalp String " signalp " " signalp " sequence Input Sequence Sequence FASTA " $value "" " " + str( value ) 100 >IPI:IPI00000001.2 SWISS-PROT:O95793-1 TREMBL:A8K622;Q59F99 Isoform L ong of Double-stranded RNA-binding protein Staufen homolog 1 MSQVQVQVQNPSAALSGSQILNKNQSLLSQPLMSIPSTTSSLPSENAGRPIQNSALPSAS ITSTSAAAESITPTVELNALCMKLGKKPMYKPVDPYSRMQSTYNYNMRGGAYPPRYFYPF PVPPLLYQVELSVGGQQFNGKGKTRQAAKHDAAAKALRILQNEPLPERLEVNGRESEEEN LNKSEISQVFEIALKRNLPVNFEVARESGPPHMKNFVTKVSVGEFVGEGEGKSKKISKKN AAIAVLEELKKLPPLPAVERVKPRIKKKTKPIVKPQTSPEYGQGINPISRLAQIQQAKKE KEPEYTLLTERGLPRRREFVMQVKVGNHTAEGTGTNKKVAKRNAAENMLEILGFKVPQAQ PTKPALKSEEKTPIKKPGDGRKVTFFEPGSGDENGTSNKEDEFRMPYLSHQQLPAGILPM VPEVAQAVGVSQGHHTKDFTRAAPNPAKATVTAMIARELLYGGTSPTAETILKNNISSGH VPHGPLTRPSEQLDYLSRVQGFQVEYKDFPKNNKNEFVSLINCSSQPPLISHGIGKDVES CHDMAALNILKLLSELDQQSTEMPRTGNGPMSVCGRC >IPI:IPI00000023.4 SWISS-PROT:P18507 TREMBL:B4DSA1 Gamma-aminobutyric acid receptor subunit gamma-2 MSSPNIWSTGSSVYSTPVFSQKMTVWILLLLSLYPGFTSQKSDDDYEDYASNKTWVLTPK VPEGDVTVILNNLLEGYDNKLRPDIGVKPTLIHTDMYVNSIGPVNAINMEYTIDIFFAQT WYDRRLKFNSTIKVLRLNSNMVGKIWIPDTFFRNSKKADAHWITTPNRMLRIWNDGRVLY TLRLTIDAECQLQLHNFPMDEHSCPLEFSSYGYPREEIVYQWKRSSVEVGDTRSWRLYQF SFVGLRNTTEVVKTTSGDYVVMSVYFDLSRRMGYFTIQTYIPCTLIVVLSWVSFWINKDA VPARTSLGITTVLTMTTLSTIARKSLPKVSYVTAMDLFVSVCFIFVFSALVEYGTLHYFV SNRKPSKDKDKKKKNPAPTIDIRPRSATIQMNNATHLQERDEEYGYECLDGKDCASFFCC FEDCRTGAWRHGRIHIRIAKMDSYARIFFPTAFCLFNLVYWVSYLYL type Use networks and models trained on sequences from the specified type of organisms Choice null null gram- gram+ euk (defined $value)? 
" -t $value" : "" " -t " + str( value ) 10 format Produce output in the specified format. Choice short short long all summary (defined $value and $value ne $vdef)? " -f $value" : "" ( "" , " -f " + str( value ) )[ value is not None and value != vdef ]

The valid formats are:

  • short : Write only one line of concluding scores per sequence. Intended for analysis of large datasets where machine-readable output is required. This is the default.
  • long : Write the scores for each position in each sequence.
  • all : Write predictions for both SignalP-TM and SignalP-noTM networks. Five columns with cleavage site (CS) and Signal Peptide (SP) predictions for both SigP-noTM and SigP-TM methods and TM prediction for each position.
  • summary : Write only the concluding scores for each sequence. This is essentially the same information as the 'short' format.
10
graphics generate graphics (-g). Choice null null gif gif+eps ( defined $value and $value ne $vdef) ? " -g $value" : "" ( "" , " -g "+str( value ) )[ bool( value ) ]
  • gif : Save plots in Graphics Interchange Format (GIF) under the names 'plot.method.#.gif', where method is nn or hmm, and # is the number of the input sequence.
  • gif+eps : Save plots in both GIF and EPS formats as described above.
20
Method Use the specified prediction method. Choice best best notm (defined $value and $value ne $vdef) ? " -s $value" : "" ( "" , " -s " + str( value ) )[ value is not None and value != vdef ]

Input sequences may or may not include TM regions.

  • best : The method decides which neural network predictions give the best result, choosing predictions from either the SignalP-TM or the SignalP-noTM networks. For 'gram+' organisms it is always the SignalP-TM networks. (default)
  • notm : The SignalP-noTM neural networks are specifically chosen.
30
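The Python command expressions in these definitions all follow one idiom: a two-element tuple indexed by a boolean, so that `False` selects the empty string and `True` selects the option text. The sketch below restates that idiom as a helper; the function name `option` is hypothetical and is not part of Mobyle. Because both tuple elements are built before the index is applied, the value is wrapped in `str()` so that an unset (`None`) value does not raise a `TypeError`.

```python
def option(flag, value, vdef=None):
    """Return ' <flag> <value>' when value is set and differs from the default.

    Hypothetical helper mirroring the tuple-indexing idiom used in the
    parameter definitions: ("", text)[condition].
    """
    # Both tuple elements are evaluated eagerly, hence str(value):
    # " -s " + None would fail even when the condition is False.
    text = " " + flag + " " + str(value)
    return ("", text)[value is not None and value != vdef]


if __name__ == "__main__":
    print(repr(option("-s", "notm", vdef="best")))  # option emitted
    print(repr(option("-s", "best", vdef="best")))  # default value: nothing emitted
    print(repr(option("-s", None, vdef="best")))    # unset value: nothing emitted
```

The same pattern covers options without a default by leaving `vdef` as `None`.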
noTM_cutoff cutoff for noTM networks Float (defined $value and $value ne $vdef) ? " -u $value" : "" ( "" , " -u " + str( value ) )[ value is not None] the cutoff must be >= 0 and <= 1 value >= 0 and value <= 1

User-defined D-cutoff for noTM networks. A score above the specified cutoff results in a positive prediction of a signal peptide. The cutoff determines the yes/no answer only; the prediction process is not affected. The default cutoffs are:

  • euk : 0.45
  • gram+ : 0.57
  • gram- : 0.57
50
TM_cutoff cutoff for TM networks Float (defined $value and $value ne $vdef) ? " -U $value" : "" ( "" , " -U " + str( value ) )[ value is not None] the cutoff must be >= 0 and <= 1 value >= 0 and value <= 1
User-defined D-cutoff for TM networks. A score above the specified cutoff results in a positive prediction of a signal peptide. The cutoff determines the yes/no answer only; the prediction process is not affected. The default cutoffs are:
  • euk : 0.50
  • gram+ : 0.45
  • gram- : 0.51
50
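As a rough illustration of how the two cutoff parameters act, the sketch below applies the default D-cutoffs listed above to a D-score and returns the yes/no answer. The function name, the dictionary layout, and the `network` labels are assumptions made for this example; they are not taken from signalp itself.

```python
# Default D-cutoffs as listed in the help text (keys are illustrative labels).
DEFAULT_CUTOFFS = {
    "noTM": {"euk": 0.45, "gram+": 0.57, "gram-": 0.57},
    "TM": {"euk": 0.50, "gram+": 0.45, "gram-": 0.51},
}


def has_signal_peptide(d_score, organism, network, cutoff=None):
    """Yes/no call: a D-score above the cutoff is a positive prediction.

    Hypothetical helper; `cutoff` stands in for a user-supplied value
    and falls back to the listed default when omitted.
    """
    if cutoff is None:
        cutoff = DEFAULT_CUTOFFS[network][organism]
    if not 0 <= cutoff <= 1:
        raise ValueError("the cutoff must be >= 0 and <= 1")
    return d_score > cutoff


if __name__ == "__main__":
    print(has_signal_peptide(0.48, "euk", "noTM"))
    print(has_signal_peptide(0.48, "euk", "TM"))
```

Changing the cutoff only moves the decision threshold; the scores themselves stay the same, as the help text notes.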
truncate Truncate each sequence to at most n N-terminal residues Integer 70 (defined $value and $value ne $vdef) ? " -c $value" : "" ( "" , " -c " + str( value ) )[ value is not None and value != vdef ] enter a positive value value >= 0 Truncate the input sequences to the specified length from the N-terminal. The default is 70 residues. A value of "0" disables truncation. 60 mature generate a FASTA file with mature sequences based on the predictions. Boolean 0 ( "" , " -m %s_mature.fasta"%sequence)[value] 70 n_s_e generate a GFF (name-start-end) file with the predicted signal peptides. Boolean 0 ( "" , " -n %s.gff"%sequence)[value] 70 results signalp report Report signalp "signalp.out" "signalp.out"

Neural network output

For each input sequence the neural network (nn) module of signalp will first return three scores between 0 and 1 for each sequence position:

  • C-score (raw cleavage site score) The output score from networks trained to recognize cleavage sites vs. other sequence positions. Trained to be high at position +1 (immediately after the cleavage site), and low at all other positions.
  • S-score (signal peptide score) The output score from networks trained to recognize signal peptide vs. non-signal-peptide positions. Trained to be high at all positions before the cleavage site, and low at positions after the cleavage site and in the N-terminals of non-secretory proteins.
  • Y-score (combined cleavage site score) The prediction of cleavage site location is optimized by observing where the C-score is high and the S-score changes from a high to a low value. The Y-score formalizes this by combining the height of the C-score with the slope of the S-score.
    Specifically, the Y-score is a geometric average between the C-score and a smoothed derivative of the S-score (i.e. the difference between the mean S-score over d positions before and d positions after the current position, where d varies with the chosen network ensemble).

signalp will then report the maximal C-, S-, and Y-scores, the mean S-score in the interval between the N-terminal and the site with the maximal Y-score and, finally, the D-score, the average of the S-mean and Y-max score.

The high detail level of the output is intended to allow for interpretation of borderline cases by the user.

If the sequence is predicted to have a signal peptide, the predicted cleavage site is located immediately before the position with the maximal Y-score.
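The score summary described above can be sketched in a few lines. This is only an illustration of the description, not signalp's implementation: the window handling at the sequence ends, the choice to count the current position in the "after" window, the clamping of negative slopes to zero, and the plain (unweighted) average for the D-score are all assumptions, and the function names are hypothetical.

```python
from math import sqrt


def y_scores(c, s, d):
    """Geometric average of the C-score and a smoothed S-score slope."""
    ys = []
    for i in range(len(c)):
        before = s[max(0, i - d):i]   # up to d positions before position i
        after = s[i:i + d]            # up to d positions from position i on
        if not before or not after:
            ys.append(0.0)
            continue
        slope = sum(before) / len(before) - sum(after) / len(after)
        # Assumption: a rising S-score (negative slope) contributes nothing.
        ys.append(sqrt(c[i] * max(slope, 0.0)))
    return ys


def summarize(c, s, d=2):
    """Return max C/S/Y, the S-mean before the Y-max site, and the D-score."""
    ys = y_scores(c, s, d)
    y_pos = max(range(len(ys)), key=ys.__getitem__)
    y_max = ys[y_pos]
    # Mean S-score from the N-terminal up to the site with maximal Y-score.
    s_mean = sum(s[:y_pos]) / y_pos if y_pos else 0.0
    return {
        "c_max": max(c),
        "s_max": max(s),
        "y_max": y_max,
        "y_pos": y_pos,  # predicted cleavage is immediately before this index
        "s_mean": s_mean,
        "d_score": (s_mean + y_max) / 2,
    }


if __name__ == "__main__":
    c = [0.0, 0.0, 0.0, 0.9, 0.0, 0.0]
    s = [0.9, 0.9, 0.9, 0.1, 0.1, 0.1]
    print(summarize(c, s, d=2))
```

In this toy input the S-score drops between the third and fourth positions and the C-score peaks at the fourth, so the Y-score is maximal there and the cleavage site falls immediately before it.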

gif graphic in GIF Binary signalp_graphic GIF $graphics eq "gif" or $graphics eq "gif+eps" graphics == "gif" or graphics == "gif+eps" "*.gif" "*.gif" eps graphic in eps Binary signalp_graphic EPS $graphics eq "gif+eps" graphics == "gif+eps" "*.eps" "*.eps" mature_result a FASTA file with mature sequences based on the predictions Sequence FASTA $mature mature "${sequence}_mature.fasta" "%s_mature.fasta"%sequence n_s_e_result a GFF (name-start-end) file with the predicted signal peptides Feature AbstractText GFF $n_s_e n_s_e "${sequence}.gff" "%s.gff"%sequence
Programs-5.1.1/saps.xml0000644000175000001560000002100111767572177013660 0ustar bneronsis saps SAPS Statistical Analysis of Protein Sequences V. Brendel Brendel, V., Bucher, P., Nourbakhsh, I., Blaisdell, B.E., Karlin, S. (1992) Methods and algorithms for statistical analysis of protein sequences. Proc. Natl. Acad. Sci. USA 89: 2002-2006. sequence:protein:composition saps seq Protein sequence(s) File Protein Sequence EMBL " $value" " "+str(value) 2 control Control options 1 specie Use this specie for quantile comparisons (-s) Choice swp23s BACSU DROME HUMAN RAT YEAST CHICK ECOLI MOUSE XENLA swp23s (defined $value and $value ne $vdef)? " -s $value" : "" ( "" , " -s " + str(value) )[ value is not None and value != vdef] H_positive Count H as positive charge (-H) Boolean 0 ($value)? " -H":"" ("" , " -H")[ value ] By default, SAPS treats only lysine (K) and arginine (R) as positively charged residues. If the command line flag -H is set, then histidine (H) is also treated as positively charged in all parts of the program involving the charge alphabet. analyze Analyze spacings of amino acids X, Y, .... (-a) String (defined $value)? " -a $value":"" ("" , " -a " + str(value))[ value is not None ] Clusters of particular amino acid types may be evaluated by means of the same tests that are used to detect clustering of charged residues (binomial model and scoring statistics). These tests are invoked by setting this flag; for example, to test (separately) for clusters of alanine (A) and serine (S), set this parameter to AS. The binomial test is also programmed for certain combinations of amino acids: AG (flag -a a), PEST (flag -a p), QP (flag -a q), ST (flag -a s). output Output options 1 documented Generate documented output (-d) Boolean 0 ($value)? 
" -d":"" ("" , " -d")[ value ] The output will come with documentation that annotates each part of the program; this flag should be set when SAPS is used for the first time as it provides helpful explanations with respect to the statistics being used and the layout of the output. terse Generate terse output (-t) Boolean 0 ($value)? " -t":"" ("" , " -t")[ value ] This flag specifies terse output that is limited to the analysis of the charge distribution and of high scoring segments. verbose Generate verbose output (-v) Boolean 0 ($value)? " -v":"" ("" , " -v")[ value ] table Append computer-readable table summary output (-T) Boolean 0 ($value)? " -T":"" ("" , " -T")[ value ] This flag is used in conjunction with the analysis of sets of proteins ; if specified, the file saps.table is appended with computer-readable lines describing the input files and their significant features. tablefile Summary table output Text $table table "*.table" "*.table" Programs-5.1.1/squizz_convert.xml0000644000175000001560000003113512105210041015766 0ustar bneronsis squizz_convert 0.99b SQUIZZ Sequence/Alignment format converter N. Joly http://bioweb2.pasteur.fr/docs/squizz/seqfmt.html http://bioweb2.pasteur.fr/docs/squizz/alifmt.html alignment:formatter sequence:formatter squizz seq Sequence section not $infile_aln or ($infile_seq and $infile_aln) not infile_aln or (infile_seq and infile_aln) infile_seq Input Sequence EMBL FASTA GCG GENBANK IG NBRF CODATA RAW SWISSPROT 1,n " ($value)" " "+str(value) You must enter either a sequence or a alignment. not $infile_aln not infile_aln 2 convert_seq Convert sequence into sequence format (-c) Choice null null EMBL FASTA GCG GDE GENBANK IG NBRF CODATA RAW SWISSPROT (defined $value) ? 
" -S -c $value" : "" ("", " -S -c "+str(value))[value is not None] 1 seq_outfile Sequence(s) file Sequence 1,n defined $infile_seq infile_seq is not None "squizz_convert.out" "squizz_convert.out" aln Alignment section not $infile_seq or ($infile_seq and $infile_aln) not infile_seq or (infile_seq and infile_aln) infile_aln Input Alignment CLUSTAL PHYLIPI PHYLIPS MSF NEXUS STOCKHOLM FASTA MEGA " $value" " "+str(value) Can not handle both Sequence and Alignment at the same time not $infile_seq not infile_seq 2 convert_aln Convert alignment into alignment format (-c) Choice not defined $convert_seq2 or (defined $convert_aln and defined $convert_seq2) convert_seq2 is None or (convert_aln is not None and convert_seq2 is not None) null null CLUSTAL FASTA MEGA MSF NEXUS PHYLIPI PHYLIPS STOCKHOLM (defined $value) ? "-A -c $value" : "" ("", " -A -c "+str(value))[value is not None] Cannot convert to both Sequence and Alignment at the same time defined $convert_aln and not defined $convert_seq2 convert_aln is not None and convert_seq2 is None 1 convert_seq2 Convert alignment into sequence format (-c) Choice not defined $convert_aln or (defined $convert_aln and defined $convert_seq2) convert_aln is None or (convert_aln is not None and convert_seq2 is not None) null null EMBL FASTA GCG GDE GENBANK IG NBRF CODATA RAW SWISSPROT (defined $value) ? 
" -c $value" : "" ("", " -c "+str(value))[value is not None] Cannot convert to both Sequence and Alignment at the same time not defined $convert_aln and defined $convert_seq2 convert_aln is None and convert_seq2 is not None 1 aln_outfile Alignment file Alignment defined $infile_aln and defined $convert_aln infile_aln is not None and convert_aln is not None "squizz_convert.out" "squizz_convert.out" seq2_outfile Sequence(s) file Sequence defined $infile_aln and defined $convert_seq2 infile_aln is not None and convert_seq2 is not None "squizz_convert.out" "squizz_convert.out" Programs-5.1.1/blast2taxoclass.xml0000644000175000001560000003370612103731725016017 0ustar bneronsis blast2taxoclass 1.0 blast2taxoclass Blast filtering with taxonomic hierarchy information C. Maufrais database:search:filter blast2taxoclass infile Blast output file BlastTextReport Report " -i $value" " -i " + str(value) 20 blastfilter Find taxonomic classification of: Choice M M F ($value) ? " -$value" : "" " -" + str(value) nbofhit Number of hsp to consider (-x) Integer 10 (defined $value) ? " -x $value" : "" ("", " -x " + str(value) )[value is not None and value != vdef] 0: all hsp taxonomicfilter Taxonomic hierarchy filter option position Relative position in taxonomic hierarchy (-p) Integer (defined $value) ? " -p $value" : "" ("", " -p " + str(value) )[value is not None] Choose only one of taxonomic hierarchy filter option: Relative position, Taxonomic rank or Taxonomic name. (defined $position and (not defined $taxonomic_name and not defined $rank)) (position is not None and (taxonomic_name is None and rank is None)) or (taxonomic_name is not None and (position is None and rank is None)) or (rank is not None and (taxonomic_name is None and position is None)) zero means: root of taxonomy, higher value: leaf or near taxonomic_name Taxonomic Name (-n) String (defined $value) ? 
" -n $value" : "" ("", " -n " + str(value).replace(' ','_') )[value is not None] Choose only one of taxonomic hierarchy filter option: Relative position, Taxonomic rank or Taxonomic name. (defined $position and (not defined $taxonomic_name and not defined $rank)) (position is not None and (taxonomic_name is None and rank is None)) or (taxonomic_name is not None and (position is None and rank is None)) or (rank is not None and (taxonomic_name is None and position is None)) rank Taxonomic rank name (-r) Choice null null superkingdom kingdom subkingdom superphylum phylum subphylum superclass class subclass infraclass superorder order suborder infraorder parvorder superfamily family subfamily tribe subtribe genus subgenus species_group species_subgroup species subspecies varietas forma (defined $value) ? " -r $value" : "" ("", " -r " + str(value) )[value is not None] Choose only one of taxonomic hierarchy filter option: Relative position, Taxonomic rank or Taxonomic name. (defined $position and (not defined $taxonomic_name and not defined $rank)) (position is not None and (taxonomic_name is None and rank is None)) or (taxonomic_name is not None and (position is None and rank is None)) or (rank is not None and (taxonomic_name is None and position is None)) If Taxonomic rank is not defined for one hit, it is not treated. output Output option blastout Blast output file(s) sort/split by specific taxonomic hierarchy (-b) Boolean 0 ($value)? " -b" : "" ("" , " -b") [value] queryout Query name write in file(s) sort/split by specific taxonomic hierarchy (-q) Boolean 0 ($value)? " -q" : "" ("" , " -q") [value] fastaExtract Extraction of fasta sequences. Boolean 0 Query name write in file must be checked and query sequences must be done. 
$fastaExtract == 1 and $queryout == 1 and defined $query_seq (fastaExtract and (queryout and query_seq is not None)) or (not fastaExtract) Extract the fasta sequences matching the specified taxonomic filter from the file containing the query sequences which were used to run blast. query_seq Query sequences which were used to run blast. Sequence FASTA 1,n defined $fastaExtract and defined $queryout fastaExtract and queryout (defined $value)? " && extractfasta -i $query *.qry": "" (""," && extractfasta -i "+ str(value) + " *.qry") [value is not None] 100 outfile Output file Blast2taxoclassReport Report "blast2taxoclass.out" "blast2taxoclass.out" blastoutfile Blast output file(s) BlastTextReport Report defined $blastout blastout "*.blast" "*.blast" queryoutfile Query name file QueryNameReport Report defined $queryout queryout "*.qry" "*.qry" fastafile Fasta file Sequence defined $fastaExtract fastaExtract "*.fasta" "*.fasta" Programs-5.1.1/mix.xml0000644000175000001560000006667111724156742013522 0ustar bneronsis mix mix Mixed method parsimony http://bioweb2.pasteur.fr/docs/phylip/doc/mix.html MIX is a general parsimony program which carries out the Wagner and Camin-Sokal parsimony methods in mixture, where each character can have its method specified separately. The program defaults to carrying out Wagner parsimony. phylogeny:parsimony mix String " && mix <mix.params" " && mix <mix.params" 0 infile Input File PhylipDiscreteCharMatrix AbstractText $infile ne "infile" infile != "infile" "ln -s $infile infile; " "ln -s "+ str( infile ) + " infile " -10 5 6 Alpha 110110 Beta 110000 Gamma 100110 Delta 001001 Epsilon 001110 ancestral_opt Ancestral options use_ancestral_state Use ancestral states (A) Boolean 0 ($value) ? "A\\n" : "" ( "" , "A\n")[ value ] 1 Give an ancestors file whenever you enable this parameter mix.params ancestors_file Ancestors file AncestorsFile AbstractText $use_ancestral_state use_ancestral_state (defined $value) ? 
" && ln -s $ancestors_file ancestors" : "" ( "" , " && ln -s " + str( ancestors_file ) + " ancestors")[ value is not None ] -1 The A (Ancestral states) option. This indicates that we are specifying the ancestral states for each character. In the menu the ancestors (A) option must be selected. An ancestral states input file is read, whose default name is ancestors. It contains a line or lines giving the ancestral states for each character. These may be 0, 1 or ? the latter indicating that the ancestral state is unknown. An example is: 001??11 The ancestor information can be continued to a new line and can have blanks between any of the characters in the same way that species character data can. mix_opt Mix options parsimony_method Parsimony method (P) Choice not $use_mixed not use_mixed wagner wagner camin ($value eq "camin") ? "P\\n" : "" ( "" , "P\n" )[ value == "camin" ] 1 Only if Use Mixed method is disabled. mix.params use_mixed Use Mixed method (X) Boolean 0 ($value) ? "X\\n" : "" ( "" , "X\n")[ value ] 1 Give a mixure file whenever you choose Mixed method mix.params mixture_file Mixture file MixturePattern AbstractText $use_mixed use_mixed (defined $value) ? " && ln -s $mixture_file mixture" : "" ( "" , " && ln -s " + str( mixture_file ) + " mixture")[ value is not None ] -1 The X (miXture) option. Move, and Penny the user can specify for each character which parsimony method is in effect. This is done by selecting menu option X (not M) and having an input mixture file . It contains a line or lines with and one letter for each character. These letters are C or S if the character is to be reconstructed according to Camin-Sokal parsimony, W or ? if the character is to be reconstructed according to Wagner parsimony. 
So if there are 20 characters the line giving the mixture might look like this: WWWCC WWCWC Note that blanks in the sequence of characters (after the first ones that are as long as the species names) will be ignored, and the information can go on to a new line at any point. So this could equally well have been specified by WW CCCWWCWC jumble_dataset Randomize and Multiple data set options jumble_or_dataset I want to Choice null null J D 9 data_nb number of data set If you choose Analyze multiple data sets, you must indicate the number of sets you have. Integer $jumble_or_dataset eq 'D' jumble_or_dataset == 'D' Enter a value > 0 ; there must be no more than 1000 datasets for this server $value > 0 and $value <= 1000 value > 0 and value <= 1000 9 seed Random number seed (must be odd) Integer $jumble_or_dataset jumble_or_dataset Random number seed must be odd $value >= 0 and ($value % 2) != 0 value >= 0 and (value % 2) != 0 9 times Number of times to jumble Integer 1 defined $jumble_or_dataset jumble_or_dataset is not None the product of "number of times to jumble" and number of data set (if define) must be less than 100000 ($times * (defined $data_nb) ? $data_nb : 1) <= 100000 times * (1, data_nb)[data_nb is not None] <= 100000 9 jumble Integer ( defined $jumble_or_dataset ) and ( $jumble_or_dataset eq "J" ) ( jumble_or_dataset is not None ) and ( jumble_or_dataset == "J" ) "J\\n$seed\\n$times\\n" 'J\n' + str( seed ) + "\n" + str( times ) +"\n" 9 mix.params multiple_dataset String ( defined $jumble_or_dataset ) and ( $jumble_or_dataset eq 'D') ( jumble_or_dataset is not None ) and ( jumble_or_dataset == "D" ) "M\nD\n" + str( data_nb ) + "\n" + str( seed ) +"\n" + str( times ) +"\n" "M\nD\n" + str( data_nb ) + "\n" + str( seed ) +"\n" + str( times ) +"\n" 9 mix.params consense Compute a consensus tree Boolean $jumble_or_dataset eq 'D' and $print_treefile jumble_or_dataset == 'D' and print_treefile 0 ($value) ? 
" && cp infile mix.infile && cp mix.outtree intree && consense <consense.params && mv outtree consense.outtree && mv outfile consense.outfile" : "" ("" , " && cp infile mix.infile && cp mix.outtree intree && consense <consense.params && mv outtree consense.outtree && mv outfile consense.outfile")[ value ] 10 user_tree_opt User tree options user_tree Use User tree (default: search for best tree) (U) Boolean 0 ($value) ? "U\\n" : "" ( "" , "U\n" )[ value ] You cannot randomize (jumble) your dataset and give a user tree at the same time not ( $user_tree and $jumble ) not( user_tree and jumble ) 1 The user-defined trees supplied if you use the U option must be given as rooted trees with two-way splits (bifurcations) mix.params tree_file User Tree file Tree NEWICK $user_tree user_tree (defined $value)? " && ln -s $tree_file intree" : "" ( "" , " && ln -s " + str( tree_file ) + " intree")[ value is not None ] -1 Give a tree whenever the infile does not already contain the tree. tree_number How many tree(s) in the User Tree file Integer $tree_file tree_file 1 " && echo $value >> infile" " && echo "+ str( value ) + ">> infile" -2 Give this information whenever the infile does not already contain the tree. output Output options print_tree Print out tree (3) Boolean 1 ($value) ? "" : "3\\n" ( "3\n" ,"" )[ value ] 1 Tells the program to print a semi-graphical picture of the tree in the outfile. mix.params print_step Print out steps in each character (4) Boolean 0 ($value) ? "4\\n" : "" ( "" , "4\n" )[ value ] 1 mix.params print_states Print states at all nodes of tree (5) Boolean 0 ($value) ? "5\\n" : "" ( "" , "5\n" )[ value ] 1 mix.params print_treefile Write out trees onto tree file (6) Boolean NEWICK 1 ($value) ? "" : "6\\n" ( "6\n" , "" )[ value ] 1 Tells the program to save the tree in a treefile (a standard representation of trees where the tree is specified by a nested pairs of parentheses, enclosing names and separated by commas). 
mix.params printdata Print out the data at start of run (1) Boolean 0 ($value) ? "1\\n" : "" ( "" , "1\n" )[ value ] 1 mix.params pars_opt Parsimony options use_threshold Use Threshold parsimony (T) Boolean 0 ($value) ? "T\\n$threshold\\n" : "" ( "" , "T\n" +str( threshold ) +"\n")[ value ] 3 mix.params threshold Threshold parsimony value Integer $use_threshold use_threshold "" "" You must enter a numeric value, greater than or equal to 1 $threshold >= 1 threshold >= 1 2 mix.params other_options Other options outgroup Outgroup root species (O) Integer 1 (defined $value and $value != $vdef) ? "O\\n$value\\n" : "" ( "" , "O\n" + str( value ) + "\n" )[ value is not None and value != vdef ] Please enter a value greater than 0 $value > 0 value > 0 1 mix.params outfile Mix output file Text " && mv outfile mix.outfile" " && mv outfile mix.outfile" "mix.outfile" "mix.outfile" treefile Mix tree file Tree NEWICK $print_treefile print_treefile " && mv outtree mix.outtree" " && mv outtree mix.outtree" "mix.outtree" "mix.outtree" confirm String "y\\n" "y\n" 1000 mix.params terminal_type String "0\\n" "0\n" -1 mix.params consense_confirm String $consense consense "Y\\n" "Y\n" 1000 consense.params consense_terminal_type String $consense consense "T\\n" "T\n" -2 consense.params consense_outfile Consense output file Text $consense consense "consense.outfile" "consense.outfile" consense_treefile Consense tree file Tree NEWICK $consense consense "consense.outtree" "consense.outtree" Programs-5.1.1/coderet.xml0000644000175000001560000003464312072525233014334 0ustar bneronsis coderet EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net coderet Extract CDS, mRNA and translations from feature tables http://bioweb2.pasteur.fr/docs/EMBOSS/coderet.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:edit coderet e_input Input section e_seqall seqall option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -seqall=" + str(value))[value is not None] 1 coderet was limited to EMBL/GenBank feature tables e_output Output section e_outfile Name of the output file (e_outfile) Filename outfile.coderet ("" , " -outfile=" + str(value))[value is not None] 2 e_outfile_out outfile_out option CoderetReport Report e_outfile e_cdsoutseq Name of the output sequence file (e_cdsoutseq) DNA Filename outseq.cds ("" , " -cdsoutseq=" + str(value))[value is not None] 3 e_osformat_cdsoutseq Choose the sequence output format DNA Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 4 e_cdsoutseq_out cdsoutseq_out option DNA Sequence e_cdsoutseq e_mrnaoutseq Name of the output sequence file (e_mrnaoutseq) DNA Filename outseq.mrna ("" , " -mrnaoutseq=" + str(value))[value is not None] 5 e_osformat_mrnaoutseq Choose the sequence output format DNA Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 6 e_mrnaoutseq_out mrnaoutseq_out option DNA Sequence e_mrnaoutseq e_translationoutseq Name of the output sequence file (e_translationoutseq) Protein Filename outseq.prot ("" , " -translationoutseq=" + str(value))[value is not None] 7 e_osformat_translationoutseq Choose the sequence output format Protein Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 8 e_translationoutseq_out translationoutseq_out option Protein Sequence e_translationoutseq 
e_restoutseq Name of the output sequence file (e_restoutseq) DNA Filename outseq.noncoding ("" , " -restoutseq=" + str(value))[value is not None] 9 e_osformat_restoutseq Choose the sequence output format DNA Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 10 e_restoutseq_out restoutseq_out option DNA Sequence e_restoutseq auto Turn off any prompting String " -auto -stdout" 11 Programs-5.1.1/recoder.xml0000644000175000001560000002057112072525233014325 0ustar bneronsis recoder EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net recoder Find restriction sites to remove (mutate) with no translation change http://bioweb2.pasteur.fr/docs/EMBOSS/recoder.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:restriction recoder e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_enzymes Comma separated enzyme list String all ("", " -enzymes=" + str(value))[value is not None and value!=vdef] 2 e_output Output section e_sshow Display untranslated sequence Boolean 0 ("", " -sshow")[ bool(value) ] 3 e_tshow Display translated sequence Boolean 0 ("", " -tshow")[ bool(value) ] 4 e_outfile Name of the report file Filename recoder.report ("" , " -outfile=" + str(value))[value is not None] 5 e_rformat_outfile Choose the report output format Choice TABLE DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 6 e_outfile_out outfile_out option 
Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 7 Programs-5.1.1/clustalO-profile.xml0000644000175000001560000005611312104230615016120 0ustar bneronsis clustalO-profile Clustal-Omega: Profile alignment Align 2 profiles (alignments) Use this interface to align two alignments (profiles) together. The columns in each profile will be kept fixed and the alignment of the two profiles will be written out. alignment:multiple clustalo input Data Input the columns in each profile will be kept fixed and the alignment of the two profiles will be written out. Use this option to align two alignments (profiles) together. profile1 Pre-aligned multiple sequence file (aligned columns will be kept fixed) Protein Alignment FASTA CLUSTAL STOCKHOLM MSF PHYLPI " --profile1=$value" " --profile1=" + str( value ) profile2 Pre-aligned multiple sequence file (aligned columns will be kept fixed) Protein Alignment FASTA CLUSTAL STOCKHOLM MSF 1 " --profile2=$value" " --profile2=" + str( value ) seqtype type of sequences Choice auto auto Protein RNA DNA (defined $value and $value neq $vdef)? " --seqtype=$value" : "" ("", " --seqtype="+str(value))[value is not None and value != vdef] Since version 1.1.0 the Clustal-Omega alignment engine can process DNA/RNA. Clustal-Omega tries to guess the sequence type (protein, DNA/RNA), but this can be over-ruled with this flag. clustering Clustering In order to produce a multiple alignment Clustal-Omega requires a guide tree which defines the order in which sequences/profiles are aligned. A guide tree in turn is constructed, based on a distance matrix. Conventionally, this distance matrix is comprised of all the pair-wise distances of the sequences. 
The distance measure Clustal-Omega uses for pair-wise distances of un-aligned sequences is the k-tuple measure [4], which was also implemented in Clustal 1.83 and ClustalW2 [5,6]. If the sequences inputted via -i are aligned Clustal-Omega uses the Kimura-corrected pairwise aligned identities [7]. The computational effort (time/memory) to calculate and store a full distance matrix grows quadratically with the number of sequences. Clustal-Omega can improve this scalability to N*log(N) by employing a fast clustering algorithm called mBed [2]; this option is automatically invoked (default). If a full distance matrix evaluation is desired, then the --full flag has to be set. The mBed mode calculates a reduced set of pair-wise distances. These distances are used in a k-means algorithm, that clusters at most 100 sequences. For each cluster a full distance matrix is calculated. No full distance matrix (of all input sequences) is calculated in mBed mode. If there are less than 100 sequences in the input, then in effect a full distance matrix is calculated in mBed mode, however, no distance matrix can be outputted (see below). Clustal-Omega uses Muscle's [8] fast UPGMA implementation to construct its guide trees from the distance matrix. By default, the distance matrix is used internally to construct the guide tree and is then discarded. By specifying --distmat-out the internal distance matrix can be written to file. This is only possible in --full mode. The guide trees by default are used internally to guide the multiple alignment and are then discarded. By specifying the --guidetree-out option these internal guide trees can be written out to file. Conversely, the distance calculation and/or guide tree building stage can be skipped, by reading in a pre-calculated distance matrix and/or pre-calculated guide tree. These options are invoked by specifying the --distmat-in and/or --guidetree-in flags, respectively. 
However, distance matrix reading is disabled in the current version. By default, distance matrix and guide tree files are not over-written, if a file with the specified name already exists. In this case Clustal-Omega aborts during the command-line processing stage. In mBed mode a full distance matrix cannot be outputted, distance matrix output is only possible in --full mode. mBed or --full distance mode do not affect the ability to write out guide-trees. Guide trees can be iterated to refine the alignment (see section ITERATION). Clustal-Omega takes the alignment, that was produced initially and constructs a new distance matrix from this alignment. The distance measure used at this stage is the Kimura distance [7]. By default, Clustal-Omega constructs a reduced distance matrix at this stage using the mBed algorithm, which will then be used to create an improved (iterated) new guide tree. To turn off mBed-like clustering at this stage the --full-iter flag has to be set. While Kimura distances in general are much faster to calculate than k-tuple distances, time and memory requirements still scale quadratically with the number of sequences and --full-iter clustering should only be considered for smaller cases ( << 10,000 sequences). [2] Blackshields G, Sievers F, Shi W, Wilm A, Higgins DG. Sequence embedding for fast construction of guide trees for multiple sequence alignment. Algorithms Mol Biol. 2010 May 14;5:21. [4] Wilbur and Lipman, 1983; PMID 6572363 [5] Thompson JD, Higgins DG, Gibson TJ. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673-4680. [6] Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG. (2007). Clustal W and Clustal X version 2.0. Bioinformatics, 23, 2947-2948. [7] Kimura M (1980). 
"A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences". Journal of Molecular Evolution 16: 111–120. guidetree_in Guide tree input file (--guidetree-in) Tree NEWICK (defined $value )? " --guidetree-in= $value" : "" ( "" , " --guidetree-in="+str(value))[ value is not None ] full Use full distance matrix for guide-tree calculation (slow; mBed is default) (--full) Boolean 0 (defined $full and $ full)? " --full ": "" ( "" , " --full ")[ value is not None and value ] full_iter Use full distance matrix for guide-tree calculation during iteration (mBed is default) (--full-iter) Boolean 0 (defined $full and $ full)? " --full-iter ": "" ( "" , " --full-iter ")[ value is not None and value ] output_format Alignment Output output_format alignment output format Choice fasta fasta clustal msf phylip stockholm vienna (defined $value and $value ne $vdef)? " --outfmt=$value" : "" ( "" , " --outfmt=" + value )[ value is not None and value != vdef ] iteration Iteration By default, Clustal-Omega calculates (or reads in) a guide tree and performs a multiple alignment in the order specified by this guide tree. This alignment is then outputted. Clustal-Omega can 'iterate' its guide tree. The hope is that the (Kimura) distances, that can be derived from the initial alignment, will give rise to a better guide tree, and by extension, to a better alignment. A similar rationale applies to HMM-iteration. MSAs in general are very 'vulnerable' at their early stages. Sequences that are aligned at an early stage remain fixed for the rest of the MSA. Another way of putting this is: 'once a gap, always a gap'. This behaviour can be mitigated by HMM iteration. An initial alignment is created and turned into a HMM. This HMM can help in a new round of MSA to 'anticipate' where residues should align. This is using the HMM as an External Profile and carrying out iterative EPA. 
In practice, individual sequences and profiles are aligned to the External HMM, derived after the initial alignment. Pseudo-count information is then transferred to the (internal) HMM, corresponding to the individual sequence/profile. The now somewhat 'softened' sequences/profiles are then in turn aligned in the order specified by the guide tree. Pseudo-count transfer is reduced with the size of the profile. Individual sequences attain the greatest pseudo-count transfer, larger profiles less so. Pseudo-count transfer to profiles larger than, say, 10 is negligible. The effect of HMM iteration is more pronounced in larger test sets (that is, with more sequences). Both, HMM- and guide tree-iteration come at a cost of increasing the run-time. One round of guide tree iteration adds on (roughly) the time it took to construct the initial alignment. If, for example, the initial alignment took 1min, then it will take (roughly) 2min to iterate the guide tree once, 3min to iterate the guide tree twice, and so on. HMM-iteration is more costly, as each round of iteration adds three times the time required for the alignment stage. For example, if the initial alignment took 1min, then each additional round of HMM iteration will add on 3min; so 4 iterations will take 13min (=1min+4*3min). The factor of 3 stems from the fact that at every stage both intermediate profiles have to be aligned with the background HMM, and finally the (softened) HMMs have to be aligned as well. All times are quoted for single processors. By default, guide tree iteration and HMM-iteration are coupled. This means, at each iteration step both, guide tree and HMM, are re-calculated. This is invoked by setting the --iter flag. For example, if --iter=1, then first an initial alignment is produced (without external HMM background information and using k-tuple distances to calculate the guide tree). 
This initial alignment is then used to re-calculate a new guide tree (using Kimura distances) and to create a HMM. The new guide tree and the HMM are then used to produce a new MSA. Iteration of guide tree and HMM can be de-coupled. This means that the number of guide tree iterations and HMM iterations can be different. This can be done by combining the --iter flag with the --max-guidetree-iterations and/or the --max-hmm-iterations flag. The number of guide tree iterations is the minimum of --iter and --max-guidetree-iterations, while the number of HMM iterations is the minimum of --iter and --max-hmm-iterations. If, for example, HMM iteration should be performed 5 times but guide tree iteration should be performed only 3 times, then one should set --iter=5 and --max-guidetree-iterations=3. All three flags can be specified at the same time (however, this makes no sense). It is not sufficient just to specify --max-guidetree-iterations and --max-hmm-iterations but not --iter. If any iteration is desired --iter has to be set. iterations Number of (combined guide-tree/HMM) iterations (--iter) Integer (defined $value)? " --iter=$value ": "" ( "" , " --iter="+str(value) )[ value is not None ] max_guidetree_iterations Maximum number guidetree iterations (--max-guidetree-iterations) Integer (defined $value)? " --max-guidetree-iterations=$value ": "" ( "" , " --max-guidetree-iterations="+str(value) )[ value is not None ] max_hmm_iterations Maximum number of HMM iterations (--max-hmm-iterations) Integer (defined $value)? " --max-hmm-iterations=$value ": "" ( "" , " --max-hmm-iterations="+str(value) )[ value is not None ] miscellaneous Miscellaneous auto Set options automatically (might overwrite some of your options) (--auto) Boolean 0 (defined $value and $value)? 
" --auto ": "" ( "" , " --auto ")[value is not None and value] Users may feel unsure which options are appropriate in certain situations even though using ClustalO without any special options should give you the desired results. The --auto flag tries to alleviate this problem and selects accuracy/speed flags according to the number of sequences. For all cases will use mBed and thereby possibly overwrite the --full option. For more than 1,000 sequences the iteration is turned off as the effect of iteration is more noticeable for 'larger' problems. Otherwise iterations are set to 1 if not already set to a higher value by the user. Expert users may want to avoid this flag and exercise more fine tuned control by selecting the appropriate options manually. verbosity 100 String " -v --force --log=clustalO_log" " -v --force --log=clustalO_log" alignment_output Multiple Sequence Alignment Protein Alignment FASTA CLUSTAL MSF PHILIPI STOCKHOLM FASTA "clustalO-profile.out" "clustalO-profile.out" logfile Clustal omega log file ClustalOReport Report "clustalO_log" "clustalO_log" Programs-5.1.1/dotpath.xml0000644000175000001560000002345012072525233014344 0ustar bneronsis dotpath EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net dotpath Draw a non-overlapping wordmatch dotplot of two sequences http://bioweb2.pasteur.fr/docs/EMBOSS/dotpath.html http://emboss.sourceforge.net/docs/themes alignment:pairwise:dot_plots dotpath e_input Input section e_asequence asequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -asequence=" + str(value))[value is not None] 1 e_bsequence bsequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -bsequence=" + str(value))[value is not None] 2 e_required Required section e_wordsize Word size (value greater than or equal to 2) Integer 4 ("", " -wordsize=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 2 is required value >= 2 3 e_output Output section e_overlaps Display the overlapping matches Boolean 0 ("", " -overlaps")[ bool(value) ] 4 Displays the overlapping matches (in red) as well as the minimal set of non-overlapping matches e_boxit Draw a box around dotplot Boolean 1 (" -noboxit", "")[ bool(value) ] 5 e_graph Choose the e_graph output format Choice png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 6 e_goutfile Name of the output graph Filename dotpath_graph ("" , " -goutfile=" + str(value))[value is not None] 7 outgraph_png Graph file Picture Binary e_graph == "png" "*.png" outgraph_gif Graph file Picture Binary e_graph == "gif" "*.gif" outgraph_ps Graph file PostScript Binary e_graph == "ps" or e_graph == "cps" "*.ps" outgraph_meta Graph file Picture Binary e_graph == "meta" "*.meta" outgraph_data Graph file Text e_graph == "data" "*.dat" auto Turn off any prompting String " -auto -stdout" 8 Programs-5.1.1/xpound.xml0000644000175000001560000002526211767572177014244 0ustar bneronsis xpound Xpound Software for exon trapping Thomas and Skolnick A probabilistic model for detecting coding 
regions in DNA sequences. Alun Thomas and Mark H Skolnick, IMA Journal of Mathematics Applied in Medicine and Biology, 1994, 11, 149-160. sequence:nucleic:gene_finding xpound seq DNA sequence File Sequence RAW 1,1 " <$value" " <"+str(value) 2 Everything after a % on a line in the input file is ignored. Other than comments, xpound expects only white space, which is also ignored, or IUPAC characters: A C M G R S V T W Y H K D B N in upper or lower case. Characters which do not uniquely determine a base, such as N, B, S and so on, are all interpreted as a C. Xpound will not accept the IUPAC character -, all occurrences of which should be stripped from the input file beforehand. outfile Output file XpoundReport Report "xpound.out" "xpound.out" report_options Report options report Reports regions of bases for which the probability of coding is high (xreport) Boolean 1 ($value) ? " ; xreport <xpound.out " : "" ( "" , " ; xreport <xpound.out " )[ value ] 20 cut_off Cut off value for report Float $report report 0.75 (defined $value and $value != $vdef) ? " $value " : "" ( "" , " " + str(value) )[ value is not None and value != vdef] 21 min_length Minimum length value for report Integer $report report 0 (defined $value and $value != $vdef) ? " $value " : "" ( "" , " " + str(value) )[ value is not None and value != vdef] 22 report_file Report file XreportReport Report $report report " >xreport.out " ">xreport.out " 25 "xreport.out" "xreport.out" postscript_options PostScript options postscript Produces a file of graphs in PostScript format (xpscript) Boolean 1 ($value) ? "; xpscript xpound.out" : "" ( "" , "; xpscript xpound.out" )[ value ] 30 orientation Orientation (-l) Choice $postscript postscript portrait portrait lanscape ($value eq "lanscape") ? " -l " : "" ( "" , " -l " )[ value == "lanscape" ] 31 rows Rows of plots per page (-r) Integer $postscript postscript 5 (defined $value and $value != $vdef) ? 
" -r $value " : "" ( "" , " -r " + str(value) )[ value is not None and value != vdef] 32 columns Columns of plots per page (-c) Integer $postscript postscript 1 (defined $value and $value != $vdef) ? " -c $value " : "" ( "" , " -c " + str(value) )[ value is not None and value != vdef] 32 high Draw a line at this level (-hi) Float $postscript postscript 0.75 (defined $value and $value != $vdef) ? " -hi $value " : "" ( "" , " -hi " + str(value) )[ value is not None and value != vdef] 33 low Draw a line at this level (-lo) Float $postscript postscript 0.5 (defined $value and $value != $vdef) ? " -lo $value " : "" ( "" , " -lo " + str(value) )[ value is not None and value != vdef] 34 psfile PostScript file PostScript Binary $postscript postscript " >xpound.ps" " >xpound.ps" 100 "xpound.ps" "xpound.ps" Programs-5.1.1/pepnet.xml0000644000175000001560000002371612072525233014201 0ustar bneronsis pepnet EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net pepnet Draw a helical net for a protein sequence http://bioweb2.pasteur.fr/docs/EMBOSS/pepnet.html http://emboss.sourceforge.net/docs/themes display:protein:2D_structure structure:2D_structure pepnet e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_output Output section e_amphipathic Prompt for amphipathic residue marking Boolean 0 ("", " -amphipathic")[ bool(value) ] 2 If this is true then the residues ACFGILMVWY are marked as squares and all other residues are unmarked. This overrides any other markup that you may have specified using the qualifiers '-squares', '-diamonds' and '-octags'. 
e_squares Mark as squares String not e_amphipathic ILVM ("", " -squares=" + str(value))[value is not None and value!=vdef] 3 By default the aliphatic residues ILVM are marked with squares. e_diamonds Mark as diamonds String not e_amphipathic DENQST ("", " -diamonds=" + str(value))[value is not None and value!=vdef] 4 By default the residues DENQST are marked with diamonds. e_octags Mark as octagons String not e_amphipathic HKR ("", " -octags=" + str(value))[value is not None and value!=vdef] 5 By default the positively charged residues HKR are marked with octagons. e_graph Choose the e_graph output format Choice png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 6 e_goutfile Name of the output graph Filename pepnet_graph ("" , " -goutfile=" + str(value))[value is not None] 7 outgraph_png Graph file Picture Binary e_graph == "png" "*.png" outgraph_gif Graph file Picture Binary e_graph == "gif" "*.gif" outgraph_ps Graph file PostScript Binary e_graph == "ps" or e_graph == "cps" "*.ps" outgraph_meta Graph file Picture Binary e_graph == "meta" "*.meta" outgraph_data Graph file Text e_graph == "data" "*.dat" auto Turn off any prompting String " -auto -stdout" 8 Programs-5.1.1/taxoptimizer.xml0000644000175000001560000001205712125257722015445 0ustar bneronsis taxoptimizer 1.0 taxoptimizer taxoptimizer reports taxonomic information for each Blast HIT C. Maufrais database:search:display taxoptimizer inputFile Blast Result input file BlastTextReport Report Tabulated file ' -i ' + str(value) 10 Output Options taxoptimizerOptions columnNumber Column number to parse (default second column: 2) Integer Column number to parse (default second column: 2) 20 ' -c ' + str(value) 2 database Specific database to reduce the query String Specified Database name for finding taxonomy in only one database. 
40 ('',' -d '+ str(value))[value is not None] description Add description in output Boolean 70 0 ('',' -e ')[value] Add description (DE) in output OnlyTaxonomyInformation Limit the output only to sequences with taxonomic information Boolean 0 Only write lines with taxonomy information in output file 70 ('',' -x ')[value] NoTaxonomyInfo Report sequences with no taxonomic information in a separate output file Boolean 0 Outputs sequences with no taxonomic information 80 ('',' -f ')[value] columnNumber SeparatorCharacter database description OnlyTaxonomyInformation NoTaxonomyInfo OutputFileName Output File Name String taxoptimizer.out OutputNoTaxonomy SequencesWithNoTaxoOutput BlastTextReport Report taxoptimizer secondary output 85 NoTaxonomyInfo is not None ' -f noTaxo' + OutputFileName "noTaxo"+ OutputFileName OutputFile Output file taxoptimizerTextReport Report OutputFileName Reports taxonomic annotation and appends it to the Blast results 90 ' -o ' + OutputFileName Programs-5.1.1/prose.xml0000644000175000001560000000667611767572177014067 0ustar bneronsis prose 0.02 PROSE Prosite Pattern search K. Schuerer ftp://ftp.pasteur.fr/pub/gensoft/projects/prose/ sequence:protein:pattern prose seqfile Protein Sequence File Sequence FASTA " $value" " " + str(value) 2 abundant Include abundant patterns (-s) Boolean 0 ($value) ? " -s" : "" ("", " -s")[value] report Report occurrences (-m) Choice short short long all ($value ne $vdef) ? " -m $value" : "" ("", " -m " + str(value))[value != vdef] case Perform case sensitive search (-c) Boolean 0 ($value) ? " -c" : "" ("", " -c")[value] plist Pattern list file (-l) ProsePattern AbstractText (defined $value) ? " -l $value" : "" ("", " -l " + str(value))[value is not None] This file requires exactly one pattern per line, in the following format: NAME followed by PATTERN. Programs-5.1.1/rnaalifold.xml0000644000175000001560000004775111444656032015032 0ustar bneronsis rnaalifold RNAalifold Calculate secondary structures for a set of aligned RNAs D.H. 
Mathews, J. Sabina, M. Zuker and H. Turner "Expanded Sequence Dependence of Thermodynamic Parameters Provides Robust Prediction of RNA Secondary Structure" JMB, 288, pp 911-940, 1999 Ivo L. Hofacker, Martin Fekete, and Peter F. Stadler "Secondary Structure Prediction for Aligned RNA Sequences" J.Mol.Biol. 319: 1059-1066 (2002). http://www.tbi.univie.ac.at/RNA/RNAalifold.html RNAalifold reads aligned RNA sequences from file and calculates their minimum free energy (mfe) structure, partition function (pf) and base pairing probability matrix. Currently, the input alignment has to be in CLUSTAL format. It returns the mfe structure in bracket notation, its energy, the free energy of the thermodynamic ensemble and the frequency of the mfe structure in the ensemble to outfile. It also produces PostScript files with plots of the resulting secondary structure graph ("alirna.ps") and a "dot plot" of the base pairing matrix ("alidot.ps"). The file "alifold.out" will contain a list of likely pairs sorted by credibility. CAVEATS: Since gaps are not removed for the evaluation of energies, it may be advantageous to remove any columns with more than, say, 75% gaps from the alignment before folding with RNAalifold. Sequences are not weighted. If possible, do not mix very similar and dissimilar sequences. Duplicate sequences, for example, can distort the prediction. alignment:multiple:display structure:2D_structure RNAalifold seq Aligned RNA Sequences File Nucleic Alignment CLUSTAL " $value" " "+str( value ) 1000 control Control options 2 covariance Weight of the covariance (-cv) Integer 1 (defined $value and $value != $vdef)? " -cv $value" : "" ( "" , " -cv " + str(value) )[ value is not None and value != vdef] Set the weight of the covariance term in the energy function to factor. Default is 1. non_compatible Penalty for non-compatible sequences in the covariance (-nc) Integer 1 (defined $value and $value != $vdef)? 
" -nc $value" : "" ( "" , " -nc " + str(value) )[ value is not None and value != vdef] Set the penalty for non-compatible sequences in the covariance term of the energy function to factor. Default is 1. endgaps Score pairs with endgaps (-E) Boolean 0 ($value)? " -E" : "" ( "" , " -E" )[ value ] Score pairs with endgaps same as gap-gap pairs. partition Calculate the partition function and base pairing probability matrix (-p) Boolean 0 ($value)? " -p" : "" ( "" , " -p" )[ value ] Calculate the partition function and base pairing probability matrix in addition to the mfe structure. Default is calculation of mfe structure only. temperature Rescale energy parameters to a temperature of temp C. (-T) Integer 37 (defined $value and $value != $vdef)? " -T $value" : "" ( "" , " -T " + str(value) )[ value is not None and value != vdef] tetraloops Do not include special stabilizing energies for certain tetraloops (-4) Boolean 0 ($value)? " -4" : "" ( "" , " -4" )[ value ] dangling How to treat dangling end energies for bases adjacent to helices in free ends and multiloops (-d) Choice -d1 -d1 -d -d2 (defined $value and $value ne $vdef)? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef] How to treat 'dangling end' energies for bases adjacent to helices in free ends and multiloops: Normally only unpaired bases can participate in at most one dangling end. With -d2 this check is ignored, this is the default for partition function folding (-p). -d ignores dangling ends altogether. Note that by default pf and mfe folding treat dangling ends differently, use -d2 (or -d) in addition to -p to ensure that both algorithms use the same energy model. The -d2 options is available for RNAfold, RNAeval, and RNAinverse only. input Input parameters 2 circ Circular RNA molecules (-circ)? Boolean 0 ($value)? " -circ" : "" ( "" , " -circ" )[ value ] Assume circular (instead of linear) RNA molecules. 
noLP Avoid structures with lonely pairs (helices of length 1) (-noLP) Boolean 0 ($value)? " -noLP" : "" ( "" , " -noLP" )[ value ] Produce structures without lonely pairs (helices of length 1). For partition function folding this only disallows pairs that can only occur isolated. Other pairs may still occasionally occur as helices of length 1. noGU Do not allow GU pairs (-noGU) Boolean 0 ($value)? " -noGU" : "" ( "" , " -noGU" )[ value ] noCloseGU Do not allow GU pairs at the end of helices (-noCloseGU) Boolean 0 ($value)? " -noCloseGU" : "" ( "" , " -noCloseGU" )[ value ] nsp Non standard pairs (comma separated list) (-nsp) String (defined $value)? " -nsp $value" : "" ( "" , " -nsp " + str(value) )[ value is not None ] Allow other pairs in addition to the usual AU, GC, and GU pairs. pairs is a comma separated list of additionally allowed pairs. If the first character is a '-' then AB will imply that AB and BA are allowed pairs. For example, RNAfold -nsp -GA will allow GA and AG pairs. Nonstandard pairs are given 0 stacking energy. parameter Energy parameter file (-P) EnergyParameterFile AbstractText (defined $value)? " -P $value" : "" ( "" , " -P " + str(value) )[ value is not None ] Read energy parameters from paramfile, instead of using the default parameter set. A sample parameterfile should accompany your distribution. See the RNAlib documentation for details on the file format. ribosum_matrix Use specified Ribosum Matrix instead of normal energy model. ribosum_matrix AbstractText (defined $value)? " -P $value" : "" ( "" , " -P " + str(value) )[ value is not None ] Use specified Ribosum Matrix instead of normal energy model. Matrices to use should be 6x6 matrices, the order of the terms is AU, CG, GC, GU, UA, UG. use_ribosum_matrix Use Ribosum scoring matrix. Boolean (defined $value)? " -r $value" : "" ( "" , " -r ")[ value ] Use Ribosum scoring matrix. The matrix is chosen according to the minimal and maximal pairwise identities of the sequences in the file. 
When using Ribosum scores, best benchmark results were achieved with options -cv 0.6 -nc 0.5 (see above). constraints Calculate structures subject to constraints (-C) Constraint AbstractText (defined $value)? " -C < $value" : "" ( "" , " -C <" + str( value) )[ value is not None ] The program first reads the sequence, then a string containing constraints on the structure encoded with the symbols: | (the corresponding base has to be paired) x (the base is unpaired) < (base i is paired with a base j>i) > (base i is paired with a base j<i) matching brackets ( ) (base i pairs base j) Pf folding ignores constraints of type '|' '<' and '>', but disallows all pairs conflicting with a constraint of type 'x' or '( )'. This is usually sufficient to enforce the constraint. output_options Output options informative Print most informative sequence instead of simple consensus (-mis)? Boolean 0 ($value)? " -mis" : "" ( "" , " -mis" )[ value ] color Produce a colored version of the consensus structure plot (-color)? Boolean 0 ($value)? " -color" : "" ( "" , " -color" )[ value ] Produce a colored version of the consensus structure plot "alirna.ps" (default black and white). aln Produce a colored and structure annotated alignment in PostScript format (-aln) ? Boolean 0 ($value)? " -aln" : "" ( "" , " -aln" )[ value ] Produce a colored and structure annotated alignment in PostScript format in the file "aln.ps". psfiles PostScript file PostScript Binary "*.ps" "*.ps" alifold_out list of likely pairs vienna_likely_pairs AbstractText $partition partition "alifold.out" "alifold.out" The file "alifold.out" will contain a list of likely pairs sorted by credibility, suitable for viewing with "AliDot.pl" Programs-5.1.1/cosa.xml0000644000175000001560000001144411525212156013625 0ustar bneronsis cosa cosa Clustal output structural analysis T. Rose This program gives simple statistics about residue conservation from Clustal output files. 
There is the possibility to redirect the residue frequency at every position of the protein sequence in the PDB file corresponding to one of the identified sequences of the multiple alignment. This frequency or conservation index is put in place of B-factors and allows spectral coloring according to the index value in most of pdb structure viewers. alignment:structure structure:indexing cosa alig Alignment Protein Alignment CLUSTAL " $value" " "+str(value) 1 struct_pos Position in the sequence multialignment of the structure used as reference Integer " $value" " " + str( value ) 2 pdbin PDB entry Protein AbstractText _3DStructure PDB " $value" " " + str( value ) 3 pdbout Name of the output PDB file Filename tmp_clustal.pdb (defined $value) ? " $value" : "" ("", " " + str( value ) )[ value is not None ] 4 txtout Name of the output result file Filename tmp_stats.txt (defined $value) ? " $value" : "" ("" , " " + str( value ) )[ value is not None ] 5 default_pdbout PDB with the residue occurrence Protein AbstractText _3DStructure PDB PDB file format of the structure used as reference with the residue occurrence in place of B-factor $pdbout pdbout default_txtout Alignment and statistics Text Vertical sequence alignment and statistics $txtout txtout Programs-5.1.1/primersearch.xml0000644000175000001560000001174612072525233015372 0ustar bneronsis primersearch EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net primersearch Search DNA sequences for matches with primer pairs http://bioweb2.pasteur.fr/docs/EMBOSS/primersearch.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:primers primersearch e_input Input section e_seqall seqall option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -seqall=" + str(value))[value is not None] 1 e_infile Primer pairs file PrimerPairs AbstractText ("", " -infile=" + str(value))[value is not None] 2 e_required Required section e_mismatchpercent Allowed percent mismatch Integer 0 ("", " -mismatchpercent=" + str(value))[value is not None and value!=vdef] 3 e_output Output section e_outfile Name of the output file (e_outfile) Filename primersearch.e_outfile ("" , " -outfile=" + str(value))[value is not None] 4 e_outfile_out outfile_out option Primer3Report Report e_outfile auto Turn off any prompting String " -auto -stdout" 5 Programs-5.1.1/dnadist.xml0000644000175000001560000007205111745213176014336 0ustar bneronsis dnadist dnadist Compute distance matrix from nucleotide sequences http://bioweb2.pasteur.fr/docs/phylip/doc/dnadist.html This program uses nucleotide sequences to compute a distance matrix, under four different models of nucleotide substitution. It can also compute a table of similarity between the nucleotide sequences. The distance for each pair of species estimates the total branch length between the two species, and can be used in the distance matrix programs FITCH, KITSCH or NEIGHBOR. 
phylogeny:distance dnadist String "dnadist <dnadist.params" "dnadist <dnadist.params" 0 infile Alignment File DNA Alignment PHYLIPI "ln -s $infile infile && " "ln -s " + str( infile ) + " infile && " The name of this data cannot be "infile" or "outfile" value not in ( "infile" , "outfile" ) $value ne "infile" and $value ne "outfile" -10 The input file must contain aligned sequences in PHYLIP format obtained by sequence alignment programs. 5 13 Alpha AACGTGGCCACAT Beta AAGGTCGCCACAC Gamma CAGTTCGCCACAA Delta GAGATTTCCGCCT Epsilon GAGATCTCCGCCC dnadist_opt Dnadist options distance Distance (D) Choice F84 F84 "" "" K "D\\n" "D\n" JC "D\\nD\\n" "D\nD\n" LogDet "D\\nD\\nD\\n" "D\nD\nD\n" Similarity "D\\nD\\nD\\nD\\n" "D\nD\nD\nD\n" 1 dnadist.params ratio Transition/transversion ratio (T) Float $distance eq "F84" or $distance eq "K" distance == "F84" or distance == "K" 2.0 (defined $value and $value != $vdef) ? "T\\n$value\\n" : "" ( "" , "T\n"+ str( value )+"\n" )[value is not None and value != vdef ] Transition/transversion ratio (T) must be a real number greater than 0.0 value >= 0.0 $value >= 0.0 1 The T option in this program does not stand for Threshold, but instead is the Transition/transversion option. The user is prompted for a real number greater than 0.0, as the expected ratio of transitions to transversions. Note that this is not the ratio of the first to the second kinds of events, but the resulting expected ratio of transitions to transversions. The exact relationship between these two quantities depends on the frequencies in the base pools. The default value of the T parameter if you do not use the T option is 2.0. 
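The distance and ratio entries above turn user choices into keystrokes that are fed to the dnadist menu through dnadist.params. The following is a minimal sketch of that mechanism, not the Mobyle implementation itself: the names DISTANCE_KEYS, ratio_keys and params_fragment are hypothetical, the ordering (distance keys before ratio keys) is an assumption because both entries share the same position value, and the terminal-type and confirmation keystrokes are left out.

```python
# Keystrokes for the distance menu, copied from the choice list above.
DISTANCE_KEYS = {
    "F84": "",
    "K": "D\n",
    "JC": "D\nD\n",
    "LogDet": "D\nD\nD\n",
    "Similarity": "D\nD\nD\nD\n",
}
RATIO_DEFAULT = 2.0  # default value (vdef) of the ratio parameter


def ratio_keys(value, distance, vdef=RATIO_DEFAULT):
    """Return the T-menu keystrokes, or "" when nothing has to be typed."""
    # Precondition from the description: T only applies to F84 and K.
    if distance not in ("F84", "K"):
        return ""
    # Same tuple-indexing idiom as the format expression:
    # a False condition selects element 0, a True condition selects element 1.
    return ("", "T\n" + str(value) + "\n")[value is not None and value != vdef]


def params_fragment(distance, ratio):
    """Concatenate the keystrokes (ordering is an assumption of this sketch)."""
    return DISTANCE_KEYS[distance] + ratio_keys(ratio, distance)


if __name__ == "__main__":
    print(repr(params_fragment("K", 1.5)))
```

The tuple-indexing form `("", text)[condition]` evaluates both elements before choosing one, which is harmless here because building the string has no side effects.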
dnadist.params gamma Gamma distributed rates across sites (G) Choice $distance eq "F84" or $distance eq "K" or $distance eq "JC" distance == "F84" or distance == "K" or distance == "JC" No No "" "" 1 "G\\n" "G\n" GI "G\\nG\\n" "G\nG\n" 5 dnadist.params variation_coeff Coefficient of variation of substitution rate among sites (must be positive) (if Gamma) Float $gamma ne "No" gamma != "No" (defined $value) ? "$value\\n" : "" ( "" , str( value ) + "\n" )[ value is not None ] 1010 In gamma distribution parameters, this is 1/(square root of alpha) dnadist.params invariant_sites Fraction of invariant sites (if Gamma) Float $gamma eq "GI" gamma == "GI" (defined $value) ? "$value\\n" : "" ( "" , str( value ) + "\n" )[ value is not None ] 1011 dnadist.params ACGT_frequencies Base frequencies for A, C, G, T/U (if not empirical) distance eq "F84" distance == "F84" These must add to 1 empirical_frequencies Use empirical base frequencies (F) Boolean 1 1 dnadist.params A_frequency Base frequencies for A Float not $empirical_frequencies not empirical_frequencies 0.25 "" "" 2 C_frequency Base frequencies for C Float not $empirical_frequencies not empirical_frequencies 0.25 "" "" 2 G_frequency Base frequencies for G Float not $empirical_frequencies not empirical_frequencies 0.25 "" "" 2 T_frequency Base frequencies for T/U Float not $empirical_frequencies not empirical_frequencies 0.25 "" "" 2 base_frequencies Float not $empirical_frequencies not empirical_frequencies "F\\n$A_frequency $C_frequency $G_frequency $T_frequency\\n" "F\n" + str( A_frequency ) + " " + str( C_frequency ) + " " + str( G_frequency ) + " " + str( T_frequency ) + "\n" 2 dnadist.params weight_opt Weight options weights Use weights for sites (W) Boolean 0 ($value) ? "W\\n" : "" ( "" , "W\n" )[ value ] 1 dnadist.params weights_file Weights file PhylipWeight AbstractText $weights weights (defined $value) ? 
"ln -s $weights_file weights && " : "" ( "" , " ln -s " + str( weights_file ) + " weights && " )[ value is not None] the name of this data can't be "infile" or "outfile" value not in ( "infile" , "outfile" ) $value ne "infile" and $value ne "outfile" -1 It selects a set of sites to be analyzed, ignoring the others. The sites selected are those with weight 1. The weights in it are a simple string of digits. Blanks in the weightfile are skipped over and ignored, and the weights can continue to a new line. 1011001001011 bootstrap Bootstrap options seqboot Perform a bootstrap before analysis Boolean 0 ($value) ? "seqboot < seqboot.params && mv outfile seqboot.outfile && rm infile && ln -s seqboot.outfile infile && " : "" ( "" , "seqboot < seqboot.params && mv outfile seqboot.outfile && rm infile && ln -s seqboot.outfile infile && " )[ value ] -5 By selecting this option, the bootstrap will be performed on your sequence file. So you don't need to perform a separated seqboot before. Don't give an already bootstrapped file to the program, this won't work! Method Resampling methods (J) Choice $seqboot seqboot bootstrap bootstrap "" "" jackknife "J\\n" "" permute_species "J\\nJ\\n" "J\nJ\n" permute_char "J\\nJ\\nJ\\n" "J\nJ\nJ\n" permute_within_species "J\\nJ\\nJ\\nJ\\n" "J\nJ\nJ\nJ\n" 1 1. The bootstrap. Bootstrapping was invented by Bradley Efron in 1979, and its use in phylogeny estimation was introduced by me (Felsenstein, 1985b). It involves creating a new data set by sampling N characters randomly with replacement, so that the resulting data set has the same size as the original, but some characters have been left out and others are duplicated. The random variation of the results from analyzing these bootstrapped data sets can be shown statistically to be typical of the variation that you would get from collecting new data sets. The method assumes that the characters evolve independently, an assumption that may not be realistic for many kinds of data. 2. 
Delete-half-jackknifing. This alternative to the bootstrap involves sampling a random half of the characters, and including them in the data but dropping the others. The resulting data sets are half the size of the original, and no characters are duplicated. The random variation from doing this should be very similar to that obtained from the bootstrap. The method is advocated by Wu (1986). 3. Permuting species for each character. This method of resampling (well, OK, it may not be best to call it resampling) was introduced by Archie (1989) and Faith (1990; see also Faith and Cranston, 1991). It involves permuting the columns of the data matrix separately. This produces data matrices that have the same number and kinds of characters but no taxonomic structure. It is used for different purposes than the bootstrap, as it tests not the variation around an estimated tree but the hypothesis that there is no taxonomic structure in the data: if a statistic such as number of steps is significantly smaller in the actual data than it is in replicates that are permuted, then we can argue that there is some taxonomic structure in the data (though perhaps it might be just the presence of a pair of sibling species). 4. Permuting character order. This simply permutes the order of the characters, the same reordering being applied to all species. For many methods of tree inference this will make no difference to the outcome (unless one has rates of evolution correlated among adjacent sites). It is included as a possible step in carrying out a permutation test of homogeneity of characters (such as the Incongruence Length Difference test). 5. Permuting characters separately for each species. This is a method introduced by Steel, Lockhart, and Penny (1993) to permute data so as to destroy all phylogenetic structure, while keeping the base composition of each species the same as before. It shuffles the character order separately for each species. 
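Three of the resampling methods described above (the bootstrap, delete-half jackknifing, and permuting character order) can be illustrated on alignment columns. This is a rough sketch under stated assumptions, not seqboot's actual code: the function names are hypothetical, an alignment is represented as a list of columns (one tuple of residues per site), and keeping the jackknifed columns in their original order is a choice made here for readability.

```python
import random


def bootstrap(columns, rng):
    """Method 1: sample N columns with replacement, so the size is unchanged."""
    n = len(columns)
    return [columns[rng.randrange(n)] for _ in range(n)]


def delete_half_jackknife(columns, rng):
    """Method 2: keep a random half of the columns, none duplicated."""
    keep = sorted(rng.sample(range(len(columns)), len(columns) // 2))
    return [columns[i] for i in keep]


def permute_character_order(columns, rng):
    """Method 4: reorder the columns; every species gets the same reordering."""
    shuffled = list(columns)
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    rows = ["AACG", "AAGG", "CAGT"]   # toy alignment, one string per species
    columns = list(zip(*rows))        # one tuple per site
    rng = random.Random(1)            # fixed seed, for repeatability only
    print(bootstrap(columns, rng))
    print(delete_half_jackknife(columns, rng))
    print(permute_character_order(columns, rng))
```

Working on whole columns keeps each site's residues together across species, which is what distinguishes these three methods from the species-wise permutations (methods 3 and 5).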
seqboot.params seqboot_seed Random number seed (must be odd) Integer $seqboot seqboot "$value\\n" str( value ) +"\n" Random number seed must be odd $value >= 0 and (($value % 2) != 0) value >= 0 and ( ( value % 2 ) != 0 ) 1000 seqboot.params replicates How many replicates Integer $seqboot seqboot 100 (defined $value and $value != $vdef) ? "R\\n$value\\n" : "" ( "" , "R\n" + str( value ) + "\n" )[ value is not None and value != vdef ] This server allows no more than 1000 replicates $value <= 1000 value <= 1000 Bad data sets number: it must be greater than 1 $value > 1 value > 1 1 seqboot.params outfile Outfile PhylipDistanceMatrix AbstractText " && mv outfile dnadist.outfile" " && mv outfile dnadist.outfile" 40 "dnadist.outfile" "dnadist.outfile" seqboot_out seqboot outfile SetOfAlignment AbstractText $seqboot seqboot 40 "seqboot.outfile" "seqboot.outfile" confirm String "y\\n" "y\n" 1000 dnadist.params terminal_type String "0\\n" "0\n" -1 dnadist.params multiple_dataset String $seqboot and $replicates > 1 seqboot and replicates > 1 "M\\nD\\n$replicates\\n" "M\nD\n" + str( replicates ) + str("\n") 1 dnadist.params seqboot_confirm String $seqboot seqboot "y\\n" "y\n" 100 seqboot.params seqboot_terminal_type String $seqboot seqboot "0\\n" "0\n" -1 seqboot.params Programs-5.1.1/extractseq.xml0000644000175000001560000001642312072525233015066 0ustar bneronsis extractseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net extractseq Extract regions from a sequence http://bioweb2.pasteur.fr/docs/EMBOSS/extractseq.html http://emboss.sourceforge.net/docs/themes sequence:edit extractseq e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_regions Regions to extract (eg: 4-57,78-94) String ("", " -regions=" + str(value))[value is not None] 2 Regions to extract. A set of regions is specified by a set of pairs of positions. The positions are integers. They are separated by any non-digit, non-alpha character. Examples of region specifications are: 24-45, 56-78 1:45, 67=99;765..888 1,5,8,10,23,45,57,99 e_additional Additional section e_separate Write regions to separate sequences Boolean 0 ("", " -separate")[ bool(value) ] 3 If this is set true then each specified region is written out as a separate sequence. The name of the sequence is created from the name of the original sequence with the start and end positions of the range appended with underscore characters between them, eg: XYZ region 2 to 34 is written as: XYZ_2_34 e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename extractseq.e_outseq ("" , " -outseq=" + str(value))[value is not None] 4 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 5 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 6 Programs-5.1.1/stretcher.xml0000644000175000001560000004425712072525233014714 0ustar bneronsis stretcher EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. 
Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net stretcher Needleman-Wunsch rapid global alignment of two sequences http://bioweb2.pasteur.fr/docs/EMBOSS/stretcher.html http://emboss.sourceforge.net/docs/themes alignment:pairwise:global stretcher e_input Input section e_asequence asequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -asequence=" + str(value))[value is not None] 1 e_bsequence bsequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -bsequence=" + str(value))[value is not None] 2 e_datafile Matrix file Choice mobyle_null mobyle_null EBLOSUM30 EBLOSUM35 EBLOSUM40 EBLOSUM45 EBLOSUM50 EBLOSUM55 EBLOSUM60 EBLOSUM62 EBLOSUM62-12 EBLOSUM65 EBLOSUM70 EBLOSUM75 EBLOSUM80 EBLOSUM85 EBLOSUM90 EBLOSUMN EDNAFULL EDNAMAT EDNASIMPLE EPAM10 EPAM100 EPAM110 EPAM120 EPAM130 EPAM140 EPAM150 EPAM160 EPAM170 EPAM180 EPAM190 EPAM20 EPAM200 EPAM210 EPAM220 EPAM230 EPAM240 EPAM250 EPAM260 EPAM270 EPAM280 EPAM290 EPAM30 EPAM300 EPAM310 EPAM320 EPAM330 EPAM340 EPAM350 EPAM360 EPAM370 EPAM380 EPAM390 EPAM40 EPAM400 EPAM410 EPAM420 EPAM430 EPAM440 EPAM450 EPAM460 EPAM470 EPAM480 EPAM490 EPAM50 EPAM500 EPAM60 EPAM70 EPAM80 EPAM90 SSSUB ("", " -datafile=" + str(value))[value is not None and value!=vdef] 3 This is the scoring matrix file used when comparing sequences. By default it is the file 'EBLOSUM62' (for proteins) or the file 'EDNAFULL' (for nucleic sequences). These files are found in the 'data' directory of the EMBOSS installation. 
e_additional Additional section e_gapopen Gap penalty (Positive integer) Integer ("", " -gapopen=" + str(value))[value is not None] Value greater than or equal to 0 is required value >= 0 4 12 for protein, 16 for nucleic e_gapextend Gap length penalty (Positive integer) Integer ("", " -gapextend=" + str(value))[value is not None] Value greater than or equal to 0 is required value >= 0 5 2 for protein, 4 for nucleic e_output Output section e_outfile Name of the output alignment file Filename stretcher.align ("" , " -outfile=" + str(value))[value is not None] 6 e_aformat_outfile Choose the alignment output format Choice MARKX0 FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH ("", " -aformat=" + str(value))[value is not None and value!=vdef] 6 e_outfile_out outfile_out option Alignment e_aformat_outfile in ['FASTA', 'MSF'] e_outfile e_outfile_out2 outfile_out2 option Text e_aformat_outfile in ['PAIR', 'MARKX0', 'MARKX1', 'MARKX2', 'MARKX3', 'MARKX10', 'SRS', 'SRSPAIR', 'SCORE', 'UNKNOWN', 'MULTIPLE', 'SIMPLE', 'MATCH'] e_outfile auto Turn off any prompting String " -auto -stdout" 7 Programs-5.1.1/shuffleseq.xml0000644000175000001560000001405112072525233015043 0ustar bneronsis shuffleseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net shuffleseq Shuffles a set of sequences maintaining composition http://bioweb2.pasteur.fr/docs/EMBOSS/shuffleseq.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:mutation sequence:protein:mutation shuffleseq e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_advanced Advanced section e_shuffle Number of shuffles Integer 1 ("", " -shuffle=" + str(value))[value is not None and value!=vdef] 2 e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename shuffleseq.e_outseq ("" , " -outseq=" + str(value))[value is not None] 3 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 4 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 5 Programs-5.1.1/etandem.xml0000644000175000001560000002542312072525233014320 0ustar bneronsis etandem EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net etandem Finds tandem repeats in a nucleotide sequence http://bioweb2.pasteur.fr/docs/EMBOSS/etandem.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:repeats etandem e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_minrepeat Minimum repeat size (Integer, 2 or higher) Integer 10 ("", " -minrepeat=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 2 is required value >= 2 2 e_maxrepeat Maximum repeat size (Integer, same as -minrepeat or higher) Integer e_minrepeat ("", " -maxrepeat=" + str(value))[value is not None] 3 Same as -minrepeat e_advanced Advanced section e_threshold Threshold score Integer 20 ("", " -threshold=" + str(value))[value is not None and value!=vdef] 4 e_mismatch Allow n as a mismatch Boolean 0 ("", " -mismatch")[ bool(value) ] 5 e_uniform Allow uniform consensus Boolean 0 ("", " -uniform")[ bool(value) ] 6 e_output Output section e_outfile Name of the report file Filename report.tan ("" , " -outfile=" + str(value))[value is not None] 7 e_rformat_outfile Choose the report output format Choice TABLE DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 8 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile e_origfile Name of the output file (e_origfile) Filename outfile.oldtan ("" , " -origfile=" + str(value))[value is not None] 9 e_origfile_out origfile_out option TandemReport Report e_origfile 
auto Turn off any prompting String " -auto -stdout" 10 Programs-5.1.1/geecee.xml0000644000175000001560000000746112072525233014122 0ustar bneronsis geecee EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net geecee Calculate fractional GC content of nucleic acid sequences http://bioweb2.pasteur.fr/docs/EMBOSS/geecee.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:cpg_islands geecee e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_output Output section e_outfile Name of the output file (e_outfile) Filename geecee.e_outfile ("" , " -outfile=" + str(value))[value is not None] 2 e_outfile_out outfile_out option GeeceeReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 3 Programs-5.1.1/infoalign.xml0000644000175000001560000006233011672346320014652 0ustar bneronsis infoalign EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net infoalign Display basic information about a multiple sequence alignment http://bioweb2.pasteur.fr/docs/EMBOSS/infoalign.html http://emboss.sourceforge.net/docs/themes alignment:multiple:information infoalign e_input Input section e_sequence sequence option Alignment FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH 1,n ("", " -sequence=" + str(value))[value is not None] 1 The sequence alignment to be displayed. 
e_matrix Similarity scoring matrix file Choice mobyle_null mobyle_null EBLOSUM30 EBLOSUM35 EBLOSUM40 EBLOSUM45 EBLOSUM50 EBLOSUM55 EBLOSUM60 EBLOSUM62 EBLOSUM62-12 EBLOSUM65 EBLOSUM70 EBLOSUM75 EBLOSUM80 EBLOSUM85 EBLOSUM90 EBLOSUMN EDNAFULL EDNAMAT EDNASIMPLE EPAM10 EPAM100 EPAM110 EPAM120 EPAM130 EPAM140 EPAM150 EPAM160 EPAM170 EPAM180 EPAM190 EPAM20 EPAM200 EPAM210 EPAM220 EPAM230 EPAM240 EPAM250 EPAM260 EPAM270 EPAM280 EPAM290 EPAM30 EPAM300 EPAM310 EPAM320 EPAM330 EPAM340 EPAM350 EPAM360 EPAM370 EPAM380 EPAM390 EPAM40 EPAM400 EPAM410 EPAM420 EPAM430 EPAM440 EPAM450 EPAM460 EPAM470 EPAM480 EPAM490 EPAM50 EPAM500 EPAM60 EPAM70 EPAM80 EPAM90 SSSUB ("", " -matrix=" + str(value))[value is not None and value!=vdef] 2 This is the scoring matrix file used when comparing sequences. By default it is the file 'EBLOSUM62' (for proteins) or the file 'EDNAFULL' (for nucleic sequences). These files are found in the 'data' directory of the EMBOSS installation. e_refseq The number or the name of the reference sequence String 0 ("", " -refseq=" + str(value))[value is not None and value!=vdef] 3 If you give the number in the alignment or the name of a sequence, it will be taken to be the reference sequence. The reference sequence is the one against which all the other sequences are compared. If this is set to 0 then the consensus sequence will be used as the reference sequence. By default the consensus sequence is used as the reference sequence. e_advanced Advanced section e_plurality Plurality check % for consensus (value from 0.0 to 100.0) Float 50.0 ("", " -plurality=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 100.0 is required value <= 100.0 4 Set a cut-off for the % of positive scoring matches below which there is no consensus. The default plurality is taken as 50% of the total weight of all the sequences in the alignment. 
e_identity Required % of identities at a position for consensus (value from 0.0 to 100.0) Float 0.0 ("", " -identity=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 100.0 is required value <= 100.0 5 Provides the facility of setting the required number of identities at a position for it to give a consensus. Therefore, if this is set to 100% only columns of identities contribute to the consensus. e_output Output section e_outfile Name of the output file (e_outfile) Filename outfile.infoalign ("" , " -outfile=" + str(value))[value is not None] 6 If you enter the name of a file here then this program will write the sequence details into that file. e_outfile_out outfile_out option InfoalignReport Report e_outfile e_html Format output as an html table Boolean 0 ("", " -html")[ bool(value) ] 7 e_only Display the specified columns Boolean 0 ("", " -only")[ bool(value) ] 8 This is a way of shortening the command line if you only want a few things to be displayed.
Instead of specifying: '-nohead -nousa -noname -noalign -nogaps -nogapcount -nosimcount -noidcount -nodiffcount -noweight' to get only the sequence length output, you can specify '-only -seqlength' e_heading Display column headings Boolean not e_only 0 ("", " -heading")[ bool(value) ] 9 e_usa Display the usa of the sequence Boolean not e_only 0 ("", " -usa")[ bool(value) ] 10 e_name Display 'name' column Boolean not e_only 0 ("", " -name")[ bool(value) ] 11 e_seqlength Display 'seqlength' column Boolean not e_only 0 ("", " -seqlength")[ bool(value) ] 12 e_alignlength Display 'alignlength' column Boolean not e_only 0 ("", " -alignlength")[ bool(value) ] 13 e_gaps Display number of gaps Boolean not e_only 0 ("", " -gaps")[ bool(value) ] 14 e_gapcount Display number of gap positions Boolean not e_only 0 ("", " -gapcount")[ bool(value) ] 15 e_idcount Display number of identical positions Boolean not e_only 0 ("", " -idcount")[ bool(value) ] 16 e_simcount Display number of similar positions Boolean not e_only 0 ("", " -simcount")[ bool(value) ] 17 e_diffcount Display number of different positions Boolean not e_only 0 ("", " -diffcount")[ bool(value) ] 18 e_change Display % number of changed positions Boolean not e_only 0 ("", " -change")[ bool(value) ] 19 e_weight Display 'weight' column Boolean not e_only 0 ("", " -weight")[ bool(value) ] 20 e_description Display 'description' column Boolean not e_only 0 ("", " -description")[ bool(value) ] 21 auto Turn off any prompting String " -auto -stdout" 22 Programs-5.1.1/kronaextract.xml0000644000175000001560000000754412103731725015414 0ustar bneronsis kronaextract 1.0 kronaextract kronaextract extract, from xml file obtained by rankoptimizer program, list of reads and blast offset for a given taxonomic name. C. Maufrais http://sourceforge.net/p/krona/home/krona/ Ondov BD, Bergman NH, and Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011 Sep 30; 12(1):385. 
database:search:display kronaextract inputFile Rankoptimizer xml output file KronaXMLReport Report XML file with krona Specification. ' -i ' + str(value) 10 Options rankoptimizerOptions taxoName Taxonomic name needed to extract the list of information (-n). String ('', ' -n ' + str(value) )[value is not None] 20 splitOut Split output file in two files with the given prefix name. Boolean 0 The output file is split in two files: one contains the read names and the other contains the corresponding taxoptimizer line offsets. ('', ' -s rankoptimizer' )[value] 30 offsettfile Offset numbers OffsetReport Report splitOut "*.offset" queryoutfile Queries name QueryNameReport Report splitOut "*.seq" OutputFile Output file Report "kronaextract.out" not splitOut 90 ' -o kronaextract.out' Programs-5.1.1/fuzztran.xml0000644000175000001560000003262312072525233014566 0ustar bneronsis fuzztran EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net fuzztran Search for patterns in protein sequences (translated) http://bioweb2.pasteur.fr/docs/EMBOSS/fuzztran.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:motifs sequence:protein:motifs fuzztran e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_pattern Search pattern Protein Pattern AbstractText ("", " -pattern=@" + str(value))[value is not None] 2 The standard IUPAC one-letter codes for the amino acids are used. The symbol 'x' is used for a position where any amino acid is accepted. Ambiguities are indicated by listing the acceptable amino acids for a given position, between square parentheses '[ ]'. For example: [ALT] stands for Ala or Leu or Thr.
Ambiguities are also indicated by listing between a pair of curly brackets '{ }' the amino acids that are not accepted at a given position. For example: {AM} stands for any amino acid except Ala and Met. Each element in a pattern is separated from its neighbor by a '-'. (Optional in fuzztran) Repetition of an element of the pattern can be indicated by following that element with a numerical value or a numerical range between parentheses. Examples: x(3) corresponds to x-x-x, x(2,4) corresponds to x-x or x-x-x or x-x-x-x. When a pattern is restricted to either the N- or C-terminal of a sequence, that pattern either starts with a '<' symbol or respectively ends with a '>' symbol. A period ends the pattern. (Optional in fuzztran). For example, [DE](2)HS{P}X(2)PX(2,4)C e_pmismatch Search pattern Integer 0 ("", " -pmismatch=" + str(value))[value is not None and value!=vdef] 3 e_additional Additional section e_frame Translation frames Choice 1 1 2 3 F -1 -2 -3 R 6 ("", " -frame=" + str(value))[value is not None and value!=vdef] 4 e_table Genetic codes Choice 0 0 1 2 3 4 5 6 9 10 11 12 13 14 15 16 21 22 23 ("", " -table=" + str(value))[value is not None and value!=vdef] 5 e_output Output section e_outfile Name of the report file Filename fuzztran.report ("" , " -outfile=" + str(value))[value is not None] 6 e_rformat_outfile Choose the report output format Choice TABLE DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 7 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 8 Programs-5.1.1/lindna.xml0000644000175000001560000005134711672346320014157 0ustar bneronsis
lindna EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net lindna Draws linear maps of DNA constructs http://bioweb2.pasteur.fr/docs/EMBOSS/lindna.html http://emboss.sourceforge.net/docs/themes display lindna e_input Input section e_infile Commands to the lindna drawing program file LindnaMappingCommands AbstractText ("", " -infile=" + str(value))[value is not None ] 1 e_additional Additional section e_maxgroups Maximum number of groups (value greater than or equal to 1) Integer 20 ("", " -maxgroups=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 2 e_maxlabels Maximum number of labels (value greater than or equal to 1) Integer 10000 ("", " -maxlabels=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 3 e_output Output section e_ruler Draw a ruler Boolean 1 (" -noruler", "")[ bool(value) ] 4 e_blocktype Type of blocks Choice Filled Open Filled Outline ("", " -blocktype=" + str(value))[value is not None and value!=vdef] 5 e_intersymbol Type of junctions between blocks Choice 1 1 2 3 4 ("", " -intersymbol=" + str(value))[value is not None and value!=vdef] 6 e_intercolour Colour of junctions between blocks (enter a colour number) (value from 0 to 15) Integer 1 ("", " -intercolour=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 Value less than or equal to 15 is required value <= 15 7 e_interticks Horizontal junctions between ticks Boolean 0 ("", " -interticks")[ bool(value) ] 8 e_gapsize Interval between ticks in the ruler (value greater than or equal to 0) Integer 500 ("", " -gapsize=" + str(value))[value is not None and value!=vdef] Value 
greater than or equal to 0 is required value >= 0 9 e_ticklines Vertical lines at the ruler's ticks Boolean 0 ("", " -ticklines")[ bool(value) ] 10 e_textheight Height of text multiplier (value greater than or equal to 0.0) Float 1.0 ("", " -textheight=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 11 Height of text. Enter a number <1.0 or >1.0 to decrease or increase the size, respectively e_textlength Length of text multiplier (value greater than or equal to 0.0) Float 1.0 ("", " -textlength=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 12 Length of text. Enter a number <1.0 or >1.0 to decrease or increase the size, respectively e_margin Width of left margin multiplier (value greater than or equal to 0.0) Float 1.0 ("", " -margin=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 13 Width of left margin. This is the region left to the groups where the names of the groups are displayed. Enter a number <1.0 or >1.0 to decrease or increase the size, respectively e_tickheight Height of ticks multiplier (value greater than or equal to 0.0) Float 1.0 ("", " -tickheight=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 14 Height of ticks. Enter a number <1.0 or >1.0 to decrease or increase the size, respectively e_blockheight Height of blocks multiplier (value greater than or equal to 0.0) Float 1 ("", " -blockheight=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 15 Height of blocks.
Enter a number <1.0 or >1.0 to decrease or increase the size, respectively e_rangeheight Height of range ends multiplier (value greater than or equal to 0.0) Float 1.0 ("", " -rangeheight=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 16 Height of range ends. Enter a number <1.0 or >1.0 to decrease or increase the size, respectively e_gapgroup Space between groups multiplier (value greater than or equal to 0.0) Float 1.0 ("", " -gapgroup=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 17 Space between groups. Enter a number <1.0 or >1.0 to decrease or increase the size, respectively e_postext Space between text and ticks, blocks, and ranges multiplier (value greater than or equal to 0.0) Float 1.0 ("", " -postext=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 18 Space between text and ticks, blocks, and ranges. Enter a number <1.0 or >1.0 to decrease or increase the size, respectively e_graphout Choose the e_graphout output format Choice png png gif cps ps meta data (" -graphout=" + str(vdef), " -graphout=" + str(value))[value is not None and value!=vdef] 19 e_goutfile Name of the output graph Filename lindna_graph ("" , " -goutfile=" + str(value))[value is not None] 20 outgraph_png Graph file Picture Binary e_graphout == "png" "*.png" outgraph_gif Graph file Picture Binary e_graphout == "gif" "*.gif" outgraph_ps Graph file PostScript Binary e_graphout == "ps" or e_graphout == "cps" "*.ps" outgraph_meta Graph file Picture Binary e_graphout == "meta" "*.meta" outgraph_data Graph file Text e_graphout == "data" "*.dat" auto Turn off any prompting String " -auto -stdout" 21 Programs-5.1.1/merger.xml0000644000175000001560000005157112072525233014167 0ustar bneronsis merger EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A.
EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net merger Merge two overlapping sequences http://bioweb2.pasteur.fr/docs/EMBOSS/merger.html http://emboss.sourceforge.net/docs/themes alignment:consensus merger e_input Input section e_asequence asequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -asequence=" + str(value))[value is not None] 1 e_bsequence bsequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -bsequence=" + str(value))[value is not None] 2 e_datafile Matrix file Choice mobyle_null mobyle_null EBLOSUM30 EBLOSUM35 EBLOSUM40 EBLOSUM45 EBLOSUM50 EBLOSUM55 EBLOSUM60 EBLOSUM62 EBLOSUM62-12 EBLOSUM65 EBLOSUM70 EBLOSUM75 EBLOSUM80 EBLOSUM85 EBLOSUM90 EBLOSUMN EDNAFULL EDNAMAT EDNASIMPLE EPAM10 EPAM100 EPAM110 EPAM120 EPAM130 EPAM140 EPAM150 EPAM160 EPAM170 EPAM180 EPAM190 EPAM20 EPAM200 EPAM210 EPAM220 EPAM230 EPAM240 EPAM250 EPAM260 EPAM270 EPAM280 EPAM290 EPAM30 EPAM300 EPAM310 EPAM320 EPAM330 EPAM340 EPAM350 EPAM360 EPAM370 EPAM380 EPAM390 EPAM40 EPAM400 EPAM410 EPAM420 EPAM430 EPAM440 EPAM450 EPAM460 EPAM470 EPAM480 EPAM490 EPAM50 EPAM500 EPAM60 EPAM70 EPAM80 EPAM90 SSSUB ("", " -datafile=" + str(value))[value is not None and value!=vdef] 3 This is the scoring matrix file used when comparing sequences. By default it is the file 'EBLOSUM62' (for proteins) or the file 'EDNAFULL' (for nucleic sequences). These files are found in the 'data' directory of the EMBOSS installation. 
e_additional Additional section e_gapopen Gap opening penalty (value from 0.0 to 100.0) Float ("", " -gapopen=" + str(value))[value is not None] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 100.0 is required value <= 100.0 4 e_gapextend Gap extension penalty (value from 0.0 to 10.0) Float ("", " -gapextend=" + str(value))[value is not None] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 10.0 is required value <= 10.0 5 e_output Output section e_outfile Name of the output alignment file Filename merger.align ("" , " -outfile=" + str(value))[value is not None] 6 e_aformat_outfile Choose the alignment output format Choice SIMPLE FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH ("", " -aformat=" + str(value))[value is not None and value!=vdef] 6 e_outfile_out outfile_out option Alignment e_aformat_outfile in ['FASTA', 'MSF'] e_outfile e_outfile_out2 outfile_out2 option Text e_aformat_outfile in ['PAIR', 'MARKX0', 'MARKX1', 'MARKX2', 'MARKX3', 'MARKX10', 'SRS', 'SRSPAIR', 'SCORE', 'UNKNOWN', 'MULTIPLE', 'SIMPLE', 'MATCH'] e_outfile e_outseq Name of the output sequence file (e_outseq) Filename merger.e_outseq ("" , " -outseq=" + str(value))[value is not None] 7 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 8 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 9 Programs-5.1.1/rnaplot.xml0000644000175000001560000001211711435733022014355 0ustar bneronsis rnaplot 1.8.4 RNAplot Draw RNA Secondary Structures Hofacker, Fontana, Bonhoeffer, Stadler I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker, P. Schuster (1994) Fast Folding and Comparison of RNA Secondary Structures. Monatshefte f. Chemie 125: 167-188 A. 
Walter, D Turner, J Kim, M Lyttle, P Muller, D Mathews, M Zuker Coaxial stacking of helices enhances binding of oligoribonucleotides. PNAS, 91, pp 9218-9222, 1994 M. Zuker, P. Stiegler (1981) Optimal computer folding of large RNA sequences using thermodynamic and auxiliary information, Nucl Acid Res 9: 133-148 J.S. McCaskill (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structures, Biopolymers 29: 1105-1119 D.H. Turner N. Sugimoto and S.M. Freier (1988) RNA structure prediction, Ann Rev Biophys Biophys Chem 17: 167-192 D. Adams (1979) The hitchhiker's guide to the galaxy, Pan Books, London http://bioweb2.pasteur.fr/gensoft/sequence/nucleic/2D_structure.html#ViennaRNa RNAplot reads RNA sequences and structures from stdin in the format as produced by RNAfold and produces drawings of the secondary structure graph. The coordinates are produced using either E. Bruccoleri's naview routines, or a simple radial layout method. sequence:nucleic:2D_structure RNAplot seqin RNA sequences and structures from output in the format as produced by RNAfold RnafoldOutput AbstractText " <$value" " <"+str(value) 1000 layout Choose the layout algorithm Choice 1 1 0 (defined $value and $value ne $vdef)? " -t $value" : "" ( "" , " -t " + str(value) )[ value is not None and value != vdef] Choose the layout algorithm. Simple radial layout if 0, or naview if 1. Default is 1. outformat Specify output format Choice ps ps gml xrna svg (defined $value and $value ne $vdef)? " -o $value" : "" ( "" , " -o " + str(value) )[ value is not None and value != vdef] output_options Output options outfile Result file RnaplotdOutput AbstractText "rna.*" "rna.*" Programs-5.1.1/chaos.xml0000644000175000001560000001577312072525233014007 0ustar bneronsis chaos EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A.
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net chaos Draw a chaos game representation plot for a nucleotide sequence http://bioweb2.pasteur.fr/docs/EMBOSS/chaos.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:composition chaos e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_output Output section e_graph Choose the e_graph output format Choice png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 2 e_goutfile Name of the output graph Filename chaos_graph ("" , " -goutfile=" + str(value))[value is not None] 3 outgraph_png Graph file Picture Binary e_graph == "png" "*.png" outgraph_gif Graph file Picture Binary e_graph == "gif" "*.gif" outgraph_ps Graph file PostScript Binary e_graph == "ps" or e_graph == "cps" "*.ps" outgraph_meta Graph file Picture Binary e_graph == "meta" "*.meta" outgraph_data Graph file Text e_graph == "data" "*.dat" auto Turn off any prompting String " -auto -stdout" 4 Programs-5.1.1/hmmconvert.xml0000644000175000001560000001610711444656032015070 0ustar bneronsis hmmconvert HMMCONVERT Convert profile HMM file to a HMMER format hmmconvert reads an HMM file from oldhmmfile in any HMMER format, and writes it to a new file newhmmfile in a new format. oldhmmfile and newhmmfile must be different files; you can't reliably overwrite the old file. By default, the new HMM file is written in HMMER 2 ASCII format. 
hmm:building hmmconvert oldhmmfile Old HMM ascii file HmmProfile AbstractText not $oldhmmfile oldhmmfile is not None " $oldhmmfile" " " + str(oldhmmfile) Do not enter ascii and bin files at the same time not defined $oldbinfile oldbinfile is None 2 oldbinfile Old HMM binary file HmmProfileBin Binary not $oldhmmfile oldbinfile is not None " $oldbinfile" " " + str(oldbinfile) Do not enter ascii and bin files at the same time not defined $oldhmmfile oldhmmfile is None 2 advanced Advanced options 1 new_format New format Choice -a -a -b -2 ($value)? " $value":"" " "+str(value) 1 outfmt Choose output legacy 3.x file formats by name Choice $new_format ne '-2' new_format != '-2' 3/b 3/b 3/a ($value ne $vdef)? " --outfmt $value":"" ("", " --outfmt "+str(value))[value !=vdef] Output in a HMMER3 ASCII text format other than the most current one. Valid choices for the value are '3/b' or '3/a'. The current format is '3/b', and this is the default. There is a slightly different format '3/a' that was used in some alpha test code. 1 result_file Hmm profile HmmProfile AbstractText $new_format eq '-a' or $new_format eq '-2' new_format == '-a' or new_format == '-2' (defined $oldhmmfile)? "> $oldhmmfile.convert": "> $oldbinfile.convert" ("> " + str(oldbinfile) + ".convert", "> " + str(oldhmmfile) + ".convert")[oldhmmfile is not None] 3 (defined $oldhmmfile)? "$oldhmmfile.convert": "$oldbinfile.convert" (str(oldbinfile) + ".convert" , str(oldhmmfile) + ".convert")[oldhmmfile is not None] result_bin_file Hmm profile (binary) HmmProfileBin Binary $new_format eq '-b' new_format == '-b' (defined $oldhmmfile)? "> $oldhmmfile.bin": "> $oldbinfile.bin" ("> " + str(oldbinfile) + ".bin", "> " + str(oldhmmfile) + ".bin")[oldhmmfile is not None] 3 (defined $oldhmmfile)?
"$oldhmmfile.bin":"oldbinfile.bin" (str(oldbinfile) + ".bin" , str(oldhmmfile) + ".bin")[oldhmmfile is not None] Programs-5.1.1/syco.xml0000644000175000001560000012140112072525233013651 0ustar bneronsis syco EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net syco Draw synonymous codon usage statistic plot for a nucleotide sequence http://bioweb2.pasteur.fr/docs/EMBOSS/syco.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:codon_usage sequence:nucleic:gene_finding syco e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_cfile cfile option Choice mobyle_null mobyle_null Eacc.cut Eacica.cut Eadenovirus5.cut Eadenovirus7.cut Eagrtu.cut Eaidlav.cut Eanasp.cut Eani.cut Eani_h.cut Eanidmit.cut Earath.cut Easn.cut Eath.cut Eatu.cut Eavi.cut Eazovi.cut Ebacme.cut Ebacst.cut Ebacsu.cut Ebacsu_high.cut Ebja.cut Ebly.cut Ebme.cut Ebmo.cut Ebna.cut Ebommo.cut Ebov.cut Ebovin.cut Ebovsp.cut Ebpphx.cut Ebraja.cut Ebrana.cut Ebrare.cut Ebst.cut Ebsu.cut Ebsu_h.cut Ecac.cut Ecaeel.cut Ecal.cut Ecanal.cut Ecanfa.cut Ecaucr.cut Eccr.cut Ecel.cut Echi.cut Echick.cut Echicken.cut Echisp.cut Echk.cut Echlre.cut Echltr.cut Echmp.cut Echnt.cut Echos.cut Echzm.cut Echzmrubp.cut Ecloab.cut Ecpx.cut Ecre.cut Ecrigr.cut Ecrisp.cut Ectr.cut Ecyapa.cut Edayhoff.cut Eddi.cut Eddi_h.cut Edicdi.cut Edicdi_high.cut Edog.cut Edro.cut Edro_h.cut Edrome.cut Edrome_high.cut Edrosophila.cut Eeca.cut Eeco.cut Eeco_h.cut Eecoli.cut Eecoli_high.cut Eemeni.cut Eemeni_high.cut Eemeni_mit.cut Eerwct.cut Ef1.cut Efish.cut Efmdvpolyp.cut Ehaein.cut Ehalma.cut Ehalsa.cut Eham.cut Ehha.cut Ehin.cut Ehma.cut Ehorvu.cut 
Ehum.cut Ehuman.cut Ekla.cut Eklepn.cut Eklula.cut Ekpn.cut Elacdl.cut Ella.cut Elyces.cut Emac.cut Emacfa.cut Emaize.cut Emaize_chl.cut Emam_h.cut Emammal_high.cut Emanse.cut Emarpo_chl.cut Emedsa.cut Emetth.cut Emixlg.cut Emouse.cut Emsa.cut Emse.cut Emta.cut Emtu.cut Emus.cut Emussp.cut Emva.cut Emyctu.cut Emze.cut Emzecp.cut Encr.cut Eneigo.cut Eneu.cut Eneucr.cut Engo.cut Eoncmy.cut Eoncsp.cut Eorysa.cut Eorysa_chl.cut Epae.cut Epea.cut Epet.cut Epethy.cut Epfa.cut Ephavu.cut Ephix174.cut Ephv.cut Ephy.cut Epig.cut Eplafa.cut Epolyomaa2.cut Epombe.cut Epombecai.cut Epot.cut Eppu.cut Eprovu.cut Epse.cut Epseae.cut Epsepu.cut Epsesm.cut Epsy.cut Epvu.cut Erab.cut Erabbit.cut Erabit.cut Erabsp.cut Erat.cut Eratsp.cut Erca.cut Erhile.cut Erhime.cut Erhm.cut Erhoca.cut Erhosh.cut Eric.cut Erle.cut Erme.cut Ersp.cut Esalsa.cut Esalsp.cut Esalty.cut Esau.cut Eschma.cut Eschpo.cut Eschpo_cai.cut Eschpo_high.cut Esco.cut Eserma.cut Esgi.cut Esheep.cut Eshp.cut Eshpsp.cut Esli.cut Eslm.cut Esma.cut Esmi.cut Esmu.cut Esoltu.cut Esoy.cut Esoybn.cut Espi.cut Espiol.cut Espn.cut Espo.cut Espo_h.cut Espu.cut Esta.cut Estaau.cut Estrco.cut Estrmu.cut Estrpn.cut Estrpu.cut Esty.cut Esus.cut Esv40.cut Esyhsp.cut Esynco.cut Esyncy.cut Esynsp.cut Etbr.cut Etcr.cut Eter.cut Etetsp.cut Etetth.cut Etheth.cut Etob.cut Etobac.cut Etobac_chl.cut Etobcp.cut Etom.cut Etrb.cut Etrybr.cut Etrycr.cut Evco.cut Evibch.cut Ewheat.cut Ewht.cut Exel.cut Exenla.cut Exenopus.cut Eyeast.cut Eyeast_cai.cut Eyeast_high.cut Eyeast_mit.cut Eyeastcai.cut Eyen.cut Eyeren.cut Eyerpe.cut Eysc.cut Eysc_h.cut Eyscmt.cut Eysp.cut Ezebrafish.cut Ezma.cut ("", " -cfile=" + str(value))[value is not None and value!=vdef] 2 Codon usage file e_advanced Advanced section e_window Averaging window Integer 30 ("", " -window=" + str(value))[value is not None and value!=vdef] 3 e_uncommon Show common codon usage Boolean 0 ("", " -uncommon")[ bool(value) ] 4 e_minimum Minimum value for a common codon (value from 0.0 to 
.99) Float .15 ("", " -minimum=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to .99 is required value <= .99 5 e_output Output section e_plot Produce plot Boolean 1 (" -noplot", "")[ bool(value) ] 6 e_graph Choose the e_graph output format Choice e_plot png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 7 xy_goutfile Name of the output graph Filename e_plot syco_xygraph ("" , " -goutfile=" + str(value))[value is not None] 8 xy_outgraph_png Graph file Picture Binary e_plot and e_graph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_plot and e_graph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_plot and e_graph == "ps" or e_graph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_plot and e_graph == "meta" "*.meta" xy_outgraph_data Graph file Text e_plot and e_graph == "data" "*.dat" e_outfile Name of the output file (e_outfile) Filename not e_plot syco.e_outfile ("" , " -outfile=" + str(value))[value is not None] 9 e_outfile_out outfile_out option SycoReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 10 Programs-5.1.1/phiblast.xml0000644000175000001560000007603011767572177014534 0ustar bneronsis phiblast PHI-Blast Pattern-Hit Initiated BLAST R. Baeza-Yates and G. Gonnet, Communications of the ACM 35(1992), pp. 74-82. S. Wu and U. Manber, Communications of the ACM 35(1992), pp. 83-91. database:search:homology database:search:pattern phiblast Program (-p) Choice blastpgp blastpgp patseedp seedp "blastpgp -p $value" "blastpgp -p " + str(value) PHI-BLAST (Pattern-Hit Initiated BLAST) is a search program that combines matching of regular expressions with local alignments surrounding the match. The calculation of local alignments is done using a method very similar to (and much of the same code as) gapped BLAST. Program modes: . 
patseedp: normal phiblast mode . seedp: Restrict the search for local alignments to a subset of the pattern occurrences in the query. This program option requires the user to specify the location(s) of the interesting pattern occurrence(s) in the pattern file (for the syntax see below). When there are multiple pattern occurrences in the query it may be important to decide how many are of interest because the E-value for matches is effectively multiplied by the number of interesting pattern occurrences. query Sequence File (-i) Sequence FASTA " -i $query" " -i " + str(value) 3 start_region Start of required region in query (-S) Integer 1 (defined $value and $value != $vdef)? " -S $value" : "" ( "" , " -S " + str(value) )[ value is not None and value != vdef] 5 end_region End of required region in query (-H) Integer -1 (defined $value and $value != $vdef)? " -H $value" : "" ( "" , " -H " + str(value) )[ value is not None and value != vdef] 5 Location on query sequence. -1 indicates end of query pattern Pattern file- Prosite syntax (-k) PrositePattern AbstractText " -k $value" " -k " + str(value) 3 Given a protein sequence S and a regular expression pattern P occurring in S, PHI-BLAST helps answer the question: What other protein sequences both contain an occurrence of P and are homologous to S in the vicinity of the pattern occurrences? Rules for pattern syntax: The syntax for patterns in PHI-BLAST follows the conventions of PROSITE. When using the stand-alone program, it is permissible to have multiple patterns in a file separated by a blank line between patterns. 
Valid protein characters for PHI-BLAST patterns: ABCDEFGHIKLMNPQRSTVWXYZU Other useful delimiters: [ ] means any one of the characters enclosed in the brackets e.g., [LFYT] means one occurrence of L or F or Y or T - means nothing (this is a spacer character used by PROSITE) x with nothing following means any residue x(5) means 5 positions in which any residue is allowed (and similarly for any other single number in parentheses after x) x(2,4) means 2 to 4 positions where any residue is allowed, and similarly for any other two numbers separated by a comma; the first number should be < the second number. IP PATTERN PA [LIVM]-x-D-x(2)-[GA]-[NQS]-K-G-T-G-x-W protein_db Protein database (-d) Choice null null " -d $value" " -d " + str(value) 2 scoring Scoring options 4 open_a_gap Cost to open a gap (-G) Integer 11 (defined $value and $value != $vdef) ? " -G $value" : "" ( "" , " -G " + str(value) )[ value is not None and value != vdef] extend_a_gap Cost to extend a gap (-E) Integer 1 (defined $value and $value != $vdef) ? " -E $value" : "" ( "" , " -E " + str(value) )[ value is not None and value != vdef] Limited values for gap existence and extension are supported for these three programs. Some supported and suggested values are: Existence Extension 10 -- 1 10 -- 2 11 -- 1 8 -- 2 9 -- 2 (source: NCBI Blast page) matrix Similarity matrix (-M) Choice BLOSUM62 BLOSUM45 BLOSUM80 BLOSUM62 PAM30 PAM70 (defined $value and $value ne $vdef)? " -M $value" : "" ( "" , " -M " + str(value) )[ value is not None and value != vdef] filter_opt Filtering and masking options 5 This options also takes a string as an argument. One may use such a string to change the specific parameters of seg or invoke other filters. Please see the 'Filtering Strings' section (below) for details. filter Filter query sequence with SEG (-F) Boolean 0 ($value) ? " -F T" : "" ( "" , " -F T" )[ value ] lower_case Use lower case filtering (-U) Boolean 0 ($value) ? 
" -U T" : "" ("", " -U T")[value] This option specifies that any lower-case letters in the input FASTA file should be masked. selectivity_opt Selectivity options 5 Expect Expected value (-e) Float 10 (defined $value and $value != $vdef)? " -e $value":"" ("" , " -e " + str(value))[ value is not None and value != vdef] The statistical significance threshold for reporting matches against database sequences; the default value is 10, such that 10 matches are expected to be found merely by chance, according to the stochastic model of Karlin and Altschul (1990). If the statistical significance ascribed to a match is greater than the EXPECT threshold, the match will not be reported. Lower EXPECT thresholds are more stringent, leading to fewer chance matches being reported. Fractional values are acceptable. word_size Word Size (-W) Integer (defined $value) ? " -W $value" : "" ("" , " -W "+str(value))[value is not None] Valid wordsize range is 2 to 3 $value >= 2 and $value <=3 value >= 2 and value <=3 Use words of size N. Zero invokes default behavior Default value: 3 window Multiple hits window size (-A) Integer 40 (defined $value and $value != $vdef)? " -A $value" : "" ( "" , " -A " + str(value) )[ value is not None and value != vdef] When multiple hits method is used, this parameter defines the distance from last hit on the same diagonal to the new one. Zero means single hit algorithm. extend_hit Threshold for extending hits (-f) Integer 11 (defined $value and $value != $vdef)? " -f $value" : "" ( "" , " -f " + str(value) )[ value is not None and value !=vdef] Blast seeks first short word pairs whose aligned score reaches at least this value dropoff X dropoff value for gapped alignment (-X) Integer (defined $value)? " -X $value":"" ("" , " -X " + str(value))[ value is not None ] This is the value that control the path graph region explored by Blast during a gapped extension (Xg in the NAR paper). 
dropoff_z X dropoff value for final gapped alignment (-Z) Integer 25 (defined $value and $value != $vdef)? " -Z $value" : "" ( "" , " -Z " + str(value) )[ value is not None and value != vdef] This parameter controls the dropoff for the final reported alignment. See also the -X parameter. dropoff_y Dropoff for blast ungapped extensions in bits (-y) Float 7.0 (defined $value and $value != $vdef) ? " -y $value" : "" ( "" , " -y " + str(value) )[ value is not None and value != vdef] This parameter controls the dropoff at ungapped extension stage. See also the -X parameter. eff_len Effective length of the search space (-Y) Integer 0 (defined $value and $value != $vdef) ? " -Y $value" : "" ("" , " -Y "+str(value))[value is not None and value !=vdef] Use zero for the real size keep_hits Number of best hits from a region to keep (-K) Integer (defined $value) ? " -K $value" : "" ("" , " -K "+str(value))[value is not None] If this option is used, a value of 100 is recommended. mode Single-hit or multiple-hit mode (-P) Choice 0 0 1 ($value eq "0") ? " -P $value" : "" ("" , " -P "+str(value))[value != "0"] nb_bits Number of bits to trigger gapping (-N) Integer 22 (defined $value and $value != $vdef) ? " -N $value" : "" ("" , " -N "+str(value))[value is not None and value !=vdef] phi_spec_opt PHI-Blast specific selectivity options 5 multipass Maximum number of passes to use in multipass version (-j) Integer 1 (defined $value and $value != $vdef) ? " -j $value" : "" ("" , " -j "+str(value))[value is not None and value !=vdef] pseudocounts Constant in pseudocounts for multipass version (-c) Integer 9 (defined $value and $value != $vdef) ? " -c $value" : "" ("" , " -c "+str(value))[value is not None and value !=vdef] e_threshold e-value threshold for inclusion in multipass model (-h) Float 0.002 (defined $value and $value != $vdef) ? 
" -h $value" : "" ("" , " -h "+str(value))[value is not None and value !=vdef] affichage Report options 5 Descriptions Number of one-line descriptions to show? (-v) Integer 500 (defined $value and $value != $vdef) ? " -v $value" : "" ( "" , " -v " + str(value) )[ value is not None and value != vdef] Maximum number of database sequences for which one-line descriptions will be reported. Alignments Number of database sequences to show alignments? (-b) Integer 250 (defined $value and $value != $vdef) ? " -b $value" : "" ( "" , " -b " + str(value) )[ value is not None and value != vdef] Maximum number of database sequences for which high-scoring segment pairs will be reported (-b). view_alignments Alignment view options (-m) Choice 0 0 1 2 3 4 5 6 7 8 (defined $value and $value ne $vdef)? " -m $value" : "" ( "" , " -m " + str(value) )[ value is not None and value != vdef] txtoutput Text output String $view_alignments ne "7" view_alignments != "7" " -o phiblast.txt" " -o phiblast.txt" 10 xmloutput Xml output String $view_alignments eq "7" view_alignments == "7" " -o phiblast.xml" " -o phiblast.xml" 10 htmloutput Html output Boolean $view_alignments !~ /^[78]$/ view_alignments not in [ "7" , "8" ] 1 ($value) ? " && html4blast -g -o phiblast.html phiblast.txt" : "" ("" , " && html4blast -g -o phiblast.html phiblast.txt")[value] 11 believe Believe the query defline (-J) Boolean 0 ($value)? " -J":"" ("" , " -J")[ value ] seqalign_file SeqAlign file (-J option must be true) (-O) Filename $believe believe (defined $value)? " -O $value" : "" ( "" , " -O " + str(value) )[ value is not None ] SeqAlign is in ASN.1 format, so that it can be read with NCBI tools (such as sequin). This allows one to view the results in different formats. 
txtfile Blast text report BlastTextReport Report $view_alignments ne "7" view_alignments != "7" "phiblast.txt" "phiblast.txt" xmlfile Blast xml report BlastXmlReport Report $view_alignments eq "7" view_alignments == "7" "phiblast.xml" "phiblast.xml" htmlfile Blast html report BlastHtmlReport Report $view_alignments !~ /^[78]$/ view_alignments not in ["7", "8"] "phiblast.html" "phiblast.html" imgfile Picture Binary $view_alignments !~ /^[78]$/ view_alignments not in ["7", "8"] "*.png" "*.gif" "*.png" "*.gif" Programs-5.1.1/mreps.xml0000644000175000001560000002316011441651470014027 0ustar bneronsis mreps 2.5 mreps Algorithm for finding tandem repeats in DNA sequences G. Kucherov R. Kolpakov, G. Kucherov, Finding maximal repetitions in a word in linear time, 1999 Symposium on Foundations of Computer Science (FOCS), New-York (USA), pp. 596-604, IEEE Computer Society R. Kolpakov, G. Kucherov, Finding Approximate Repetitions under Hamming Distance, 9-th European Symposium on Algorithms (ESA), Aarhus (Denmark), Lecture Notes in Computer Science, vol. 2161, pp 170-181. http://bioinfo.lifl.fr/mreps/ http://bioinfo.lifl.fr/mreps/ http://bioinfo.lifl.fr/mreps/ sequence:nucleic:repeats mreps query Query Sequence file DNA Sequence FASTA " -fasta $value" " -fasta "+str(value) 20 err Specifies the resolution (-res) Integer (defined $value ) ? " -res $value" : "" ( "" , " -res " + str(value) )[ value is not None] 10 Integer value from_v Specifies starting position (-from) Integer (defined $value) ? " -from $value" : "" ( "" , " -from " + str(value) )[ value is not None ] 10 Integer value to Specifies end position (-to) Integer (defined $value) ? " -to $value" : "" ( "" , " -to " + str(value) )[ value is not None ] End position must be greater or equal to the starting position $value >= $from_v value >= from_v 10 Integer value win Processes by sliding windows of size 2*n overlaping by n (-win) Integer (defined $value) ? 
" -win $value" : "" ( "" , " -win " + str(value) )[ value is not None ] 10 Integer value minsize Report repetitions whose size is at least n (-minsize) Integer (defined $value) ? " -minsize $value" : "" ( "" , " -minsize " + str(value) )[ value is not None ] 10 Integer value maxsize Report repetitions whose size is at most n (-maxsize) Integer (defined $value) ? " -maxsize $value" : "" ( "" , " -maxsize " + str(value) )[ value is not None ] Maximal size must be greater or equal to the minimal size $value >= $minsize value >= minsize 10 Integer value minperiod Report repetitions whose period is at least n (-minperiod) Integer (defined $value) ? " -minperiod $value" : "" ( "" , " -minperiod " + str(value) )[ value is not None ] 10 Integer value maxperiod Report repetitions whose period is at most n (-maxperiod) Integer (defined $value) ? " -maxperiod $value" : "" ( "" , " -maxperiod " + str(value) )[ value is not None ] Maximal period must be greater or equal to the minimal period $value >= $minperiod value >= minperiod 10 Integer value exp Report repetitions whose exponent is at least n (-exp) Float (defined $value) ? " -exp $value" : "" ( "" , " -exp " + str(value) )[ value is not None ] You must give a float value greater than or equal to 1.0 $value >= 1.0 value >= 1.0 10 Float value greater tha 1.0 allowsmall Output small repeats that can occur randomly (-allowsmall) Boolean 0 ( $value) ? " -allowsmall" : "" ( "" , " -allowsmall" )[ value ] 10 noprint Do not output repetitions sequences (-noprint) Boolean 0 ( $value) ? " -noprint" : "" ( "" , " -noprint" )[ value ] 10 xml XML format output file name (-xmloutput) Filename (defined $value) ? " -xmloutput $value" : "" ( "" , " -xmloutput " + str(value) )[ value is not None ] 10 xmlout XML output file MrepsXmlReport Report $xml xml is not None 10 $xml str(xml) Programs-5.1.1/cap3.xml0000644000175000001560000011503011672710655013534 0ustar bneronsis cap3 3 CAP3 Contig Assembly Program Huang, X. and Madan, A. 
(1999) Huang, X. and Madan, A. (1999) CAP3: A DNA Sequence Assembly Program. Genome Research, 9: 868-877. http://seq.cs.iastate.edu/ http://seq.cs.iastate.edu/cap3.html assembly:assembly cap3 seq File of reads DNA Sequence FASTA 2,n " $value" " " + str( value ) 1 qual_file Quality value file BaseQuality AbstractText PhrapQuality (defined $value?)"ln -sf $value $seq.qual" : "" ("" , "ln -sf %s %s.qual && " %( value , seq ) )[value is not None and value != seq + ".qual" ] CAP3 uses the same format of a quality file as Phrap. The sequence file and the corresponding quality file must be arranged in the same order in terms of reads, where for each read, the same name must be used in both files and the number of bases must be equal to the number of quality values. -10 con_file Constraint file CapConstraint AbstractText (defined $value?)"ln -sf $value $seq.con" : "" ( "" , "ln -sf %s %s.con && " %( value , seq ) )[value is not None ] Each line of the constraint file specifies one forward-reverse constraint of the form: ReadA ReadB MinDistance MaxDistance where ReadA and ReadB are names of two reads, and MinDistance and MaxDistance are distances (integers) in base pairs. The constraint is satisfied if ReadA in forward orientation occurs in a contig before ReadB in reverse orientation, or ReadB in forward orientation occurs in a contig before ReadA in reverse orientation, and their distance is between MinDistance and MaxDistance. CAP3 works better if a lot more constraints are used. -5 clipping_poor_regions Clipping of poor regions

CAP3 computes clipping positions of each read using both base quality values and similarity information. Clipping of a poor end region of a read f is controlled by three parameters: quality value cutoff qualcut, clipping range crange, and depth of good coverage gdepth. The value for qualcut can be specified with the "-c" option, the value for crange with the "-y" option, and the value for gdepth with the "-z" option.

If there are quality values, CAP3 computes two positions qualpos5 and qualpos3 of read f such that the region of read f from position qualpos5 to position qualpos3 consists mostly of quality values greater than qualcut. If there are no quality values, then qualpos5 is set to 1 and qualpos3 is set to the length of read f. The range for the left clipping position of read f is from 1 to qualpos5 + crange. The range for the right clipping position of read f is from qualpos3 - crange to the end of read f. The minimum depth of good coverage at the left and right clipping positions of read f is expected to be gdepth.
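
The two search ranges described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the text, not CAP3 code: the function name clipping_ranges is hypothetical, and clamping the ranges to the bounds of the read is an assumption made here.

```python
def clipping_ranges(read_length, qualpos5, qualpos3, crange):
    """Return 1-based inclusive search ranges for the left and right clip positions.

    Hypothetical helper illustrating the text; not part of CAP3.
    """
    # Left clipping position is searched from 1 to qualpos5 + crange.
    # Clamping to the read length is an assumption of this sketch.
    left = (1, min(read_length, qualpos5 + crange))
    # Right clipping position is searched from qualpos3 - crange to the read end.
    # Clamping to position 1 is an assumption of this sketch.
    right = (max(1, qualpos3 - crange), read_length)
    return left, right


# Without quality values, qualpos5 is 1 and qualpos3 is the read length.
left, right = clipping_ranges(read_length=600, qualpos5=1, qualpos3=600, crange=250)
print(left, right)
```

With the default crange of 250 and a 600-base read without quality values, the left range ends at position 251 and the right range starts at position 350.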

Let realdepth5 be the maximum real depth of coverage for the initial region of read f ending at position qualpos5 + crange. Let depth5 be the smaller of realdepth5 and gdepth. If depth5 is 0, then the left clipping position of read f is set to qualpos5 by CAP3. In that case the given value for the parameter crange may be too small for read f, and CAP3 reports at the start of a .info file that "No overlap is found in the given 5' clipping range for read f." If there are overlaps beyond the given 5' clipping range for read f, CAP3 reports a new clipping range for each overlap. One of the reported range values can be used as a new value for the parameter crange for read f.

If depth5 is greater than 0, the left clipping position of read f is the smallest position x such that x is less than qualpos5 + crange and the region of read f beginning at position x is similar to depth5 other reads. The right clipping position of read f is computed similarly by CAP3. Larger values for the parameters crange and gdepth result in more aggressive clipping of poor end regions. A larger value for crange allows CAP3 to search for the left clipping position in a larger area. A larger value for gdepth may cause CAP3 to clip more bases so that the resulting good portion of read f is similar to more reads.

The user may provide specific values for the parameters crange and gdepth for individual reads in a file. Each line in the file has the following format:

ReadName     crange5     gdepth5      crange3     gdepth3

where ReadName is the name of a read, crange5 & gdepth5 are values for the 5' end, and crange3 & gdepth3 are for the 3' end.
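
A minimal sketch of reading such a per-read clipping file is given below. It assumes that the five fields are separated by whitespace and that the four parameters are integers; the names ClipParams and parse_clipping_lines are hypothetical and are not part of CAP3.

```python
from typing import Dict, Iterable, NamedTuple


class ClipParams(NamedTuple):
    """Per-read clipping parameters (hypothetical container, not CAP3 code)."""
    crange5: int
    gdepth5: int
    crange3: int
    gdepth3: int


def parse_clipping_lines(lines: Iterable[str]) -> Dict[str, ClipParams]:
    """Parse 'ReadName crange5 gdepth5 crange3 gdepth3' lines into a dict."""
    params = {}
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines (an assumption of this sketch)
        if len(fields) != 5:
            raise ValueError("line %d: expected 5 fields, got %d" % (number, len(fields)))
        # First field is the read name; the rest are assumed to be integers.
        params[fields[0]] = ClipParams(*(int(v) for v in fields[1:]))
    return params


example = ["readA 300 4 250 3", "readB 250 3 400 5"]
parsed = parse_clipping_lines(example)
print(parsed["readA"].crange5, parsed["readB"].gdepth3)
```

Keeping the 5' and 3' values in one named tuple per read makes it easy to check a file for malformed lines before passing it to the "-w" option.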

base_qual_cutoff_clipping Base quality cutoff for clipping (-c) Integer 12 (defined $value and $value != $vdef) ? " -c $value" : "" ( "" , " -c "+ str( value ) )[ value is not None and value != vdef ] Base quality cutoff must be > 5 $value > 5 value > 5 10 Default value: 12 clipping_range Clipping range (-y) Integer 250 (defined $value and $value != $vdef) ? " -y $value" : "" ( "" , " -y "+ str( value ) )[ value is not None and value != vdef ] Value must be > 5 $value > 5 value > 5 10 Default value: 250 good_reads Minimum number of good reads at clip pos (-z) Integer 3 (defined $value and $value != $vdef)? " -z $value" : "" ( "" , " -z "+ str( value ) )[ value is not None and value != vdef ] Value must be > 0 $value > 0 value > 0 10 Default value: 3 clipping_file File for clipping information (-w) ClippingParameters AbstractText (defined $value)? " -w $value" : "" ( "" , " -w "+ str( value ) )[ value is not None ] 10 The user may provide specific values for the parameters crange and gdepth for individual reads in a file. Each line in the file has the following format: ReadName crange5 gdepth5 crange3 gdepth3 where ReadName is the name of a read, crange5 and gdepth5 are values for the 5' end, and crange3 and gdepth3 are for the 3' end.
band_diagonals Band of diagonals The program determines a minimum band of diagonals for an overlapping alignment between two sequence reads. The band is expanded by a number of bases specified by the user with option "-a". band_expansion Band expansion size (-a) Integer 20 (defined $value and $value != $vdef) ? " -a $value" : "" ( "" , " -a "+ str( value ) )[ value is not None and value != vdef ] Band expansion size must be > 10 $value > 10 value > 10 10 The program determines a minimum band of diagonals for an overlapping alignment between two sequence reads. The band is expanded by a number of bases specified by the user with option "-a". Default value: 20 overlap_score Quality difference score of an overlap Overlaps between reads are evaluated by many measures. The first measure is based on base quality. If an overlap contains many differences at bases of high quality, then the overlap is removed. Specifically, let b be the base quality cutoff value and let d be the maximum difference score. The values for the two parameters can be set using the "-b" and "-d" options. If the overlap contains a difference at bases of quality values q1 and q2, then the score at the difference is max(0, min(q1, q2) - b). The difference score of an overlap is the sum of scores at each difference. For example, suppose an overlap contains two differences, one at bases of quality values 15 and 30 and the other at bases of quality values 40 and 50. With b = 20, the difference score of the overlap is 0 + 20 = 20. If the difference score of an overlap exceeds d, then the overlap is removed. With b = 20, an overlap with 15 differences at bases of quality values 40 or higher has a difference score of at least 300 and is removed if d = 250. base_qual_cutoff Base quality cutoff for differences (-b) Integer 20 (defined $value and $value != $vdef) ?
" -b $value" : "" ( "" , " -b "+ str( value ) )[ value is not None and value != vdef ] Base quality cutoff must be > 15 $value > 15 value > 15 10 max_qscore Maximum qscore sum at differences (-d) Integer 200 (defined $value and $value != $vdef) ? " -d $value" : "" ( "" , " -d "+ str( value ) )[ value is not None and value != vdef ] Value must be > 20 $value > 20 value > 20 10 nb_diff_overlap Number of differences in an overlap The second measure looks at the number of differences in an overlap. If the number of differences in an overlap is higher than expected, then the overlap is removed. Let an integer e be the maximum number of extra differences. Consider an overlap between reads f and g. Let r1 be the estimated number of sequencing errors for the region of f involved in the overlap and let r2 be that for the region of g involved in the overlap. If the observed number of differences in the overlap is greater than r1 + r2 + e, then the overlap is removed. The value for the parameter e can be changed using the "-e" option. The expected number of differences in the overlap is about r1 + r2. Giving a smaller value to e causes more overlaps to be removed. clearance Clearance between number of diff (-e) Integer 30 (defined $value and $value != $vdef)? " -e $value" : "" ( "" , " -e "+ str( value ) )[ value is not None and value != vdef ] Value must be > 10 $value > 10 value > 10 10 sim_score_overlap Similarity score of an overlap The third measure is based on overlap similarity score. The similarity score of an overlapping alignment is defined using base quality values. Let m be the match score factor, let n be the mismatch score factor, and let g be the gap penalty factor. Values for these parameters can be set with the "-m", "-n", and "-g" options. A match at bases of quality values q1 and q2 is given a score of m * min(q1,q2). A mismatch at bases of quality values q1 and q2 is given a score of n * min(q1,q2).
A base of quality value q1 in a gap is given a score of -g * min(q1,q2), where q2 is the quality value of the base in the other sequence right before the gap. The score of a gap is the sum of scores of each base in the gap minus a gap open penalty. The similarity score of an overlapping alignment is the sum of scores of each match, each mismatch, and each gap. With m = 2, an overlap that consists of 25 matches at bases of quality value 10 has a score of 500. If the similarity score of an overlap is less than the overlap similarity score cutoff s, then the overlap is removed. match_score Match score factor (-m) Integer 2 (defined $value and $value != $vdef)? " -m $value" : "" ( "" , " -m "+ str( value ) )[ value is not None and value != vdef ] Value must be > 0 $value > 0 value > 0 10 mismatch_score Mismatch score factor (-n) Integer -5 (defined $value and $value != $vdef)? " -n $value" : "" ( "" , " -n "+ str( value ) )[ value is not None and value != vdef ] Value must be < 0 $value < 0 value < 0 10 gap_penalty Gap penalty factor (-g) Integer 6 (defined $value and $value != $vdef)? " -g $value" : "" ( "" , " -g "+ str( value ) )[ value is not None and value != vdef ] Value must be > 0 $value > 0 value > 0 10 percent_id_overlap Length and percent identity of an overlap The fourth requirement for an overlap is that the length in bp of the overlap is no less than the value of the minimum overlap length cutoff parameter. The value for this parameter can be changed with the "-o" option. The fifth requirement for an overlap is that the percent identity of the overlap is no less than the minimum percent identity cutoff. The value for this parameter can be changed with the "-p" option. A value of 75 for p means 0.75 or 75%. overlap_length Overlap length cutoff (-o) Integer 40 (defined $value and $value != $vdef) ? 
" -o $value" : "" ( "" , " -o "+ str( value ) )[ value is not None and value != vdef ] Value must be > 20 $value > 20 value > 20 10 overlap_identity Overlap percent identity cutoff (-p) Integer 80 (defined $value and $value != $vdef) ? " -p $value" : "" ( "" , " -p "+ str( value ) )[ value is not None and value != vdef ] Value must be > 65 $value > 65 value > 65 10 max_len_gaps_overlap Maximum length of gaps in an overlap The program provides a parameter (-f option) for the user to reject overlaps with a long gap. Let an integer f be the maximum length of gaps allowed in any overlap. Then any overlap with a gap longer than f is rejected by the program. The value for this parameter can be changed using the "-f" option. Note that a small value for this parameter may cause the program to remove true overlaps and to produce incorrect results. The "-f" option may be used by the user to split reads from alternative splicing forms into separate contigs. Geo Pertea at TIGR suggested that this option be added to the program. max_gap_length Maximum gap length in any overlap (-f) Integer 20 (defined $value and $value != $vdef) ? " -f $value" : "" ( "" , " -f "+ str( value ) )[ value is not None and value != vdef ] Value must be > 1 $value > 1 value > 1 10 overhang_pcent_len_overlap Overhang percent length of an overlap The total length of the different overhang regions in an overlap is controlled with the -h option. An overhang region in an overlap is a different terminal region before or after the overlap. The overhang percent length of an overlap is 100 times the total length of the different overhang regions in the overlap divided by the length of the overlap. Overlaps with an overhang percent length greater than the maximum cutoff are rejected. max_overhang Maximum overhang percent length (-h) Integer 20 (defined $value and $value != $vdef)?
" -h $value" : "" ( "" , " -h "+ str( value ) )[ value is not None and value != vdef ] Value must be > 2 $value > 2 value > 2 10 overlap_similarity Overlap similarity score cutoff (-s) Integer 900 (defined $value and $value != $vdef) ? " -s $value" : "" ( "" , " -s "+ str( value ) )[ value is not None and value != vdef ] Value must be > 400 $value > 400 value > 400 10 assembly_fwd Assembly of reads in forward orientation only The "-r" option is used to let CAP3 know whether to consider reads in reverse orientation for assembly. The default value for the option is 1, meaning that reads in reverse orientation are also considered for assembly. Specifying zero as "-r 0" instructs CAP3 to perform assembly of reads in forward orientation only. This option was suggested by Patrick Schnable's lab. reverse_orientation Reverse orientation value (-r) Integer 1 (defined $value and $value != $vdef) ? " -r $value" : "" ( "" , " -r "+ str( value ) )[ value is not None and value != vdef ] Value must be >= 0 $value >= 0 value >= 0 10 max_num_word_matches Maximum number of word matches This parameter (option -t) allows the user to trade off the efficiency of the program for its accuracy. For a read f, the program computes overlaps between read f and other reads by considering short word matches between read f and other reads. A word match is examined to see if it can be extended into a long overlap. If read f has overlaps with many other reads, then read f has many short word matches with many other reads. This parameter gives an upper limit, for any word, on the number of word matches between read f and other reads that are considered by the program. Using a large value for this parameter allows the program to consider more word matches between read f and other reads, which can find more overlaps for read f, but slows down the program. Using a small vlaue for this parameter has the opposite effect. A large value may be used if the depth of coverage is high for the data set. 
For example, a value of 150 is used for a data set with a maximum depth of coverage of 30, and a value of 500 for a data set with a maximum depth of coverage of 100. Using a very large value may cause the program to run forever or run out of memory. word_matches Maximum number of word matches (-t) Integer 300 (defined $value and $value != $vdef) ? " -t $value" : "" ( "" , " -t "+ str( value ) )[ value is not None and value != vdef ] Value must be > 30 $value > 30 value > 30 10 fwd_rev_const Forward-reverse constraints Corrections to an assembly are made using forward-reverse constraints. Let an integer u be the minimum number of constraints for correction. Consider an alternative overlap between two reads f and g. Assume that f is in contig C1 and that g is in contig C2. If the number of unsatisfied constraints that support the overlap between f and g is greater than the value of the u parameter plus the number of satisfied constraints that support the current joins involving f and g, then the current joins involving f and g are disconnected and the overlap between f and g is implemented. The value for this parameter can be changed with the "-u" option. Contigs that are linked by forward-reverse constraints are reported. The minimum number of constraints for reporting a link between two contigs is specified with the "-v" option. min_constraints_corr Minimum number of constraints for correction (-u) Integer 3 (defined $value and $value != $vdef) ? " -u $value" : "" ( "" , " -u "+ str( value ) )[ value is not None and value != vdef ] Value must be > 0 $value > 0 value > 0 10 min_constraints_linking Minimum number of constraints for linking (-v) Integer 2 (defined $value and $value != $vdef) ? 
" -v $value" : "" ( "" , " -v "+ str( value ) )[ value is not None and value != vdef ] Value must be > 0 $value > 0 value > 0 10 ace Assembly in ace format AceAssembly AbstractText ACE "*.cap.ace" "*.cap.ace" contig Contigs DNA Sequence 0,n FASTA "*.cap.contigs" "*.cap.contigs" contig_link Contig links Text "*.cap.contigs.links" "*.cap.contigs.links" contig_qual Quality of contigs Text "*.cap.contigs.qual" "*.cap.contigs.qual" info Assembly informations Text "*.cap.info" "*.cap.info" singlet Singlets DNA Sequence 0,n "*.cap.singlets" "*.cap.singlets"
Programs-5.1.1/cusp.xml0000644000175000001560000000744212072525233013656 0ustar bneronsis cusp EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net cusp Create a codon usage table from nucleotide sequence(s) http://bioweb2.pasteur.fr/docs/EMBOSS/cusp.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:codon_usage cusp e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_output Output section e_outfile Name of the output file (e_outfile) Filename cusp.e_outfile ("" , " -outfile=" + str(value))[value is not None] 2 e_outfile_out outfile_out option CuspReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 3 Programs-5.1.1/fuzzpro.xml0000644000175000001560000002167612072525233014430 0ustar bneronsis fuzzpro EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net fuzzpro Search for patterns in protein sequences http://bioweb2.pasteur.fr/docs/EMBOSS/fuzzpro.html http://emboss.sourceforge.net/docs/themes sequence:protein:motifs fuzzpro e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_pattern Search pattern Protein Pattern AbstractText ("", " -pattern=@" + str(value))[value is not None] 2 The standard IUPAC one-letter codes for the amino acids are used. 
The symbol 'x' is used for a position where any amino acid is accepted. Ambiguities are indicated by listing the acceptable amino acids for a given position, between square brackets '[ ]'. For example: [ALT] stands for Ala or Leu or Thr. Ambiguities are also indicated by listing between a pair of curly brackets '{ }' the amino acids that are not accepted at a given position. For example: {AM} stands for any amino acid except Ala and Met. Each element in a pattern is separated from its neighbor by a '-'. (Optional in fuzzpro). Repetition of an element of the pattern can be indicated by following that element with a numerical value or a numerical range in parentheses. Examples: x(3) corresponds to x-x-x, x(2,4) corresponds to x-x or x-x-x or x-x-x-x. When a pattern is restricted to either the N- or C-terminal of a sequence, that pattern either starts with a '<' symbol or respectively ends with a '>' symbol. A period ends the pattern. (Optional in fuzzpro). For example, [DE](2)HS{P}X(2)PX(2,4)C e_pmismatch Search pattern Integer 0 ("", " -pmismatch=" + str(value))[value is not None and value!=vdef] 3 e_output Output section e_outfile Name of the report file Filename fuzzpro.report ("" , " -outfile=" + str(value))[value is not None] 4 e_rformat_outfile Choose the report output format Choice SEQTABLE DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 5 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 6 Programs-5.1.1/README0000644000175000001560000000046311022262432013027 0ustar bneronsis This package contains XML definitions for bioinformatics software 
and databanks, to be integrated into the Mobyle framework. It is developed at the Institut Pasteur, in the Software and Databanks group, and distributed under the LGPLv2 license. For further inquiries, please contact mobyle@pasteur.fr. Programs-5.1.1/plotorf.xml0000644000175000001560000002025712072525233014370 0ustar bneronsis plotorf EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net plotorf Plot potential open reading frames in a nucleotide sequence http://bioweb2.pasteur.fr/docs/EMBOSS/plotorf.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:gene_finding sequence:nucleic:translation plotorf e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_advanced Advanced section e_start Start codons String ATG ("", " -start=" + str(value))[value is not None and value!=vdef] 2 e_stop Stop codons String TAA,TAG,TGA ("", " -stop=" + str(value))[value is not None and value!=vdef] 3 e_output Output section e_graph Choose the e_graph output format Choice png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 4 xy_goutfile Name of the output graph Filename plotorf_xygraph ("" , " -goutfile=" + str(value))[value is not None] 5 xy_outgraph_png Graph file Picture Binary e_graph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_graph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_graph == "ps" or e_graph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_graph == "meta" "*.meta" xy_outgraph_data Graph file Text e_graph == "data" "*.dat" auto Turn off any prompting String " -auto -stdout" 6 
Programs-5.1.1/tfscan.xml0000644000175000001560000002621712072525233014163 0ustar bneronsis tfscan EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net tfscan Identify transcription factor binding sites in DNA sequences http://bioweb2.pasteur.fr/docs/EMBOSS/tfscan.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:transcription tfscan e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_menu Transcription factor class Choice V F I P V O C ("", " -menu=" + str(value))[value is not None and value!=vdef] 2 e_custom Transfac database data file (optional) TransfacData AbstractText e_menu=="C" ("", " -custom=" + str(value))[value is not None] 3 e_required Required section e_mismatch Number of mismatches (value greater than or equal to 0) Integer 0 ("", " -mismatch=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 4 e_additional Additional section e_minlength Display matches equal to or above this length (value greater than or equal to 1) Integer 1 ("", " -minlength=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 5 e_output Output section e_outfile Name of the report file Filename tfscan.report ("" , " -outfile=" + str(value))[value is not None] 6 e_rformat_outfile Choose the report output format Choice SEQTABLE DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 7 e_outfile_out outfile_out option Text e_rformat_outfile in 
['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile A "quality" value ranging from 1 to 6 and reflecting the experimental reliability of a certain protein-DNA interaction. These values have the following meaning: 1 -- functionally confirmed factor binding site 2 -- binding of pure protein (purified or recombinant) 3 -- immunologically characterized binding activity of a cellular extract 4 -- binding activity characterized via a known binding sequence 5 -- binding of uncharacterized extract protein to a bona fide element 6 -- no quality assigned. auto Turn off any prompting String " -auto -stdout" 8 Programs-5.1.1/tmap.xml0000644000175000001560000002650212072525233013643 0ustar bneronsis tmap EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net tmap Predict and plot transmembrane segments in protein sequences http://bioweb2.pasteur.fr/docs/EMBOSS/tmap.html http://emboss.sourceforge.net/docs/themes sequence:protein:2D_structure structure:2D_structure tmap e_input Input section e_sequences sequences option Protein Alignment FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH 1,n ("", " -sequences=" + str(value))[value is not None] 1 File containing a sequence alignment e_output Output section e_graph Choose the e_graph output format Choice png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 2 xy_goutfile Name of the output graph Filename tmap_xygraph ("" , " -goutfile=" + str(value))[value is not None] 3 xy_outgraph_png Graph file Picture Binary e_graph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_graph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_graph == "ps" or e_graph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_graph == "meta" "*.meta" xy_outgraph_data Graph file Text e_graph == "data" "*.dat" e_outfile Name of the report file Filename tmap.report ("" , " -outfile=" + str(value))[value is not None] 4 e_rformat_outfile Choose the report output format Choice SEQTABLE DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 5 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 6 
Programs-5.1.1/msbar.xml0000644000175000001560000003414312072525233014006 0ustar bneronsis msbar EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net msbar Mutate a sequence http://bioweb2.pasteur.fr/docs/EMBOSS/msbar.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:mutation sequence:protein:mutation msbar e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_othersequence Other sequences that the mutated result should not match Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -othersequence=" + str(value))[value is not None ] 2 If you require that the resulting mutated sequence should not match a set of other sequences, then you can specify that set of sequences here. For example, if you require that the mutated sequence should not be the same as the input sequence, enter the input sequence here. If you want the result to be different to previous results of this program, specify the previous result sequences here. The program will check that the result does not match the sequences specified here before writing it out. If a match is found, then the mutation is started again with a fresh copy of the input sequence. If, after 10 such retries, there is still a match to the set of sequence given here, then the matching mutated sequence is written with a warning message. 
e_required Required section e_count Number of times to perform the mutation operations (value greater than or equal to 0) Integer 1 ("", " -count=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 3 e_point Point mutation operations (value from 1 to 4) Choice 0 0 1 2 3 4 5 6 ("", " -point=" + str(value))[value is not None and value!=vdef] 4 e_block Block mutation operations (value from 1 to 4) Choice 0 0 1 2 3 4 5 6 ("", " -block=" + str(value))[value is not None and value!=vdef] 5 e_codon Codon mutation operations (value from 1 to 4) Choice 0 0 1 2 3 4 5 6 ("", " -codon=" + str(value))[value is not None and value!=vdef] 6 Types of codon mutations to perform. These are only done if the sequence is nucleic. e_additional Additional section e_inframe Do 'codon' and 'block' operations in frame Boolean 0 ("", " -inframe")[ bool(value) ] 7 e_advanced Advanced section e_minimum Minimum size for a block mutation (value greater than or equal to 0) Integer 1 ("", " -minimum=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 8 e_maximum Maximum size for a block mutation Integer 10 ("", " -maximum=" + str(value))[value is not None and value!=vdef] 9 e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename msbar.e_outseq ("" , " -outseq=" + str(value))[value is not None] 10 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 11 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 12 Programs-5.1.1/diffseq.xml0000644000175000001560000003544712072525233014333 0ustar bneronsis diffseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. 
Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net diffseq Compare and report features of two similar sequences http://bioweb2.pasteur.fr/docs/EMBOSS/diffseq.html http://emboss.sourceforge.net/docs/themes alignment:differences diffseq e_input Input section e_asequence asequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -asequence=" + str(value))[value is not None] 1 e_bsequence bsequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -bsequence=" + str(value))[value is not None] 2 e_required Required section e_wordsize Word size (value greater than or equal to 2) Integer 10 ("", " -wordsize=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 2 is required value >= 2 3 The similar regions between the two sequences are found by creating a hash table of 'wordsize'd subsequences. 10 is a reasonable default. Making this value larger (20?) may speed up the program slightly, but will mean that any two differences within 'wordsize' of each other will be grouped as a single region of difference. This value may be made smaller (4?) to improve the resolution of nearby differences, but the program will go much slower. e_additional Additional section e_globaldifferences Force reporting of differences at the start and end Boolean 0 ("", " -globaldifferences")[ bool(value) ] 4 Normally this program will find regions of identity that are the length of the specified word-size or greater and will then report the regions of difference between these matching regions. This works well and is what most people want if they are working with long overlapping nucleic acid sequences. You are usually not interested in the non-overlapping ends of these sequences. If you have protein sequences or short RNA sequences however, you will be interested in differences at the very ends . 
If this option is set to true then the differences at the ends will also be reported. e_output Output section e_outfile Name of the report file Filename diffseq.report ("" , " -outfile=" + str(value))[value is not None] 5 e_rformat_outfile Choose the report output format Choice DIFFSEQ DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 6 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile e_aoutfeat Name of the output feature file (e_aoutfeat) Filename diffseq.e_aoutfeat ("" , " -aoutfeat=" + str(value))[value is not None] 7 File for output of first sequence's features e_offormat_aoutfeat Choose the feature output format Choice GFF GFF EMBL SWISSPROT NBRF CODATA ("", " -offormat=" + str(value))[value is not None and value!=vdef] 8 e_aoutfeat_out aoutfeat_out option Feature AbstractText e_aoutfeat e_boutfeat Name of the output feature file (e_boutfeat) Filename diffseq.e_boutfeat ("" , " -boutfeat=" + str(value))[value is not None] 9 File for output of second sequence's features e_offormat_boutfeat Choose the feature output format Choice GFF GFF EMBL SWISSPROT NBRF CODATA ("", " -offormat=" + str(value))[value is not None and value!=vdef] 10 e_boutfeat_out boutfeat_out option Feature AbstractText e_boutfeat auto Turn off any prompting String " -auto -stdout" 11 Programs-5.1.1/dialign.xml0000644000175000001560000003656711441651470014327 0ustar bneronsis dialign 2.2.1 DIALIGN DNA and protein sequence alignment based on segment-to-segment comparison Morgenstern, Dress, Werner B. Morgenstern (1999). DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. 
Bioinformatics 15, 211 - 218. http://dialign.gobics.de/ http://dialign.gobics.de/download/ alignment:multiple dialign sequence Sequences Sequence FASTA 2,n " $value" " " + str(value) 100 protein_dna Nucleic acid or protein alignment (-n) Choice p p n (defined $value and $value ne $vdef) ? " -n" : "" ( "" , " -n" )[ value is not None and value != vdef] 2 dialign_opt Others options 3 threshold Threshold (-thr) Float 0.0 (defined $value and $value != $vdef) ? " -thr $value" : "" ( "" , " -thr " + str(value) )[ value is not None and value != vdef] cluster Clustering type used to construct sequence tree Choice null null max_link min_link (defined $value and $value ne $vdef) ? " -$value" : "" ( "" , " -" +str(value) )[ value is not None and value !=vdef] "maximum or minimum linkage" clustering used to construct sequence tree (instead of UPGMA). iterative Iterative scoring scheme (-it) Boolean 0 ($value) ? " -it" : "" ( "" , " -it" )[ value ] iterative scoring scheme (fragment scores are based on conditional probabilities given the previously aligned fragments. I.e. the probability of a fragment -- and therefore its score -- is not based on the probability of random occurrence in the input sequences but rather on the probability of occurrence between those fragments that have already been accepted in previous iteration steps). overlap Overlap weights (-iw) Boolean 0 ($value) ? " -iw" : "" ( "" , " -iw" )[ value ] overlap weights switched off (by default, overlap weights are used if up to 35 sequences are aligned). This option speeds up the alignment but may lead to reduced alignment quality. dna_opt DNA options $protein_dna eq "n" protein_dna == "n" translation Translation of nucleotide diagonals into peptide diagonals (-nt) Boolean 0 ($value) ? " -nt" : "" ( "" , " -nt" )[ value ] Input sequences are nucleic acid sequences and `nucleic acid segments' are translated to `peptide segments'. 
translation_strand Look at both Watson and Crick strands (-cs) Boolean $translation or $mix translation or mix 0 ($value) ? " -cs" : "" ( "" , " -cs" )[ value ] If segments are translated, not only the `Watson strand' but also the `Crick strand' is looked at mix Mixed alignments (-ma) Boolean 0 ($value) ? " -ma" : "" ( "" , " -ma" )[ value ] `mixed alignments' consisting of P-fragments and N-fragments if nucleic acid sequences are aligned. speed DNA alignment speed up (-ds) Boolean 0 ($value) ? " -ds" : "" ( "" , " -ds" )[ value ] Non-translated nucleic acid fragments are taken into account only if they start with at least two matches. Speeds up DNA alignment at the expense of sensitivity. long_genomic Long genomic sequences (-lgs) Boolean not $long_genomic_pep not long_genomic_pep 0 ($value) ? " -lgs" : "" ( "" , " -lgs" )[ value ] combines the following options: -ma, -it, -thr 2, -lmax 30, -smin 8, -nta, -ff, -fop, -ff, -cs, -ds, -pst long_genomic_pep Long genomic sequences (-lgs_t) Boolean not $long_genomic not long_genomic 0 ($value) ? " -lgs_t" : "" ( "" , " -lgs_t" )[ value ] Like "-lgs" but with all segment pairs assessed at the peptide level (rather than 'mixed alignments' as with the "-lgs" option). Therefore faster than -lgs but not very sensitive for non-coding regions. output_options Output options 3 max_simil Maximum number of * characters representing degree of similarity (-stars) Integer (defined $value) ? " -stars $value" : "" ( "" , " -stars " + str(value) )[ value is not None] The number of `*' characters below the alignment reflects the degree of local similarity among sequences. More precisely: They represent the sum of `weights' of diagonals connecting residues at the respective position. By default, no stars are used but numbers between 0 and 9, instead. mask Mask not aligned residues (-mask) Boolean 0 ($value) ? 
" -mask" : "" ( "" , " -mask" )[ value ] residues not belonging to selected fragments are replaced by `*' characters in output alignment (rather than being printed in lower-case characters) fasta Alignment in fasta format (-fa) Boolean 0 ($value) ? " -fa" : "" ( "" , " -fa" )[ value ] Be aware that only upper-case letters are regarded to be aligned in fasta output file. ali Output file Text "*.ali" "*.cw" "*.ali" "*.cw" fasta_alignment fasta alignment file Alignment Fasta $fasta fasta "*.fa" "*.fa" Programs-5.1.1/remap.xml0000644000175000001560000007463212072525233014015 0ustar bneronsis remap EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net remap Display restriction enzyme binding sites in a nucleotide sequence http://bioweb2.pasteur.fr/docs/EMBOSS/remap.html http://emboss.sourceforge.net/docs/themes display:nucleic:restriction display:nucleic:translation remap e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_mfile Restriction enzyme methylation data file RestrictionEnzymeMethylationData AbstractText ("", " -mfile=" + str(value))[value is not None ] 2 e_required Required section e_enzymes Comma separated enzyme list String all ("", " -enzymes=" + str(value))[value is not None and value!=vdef] 3 The name 'all' reads in all enzyme names from the REBASE database. You can specify enzymes by giving their names with commas between then, such as: 'HincII,hinfI,ppiI,hindiii'. The case of the names is not important. You can specify a file of enzyme names to read in by giving the name of the file holding the enzyme names with a '@' character in front of it, for example, '@enz.list'. 
Blank lines and lines starting with a hash character or '!' are ignored and all other lines are concatenated together with a comma character ',' and then treated as the list of enzymes to search for. An example of a file of enzyme names is: ! my enzymes HincII, ppiII ! other enzymes hindiii HinfI PpiI e_sitelen Minimum recognition site length (value from 2 to 20) Integer 4 ("", " -sitelen=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 2 is required value >= 2 Value less than or equal to 20 is required value <= 20 4 This sets the minimum length of the restriction enzyme recognition site. Any enzymes with sites shorter than this will be ignored. e_additional Additional section e_mincuts Minimum cuts per re (value from 1 to 1000) Integer 1 ("", " -mincuts=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 Value less than or equal to 1000 is required value <= 1000 5 This sets the minimum number of cuts for any restriction enzyme that will be considered. Any enzymes that cut fewer times than this will be ignored. e_maxcuts Maximum cuts per re Integer 2000000000 ("", " -maxcuts=" + str(value))[value is not None and value!=vdef] 6 This sets the maximum number of cuts for any restriction enzyme that will be considered. Any enzymes that cut more times than this will be ignored. e_single Force single site only cuts Boolean 0 ("", " -single")[ bool(value) ] 7 If this is set then this forces the values of the mincuts and maxcuts qualifiers to both be 1. Any other value you may have set them to will be ignored. e_blunt Allow blunt end cutters Boolean 1 (" -noblunt", "")[ bool(value) ] 8 This allows those enzymes which cut at the same position on the forward and reverse strands to be considered. 
e_sticky Allow sticky end cutters Boolean 1 (" -nosticky", "")[ bool(value) ] 9 This allows those enzymes which cut at different positions on the forward and reverse strands, leaving an overhang, to be considered. e_ambiguity Allow ambiguous matches Boolean 1 (" -noambiguity", "")[ bool(value) ] 10 This allows those enzymes which have one or more 'N' ambiguity codes in their pattern to be considered e_plasmid Allow circular dna Boolean 0 ("", " -plasmid")[ bool(value) ] 11 If this is set then this allows searches for restriction enzyme recognition site and cut positions that span the end of the sequence to be considered. e_methylation Use methylation data Boolean 0 ("", " -methylation")[ bool(value) ] 12 If this is set then RE recognition sites will not match methylated bases. e_commercial Only enzymes with suppliers Boolean 1 (" -nocommercial", "")[ bool(value) ] 13 If this is set, then only those enzymes with a commercial supplier will be searched for. This qualifier is ignored if you have specified an explicit list of enzymes to search for, rather than searching through 'all' the enzymes in the REBASE database. It is assumed that, if you are asking for an explicit enzyme, then you probably know where to get it from and so all enzymes names that you have asked to be searched for, and which cut, will be reported whether or not they have a commercial supplier. e_table Genetic codes Choice 0 0 1 2 3 4 5 6 9 10 11 12 13 14 15 16 21 22 23 ("", " -table=" + str(value))[value is not None and value!=vdef] 14 e_frame Translation frames (value from 1 to 6) Choice 6 1 2 3 F -1 -2 -3 R 6 ("", " -frame=" + str(value))[value is not None and value!=vdef] 15 This allows you to specify the frames that are translated. If you are not displaying cut sites on the reverse sense, then the reverse sense translations will not be displayed even if you have requested frames 4, 5 or 6. By default, all six frames will be displayed. 
e_output Output section e_outfile Name of the output file (e_outfile) Filename remap.e_outfile ("" , " -outfile=" + str(value))[value is not None] 16 e_outfile_out outfile_out option RemapReport Report e_outfile e_cutlist List the enzymes that cut Boolean 1 (" -nocutlist", "")[ bool(value) ] 17 This produces lists in the output of the enzymes that cut, those that cut but are excluded because they cut fewer times than mincut or more times than maxcut and those enzymes that do not cut. e_flatreformat Display re sites in flat format Boolean 0 ("", " -flatreformat")[ bool(value) ] 18 This changes the output format to one where the recognition site is indicated by a row of '===' characters and the cut site is pointed to by a '>' character in the forward sense, or a '<' in the reverse sense strand. e_limit Limits reports to one isoschizomer Boolean 1 (" -nolimit", "")[ bool(value) ] 19 This limits the reporting of enzymes to just one enzyme from each group of isoschizomers. The enzyme chosen to represent an isoschizomer group is the prototype indicated in the data file 'embossre.equ', which is created by the program 'rebaseextract'. If you prefer different prototypes to be used, make a copy of embossre.equ in your home directory and edit it. If this value is set to be false then all of the input enzymes will be reported. You might like to set this to false if you are supplying an explicit set of enzymes rather than searching 'all' of them. e_translation Display translation Boolean 1 (" -notranslation", "")[ bool(value) ] 20 This displays the 6-frame translations of the sequence in the output. e_reverse Display cut sites and translation of reverse sense Boolean 1 (" -noreverse", "")[ bool(value) ] 21 This displays the cut sites and translation of the reverse sense. 
e_orfminsize Minimum size of orfs (value greater than or equal to 0) Integer 0 ("", " -orfminsize=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 22 This sets the minimum size of Open Reading Frames (ORFs) to display in the translations. All other translation regions are masked by changing the amino acids to '-' characters. e_uppercase Regions to put in uppercase (eg: 4-57,78-94) String ("", " -uppercase=" + str(value))[value is not None] 23 Regions to put in uppercase. If this is left blank, then the sequence case is left alone. A set of regions is specified by a set of pairs of positions. The positions are integers. They are separated by any non-digit, non-alpha character. Examples of region specifications are: 24-45, 56-78 1:45, 67=99;765..888 1,5,8,10,23,45,57,99 e_highlight Regions to colour in html (eg: 4-57 red 78-94 green) String ("", " -highlight=" + str(value))[value is not None] 24 Regions to colour if formatting for HTML. If this is left blank, then the sequence is left alone. A set of regions is specified by a set of pairs of positions. The positions are integers. They are followed by any valid HTML font colour. Examples of region specifications are: 24-45 blue 56-78 orange 1-100 green 120-156 red A file of ranges to colour (one range per line) can be specified as '@filename'. 
e_threeletter Display protein sequences in three-letter code Boolean 0 ("", " -threeletter")[ bool(value) ] 25 e_number Number the sequences Boolean 0 ("", " -number")[ bool(value) ] 26 e_width Width of sequence to display (value greater than or equal to 1) Integer 60 ("", " -width=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 27 e_length Line length of page (0 for indefinite) (value greater than or equal to 0) Integer 0 ("", " -length=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 28 e_margin Margin around sequence for numbering (value greater than or equal to 0) Integer 10 ("", " -margin=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 29 e_name Display sequence id Boolean 1 (" -noname", "")[ bool(value) ] 30 Set this to be false if you do not wish to display the ID name of the sequence e_description Display description Boolean 1 (" -nodescription", "")[ bool(value) ] 31 Set this to be false if you do not wish to display the description of the sequence e_offset Offset to start numbering the sequence from Integer 1 ("", " -offset=" + str(value))[value is not None and value!=vdef] 32 e_html Use html formatting Boolean 0 ("", " -html")[ bool(value) ] 33 auto Turn off any prompting String " -auto -stdout" 34 Programs-5.1.1/seqgen.xml0000644000175000001560000006771511767572177014222 0ustar bneronsis seqgen 1.3.2 SeqGen Sequence-Generator A. Rambaut, N. C. Grassly Rambaut, A. and Grassly, N. C. (1996) Seq-Gen: An application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. http://bioweb2.pasteur.fr/docs/seq-gen/index.html Seq-Gen is a program that will simulate the evolution of nucleotide or amino acid sequences along a phylogeny, using common models of the substitution process. 
http://tree.bio.ed.ac.uk/software/seqgen/ http://tree.bio.ed.ac.uk/download.html?name=seqgen&version=v1.3.2&id=41&num=1 phylogeny:likelihood seqgen String "seq-gen" "seq-gen" 0 intree Input tree file Tree NEWICK NEXUS "< $value" "< " + str(value) 0 input Input parameters 1 Length Sequence length (-l) Integer 1000 (defined $value and $value != $vdef)? " -l $value":"" ("" , " -l " + str(value))[ value is not None and value != vdef] This option allows the user to set the length in nucleotides that each simulated sequence should be. datasets Number of simulated datasets per tree (-n) Integer 1 (defined $value and $value != $vdef)? " -n $value":"" ("" , " -n " + str(value))[ value is not None and value != vdef] This option specifies how many separate datasets should be simulated for each tree in the tree file. partition_numb Number of partitions for each dataset (-p) Integer 1 (defined $value and $value != $vdef)? " -p $value":"" ("" , " -p " + str(value))[ value is not None and value != vdef] Number of partion specifies how many partitions of each data set should be simulated. each partition must have its own tree and number specifying how many sites are in partition. Multiple sets of trees are being inputed with varying numbers of partitions, then this should specify the maximum number of partitions that will be required scale_branch Scale branch lengths (number greater > 0) (-s) Float not defined $scale_tree scale_tree is None 1.0 (defined $value and $value != $vdef)? " -s $value":"" ("" , " -s " + str(value))[ value is not None and value != vdef] Value greater than 0 is required $value > 0 value > 0 This option allows the user to set a value with which to scale the branch lengths in order to make them equal the expected number of substitutions per site for each branch. Basically Seq-Gen multiplies each branch length by this value. For example if you give an value of 0.5 then each branch length would be halved before using it to simulate the sequences. 
scale_tree Total tree scale (a decimal number greater > 0) (-d) Float $scale_branch != 1.0 scale_branch != 1.0 (defined $value)? " -d $value":"" ("" , " -d " + str(value))[ value is not None ] Value greater than 0 is required $value > 0 value > 0 This option allows the user to set a value which is the desired length of each tree in units of substitutions per site. The term 'tree length' here is the distance from the root to any one of the tips in units of mean number of substitutions per site. This option can only be used when the input trees are rooted and ultrametric (no difference in rate amongst the lineages). This has the effect of making all the trees in the input file of the same length before simulating data. The option multiplies each branch length by a value equal to SCALE divided by the actual length of the tree. input_seq Ancestral Sequence number (-k) Integer (defined $value)? " -k $value":"" ("" , " -k " + str(value))[ value is not None ] This option allows the user to use a supplied sequence as the ancestral sequence at the root (otherwise a random sequence is used). The value is an integer number greater than zero which refers to one of the sequences supplied as input with the tree. Method: The user can supply a sequence alignment as input, as well as the trees. This should be in relaxed PHYLIP format. The trees can then be placed in this file at the end, after a line stating how many trees there are. The file may look like this: 4 50 Taxon1 ATCTTTGTAGTCATCGCCGTATTAGCATTCTTAGATCTAA Taxon2 ATCCTAGTAGTCGCTTGCGCACTAGCCTTCCGAAATCTAG Taxon3 ACTTCTGTGTTTACTGAGCTACTAGCTTCCCTAAATCTAG Taxon4 ATTCCTATATTCGCTAATTTCTTAGCTTTCCTGAATCTGG 1 (((Taxon1:0.2,Taxon2:0.2):0.1,Taxon3:0.3):0.1,Taxon4:0.4); Note that the labels in the alignment do not have to match those in the tree (the ones in the tree will be used for output) there doesn't even have to be the same number of taxa in the alignment as in the trees. 
The sequence length supplied by the alignment will be used to obtain the simulated sequence length (unless the -l option is set). The -k option also refers to one of the sequences to specify the ancestral sequence. (see Appendix A) substitution Substitution model options 1 model Model of substitution (-m) Choice JTT JTT WAG PAM BLOSUM MTREV GENERAL F84 HKY GTR (defined $value )? " -m $value" : "" ( "" , " -m " + str(value) )[ value is not None ] shape Shape of the gamma distribution to use with gamma rate heterogeneity (-a) Float (defined $value)? " -a $value":"" ("" , " -a " + str(value))[ value is not None ] 1 Using this option the user may specify a shape for the gamma rate heterogeneity. The default is no site-specific rate heterogeneity. Enter a decimal number. categories Number of categories for the discrete gamma rate heterogeneity model (-g) Integer (defined $value)? " -g $value":"" ("" , " -g " + str(value))[ value is not None ] Enter an integer number between 2 and 32 $value >= 2 and $value <= 32 value >= 2 and value <= 32 1 Using this option the user may specify the number of categories for the discrete gamma rate heterogeneity model. The default is no site-specific rate heterogeneity (or the continuous model if only the -a option is specified). Enter an integer number between 2 and 32 invar_site Proportion of sites that should be invariable (-i) Float 0.0 (defined $value and $value != $vdef)? " -i $value":"" ("" , " -i " + str(value))[ value is not None and value != vdef] Enter a real number between 0.0 and 1.0 $value >= 0.0 and $value <= 1.0 value >= 0.0 and value <= 1.0 1 Specify the proportion of sites that should be invariable. These sites will be chosen randomly with this expected frequency. The default is no invariable sites. Invariable sites are sites that cannot change, as opposed to sites which don't exhibit any changes due to chance (and perhaps a low rate).
Enter a real number between 0.0 and 1.0 nucleotide_opt Nucleotide model specific options 1 rate Rates for codon position heterogeneity (-c) Using this option the user may specify the relative rates for each codon position. This allows codon-specific rate heterogeneity to be simulated. The default is no site-specific rate heterogeneity. You can only have codon rates when using nucleotide models of substitution. rate1 First position Float "" "" Enter a decimal number You can only have codon rates when using nucleotide models of substitution. rate2 Second position Float "" "" Enter a decimal number rate3 Third position (enter a decimal number) Float "" "" Enter a decimal number rateAll Rates Float defined $rate1 and defined $rate2 and defined $rate3 rate1 is not None and rate2 is not None and rate3 is not None " -c $rate1 $rate2 $rate3" " -c %f %f %f " %(rate1,rate2,rate3) transratio Transition transversion ratio (TS/TV) for HKY or F84 model (-t) Float $model eq 'HKY' or $model eq 'F84' model =='HKY' or model == 'F84' (defined $value)? " -t $value":"" ("" , " -t " + str(value))[ value is not None ] This option allows the user to set a value for the transition transversion ratio (TS/TV). This is only valid when either the HKY or F84 model has been selected. matrix 6 values for the general reversible model's rate matrix (ACTG x ACTG) separated by one space (-r) String 1.0,1.0,1.0,1.0,1.0,1.0 (defined $value and $value ne $vdef)? " -r $value":"" ("" , " -r " + str(value))[ value is not None and value != vdef] 1 This option allows the user to set 6 values for the general reversible model's rate matrix. This is only valid when the REV model has been selected. The values are six decimal numbers for the rates of transition from A to C, A to G, A to T, C to G, C to T and G to T respectively, separated by spaces or commas. The matrix is symmetrical so the reverse transitions equal the ones set (e.g. C to A equals A to C) and therefore only six values need be set.
These values will be scaled such that the last value (G to T) is 1.0 and the others are set relative to this. frequencies Relative frequencies of nucleotides (-f) 1 This option is used to specify the relative frequencies of the four nucleotides. By default, Seq-Gen will assume these to be equal. If the given values don't sum to 1.0 then they will be scaled so that they do. You must give the frequencies for the 4 nucleotides freqA Frequencies of the A nucleotide Float "" "" freqC Frequencies of the C nucleotide Float "" "" freqG Frequencies of the G nucleotide Float "" "" freqT Frequencies of the T nucleotide Float "" "" freqAll Frequencies String defined $freqA and defined $freqC and defined $freqG and defined $freqT freqA is not None and freqC is not None and freqG is not None and freqT is not None " -f $freqA,$freqC,$freqG,$freqT" " -f " + str(freqA) + "," + str(freqC) + "," + str(freqG) + "," + str(freqT) miscellaneous_opt Miscellaneous options 1 random_seed Random number seed (-z) Integer (defined $value)? "-z $value":"" ("" , "-z " + str(value))[ value is not None ] This option allows to specify a seed for the random number generator. Using the same seed (with the same input) will result in identical simulated datasets. This is useful because you can recreate a set of simulations, you must use exactly the same model options output Output parameters 1 output_format Output file format (-o) Choice p p r n (defined $value and $value ne $vdef)? " -o$value":"" ("" , " -o" + str(value))[ value is not None and value != vdef] quiet Non verbose output (-q) Boolean 0 ($value)? " -q":"" ("" , " -q")[ value ] 1 write_ancest Write the ancestral sequences (-wa) Boolean 0 ($value)? " -wa":"" ("" , " -wa")[ value ] This option allows to obtain the sequences for each of the internal nodes in the tree. The sequences are written out along with the sequences for the tips of the tree in relaxed PHYLIP format. write_sites Write the sites rates (-wr) Boolean 0 ($value)? 
" -wr":"" ("" , " -wr")[ value ] This option allows to obtain the relative rate of substitution for each sites as used in each simulation. This will go to stderr and will be produced for each replicate simulation. outfile Output alignment file Alignment PHYLIPI RPHYLIP NEXUS "seqgen.out" "seqgen.out" Programs-5.1.1/growthpred.xml0000644000175000001560000002732111441651470015071 0ustar bneronsis growthpred 1.07 growthpred Sequence-based Prediction of Minimum Generation Times for Bacteria and Archaea S. Vieira-Silva, E. Rocha The program sources and Example files are downloadable here. Vieira-Silva S, Rocha EPC, 2010 The Systemic Imprint of Growth and Its Uses in Ecological (Meta)Genomics. PLoS Genet 6(1): e1000808. doi:10.1371/journal.pgen.1000808 http://bioweb2.pasteur.fr/docs/growthpred/growthpred.pdf This application predicts the minimum generation time for a bacterial or archaeal organism based on its codon usage bias intensity (CUB). The CUB index is calculated given two input sets of sequences: 1) highly expressed genes 2) other genes. The application runs 1000 bootstraps and outputs the average and the standard deviation of the predictions. ftp://ftp.pasteur.fr/pub/GenSoft/projects/growthpred/ sequence:nucleic:prediction sequence:nucleic:codon_usage growthpred example Run with example data (-e) 4 Boolean 0 ( "" , " -e " )[value] Use a set of example files (E. coli K12) to run the program. The expected results depending on the choosen option and the example files are shown in the program help pages (end of the form). input Input section 1 hsequence Enter sequences of highly expressed genes (-f) DNA Sequence FASTA 1,n not example and not b ( "" , " -f " + str( value ) + " " )[value is not None] Set of genes under purifying selection for codon usage. 
b Retrieve ribosomal protein genes by blast (-b) Boolean 0 ( "" , " -b")[value] nhsequence Enter sequences of non-highly expressed genes/Complete genome (-g) DNA Sequence FASTA 1,n not example ( "" , " -g " + str( value ) + " " )[value is not None] Set of control genes with near random codon usage. typeg Mixed organisms sequences Boolean 0 ( "" , " -m" )[value] You need to precise to the program if your sequences are metagenome or mixed organisms sequences. codon_remove Remove from sequences 3 fc First codon (-s) Boolean 1 ( " -s " , "" )[value!=vdef] lc Last codon (-S) Boolean 1 ( " -S " , "" )[value] fc lc options Options 2 geneticcode Choose genetic code (-c) Choice 0 0 1 2 3 4 5 6 7 8 " -c " + str( value ) rfiles Recover file with ribosomal protein genes retrieved by blast or given as input (-r) Boolean 0 ( "" , " -r")[value] ifiles Recover file with codon usage bias indexes for each gene (-i) Boolean 0 ( "" , " -i")[value ] autotemp Estimate optimal growth temperature (-t) Boolean 0 ( "" , " -t")[value ] temp Enter optimal growth temperature (Celsius) (-T) Integer 36 not $autotemp not autotemp ( "" , " -T " + str( value ) )[value is not None and value != vdef] outfile Outfile name (-o) Filename outfile ( "" , " -o " + str(value))[ value is not None ] res Text Prediction results "*.results" err Text Prediction error(s) "*.errors" cubs Text Codon usage bias indexes for each gene ifiles "*.cub" ribs Text Ribosomal protein genes retrieved by blast rfiles "*.ribs" Programs-5.1.1/netOglyc.xml0000644000175000001560000001744311542645122014473 0ustar bneronsis netOglyc 3.1 netOglyc predict O-glycosylation sites in proteins. http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?netNglyc Karin Julenius, kj@cbs.dtu.dk Prediction, conservation analysis and structural characterization of mammalian mucin-type O-glycosylation sites. K. Julenius, A. Moelgaard, R. Gupta and S. Brunak. Glycobiology, 15:153-164, 2005. 
http://www.cbs.dtu.dk/services/NetOGlyc/ http://www.cbs.dtu.dk/databases/OGLYCBASE/ The NetOglyc server produces neural network predictions of mucin type GalNAc O-glycosylation sites in mammalian proteins. sequence:protein:motifs sequence:protein:pattern sequence:protein:profiles netoglyc String " netOglyc " " netOglyc " sequence Input Sequence Sequence FASTA " " + str( value ) 50 >LEUK_RAT P13838 LEUKOSIALIN PRECURSOR (LEUCOCYTE SIALOGLYCOPROTEIN) (SIALOPHORIN) (CD43) (W3/13 ANTIGEN). WAQVVSQENLPNTMTMLPFTPNSESPSTSEALSTYSSIATVPVTEDPKESISPWGQTTAP ASSIPLGTPELSSFFFTSAGASGNTPVPELTTSQEVSTEASLVLFPKSSGVASDPPVTIT NPATSSAVASTSLETFKGTSAPPVTVTSSTMTSGPFVATTVSSETSGPPVTMATGSLGPS KETHGLSATIATSSGESSSVAGGTPVFSTKISTTSTPNPITTVPPRPGSSGMLLVSMLIA LTVVLVLVALLLLWRQRQKRRTGALTLSRGGKRNGTVDAWAGPARVPDEEATTASGSGGN KSSGAPETDGSGQRPTLTTFFSRRKSRQGSVALEELKPGTGPNLKGEEEPLVGSEDEAVE TPTSDGPQAKDGAAPQSL signal_peptide Run signalp on the input sequences (-sp). Boolean 0 ( $value ) ? "-sp ": "" ( "","-sp ")[ bool( value ) ] 10 Non-secretory proteins are unlikely to be glycosylated in vivo even though they contain potential motifs. Therefore, it is possible to run the signal peptide predictor signalp on the input sequences graphics generate graphics (-g). Boolean 0 ( $value )? "-g " : "" ( "" , "-g " )[ bool( value ) ] Generate graphics, plotting the G-score against the position in the sequence of each serine and threonine residue. The I-score is plotted instead for the residues where it decides the final answer. For each input sequence two files will be produced ``<seqname>.ps'' (in PostScript) and ``<seqname>.gif'' (in GIF). 20 results netOglyc report Report NetOGlyc "netOglyc.out" "netOglyc.out"

Each input sequence is displayed with the predicted sites indicated, labelled with ``S'' and ``T'' for serine and threonine, respectively. The signal peptide (if predicted) is labelled with ``_''. The details of the prediction for each serine and threonine residue are then shown in a table. The columns are:

  • sequence name
  • residue (S or T)
  • position in the sequence
  • G-score (general predictor)
  • I-score (isolated site predictor)
  • final answer (S/T for predicted sites, otherwise `.')
  • comment

The final answer is calculated as follows. If the G-score is >0.5 the residue is predicted as glycosylated; the higher the score the more confident the prediction. If the G-score is <0.5 but the I-score >0.5 and there are no predicted neighbouring sites (distance <10 residues) the residue is also predicted as glycosylated.

If a residue in a predicted signal peptide is predicted as glycosylated there is a warning in the comment field.
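
The decision rule described above can be sketched in Python as follows. This is only an illustration of the rule as worded in this report description, not the NetOGlyc implementation: the function name `final_answers`, the tuple layout, and the constants are hypothetical, and it assumes that "predicted neighbouring sites" means other residues already predicted by their G-score. The text does not say what happens when a score is exactly 0.5; the sketch treats that case as not predicted.

```python
from typing import List, Tuple

THRESHOLD = 0.5         # score cut-off quoted in the report description
NEIGHBOUR_WINDOW = 10   # "distance <10 residues"

# Hypothetical record layout: (position, residue, G-score, I-score)
Site = Tuple[int, str, float, float]


def final_answers(sites: List[Site]) -> List[str]:
    """Return 'S'/'T' for predicted sites and '.' otherwise."""
    # Assumption: neighbours are the sites predicted by the G-score alone.
    g_predicted = [pos for pos, _, g, _ in sites if g > THRESHOLD]
    answers = []
    for pos, residue, g, i in sites:
        has_neighbour = any(
            0 < abs(pos - other) < NEIGHBOUR_WINDOW for other in g_predicted
        )
        if g > THRESHOLD:
            # General predictor is confident on its own.
            answers.append(residue)
        elif g < THRESHOLD and i > THRESHOLD and not has_neighbour:
            # Isolated-site predictor decides when no predicted site is nearby.
            answers.append(residue)
        else:
            answers.append(".")
    return answers


example = [
    (5, "S", 0.7, 0.1),   # predicted by G-score
    (9, "T", 0.3, 0.8),   # high I-score, but a predicted site lies 4 residues away
    (40, "T", 0.2, 0.9),  # high I-score and isolated
    (60, "S", 0.1, 0.2),  # both scores low
]
print(final_answers(example))
```

In this made-up example only the first and third residues end up as predicted sites; the second is rejected because of its predicted neighbour.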

postscript graphic in PostScript Binary NetOGlyc_graphic PostScript graphics graphics "*.ps" "*.ps" plotting the G-score against the position in the sequence of each serine and threonine residue. The I-score is plotted instead for the residues where it decides the final answer. gif graphic in GIF Binary NetOGlyc_graphic GIF graphics graphics "*.gif" "*.gif" plotting the G-score against the position in the sequence of each serine and threonine residue. The I-score is plotted instead for the residues where it decides the final answer.
Programs-5.1.1/rnalfold.xml0000644000175000001560000002473011672710655014515 0ustar bneronsis rnalfold 1.7 RNALfold Calculate locally stable secondary structures of RNAs Ivo L Hofacker, Peter F Stadler I.L. Hofacker, B. Priwitzer, and P.F. Stadler "Prediction of Locally Stable RNA Secondary Structures for Genome-Wide Surveys" Bioinformatics, 20, 186-190 (2004) RNALfold computes locally stable RNA secondary structure with a maximal base pair span. Output consists of a list of secondary structure components of size <= L, one entry per line. Each output line contains the predicted local structure its energy in kcal/mol and the starting position of the local structure. sequence:nucleic:2D_structure structure:2D_structure RNALfold seq RNA Sequence File DNA Sequence FASTA " < $value" " < " + str(value) 1000 control Control options 2 span Maximum allowed separation of a base pair to span (-L) Integer (defined $value)? " -L $value" : "" ( "" , " -L " + str(value))[ value is not None ] Set the maximum allowed separation of a base pair to span. I.e. no pairs (i,j) with j-i>span will be allowed. temperature Rescale energy parameters to a temperature of temperature Celcius (-T) Float 37 (defined $value and $value != $vdef)? " -T $value" : "" ( "" , " -T " + str(value) )[ value is not None and value != vdef] tetraloops Do not include special stabilizing energies for certain tetraloops (-4) Boolean 0 ($value)? " -4" : "" ( "" , " -4" )[ value ] dangling How to treat dangling end energies for bases adjacent to helices in free ends and multiloops (-d) Choice -d1 -d1 -d -d2 (defined $value and $value ne $vdef)? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef] How to treat 'dangling end' energies for bases adjacent to helices in free ends and multiloops: Normally only unpaired bases can participate in at most one dangling end. With -d2 this check is ignored, this is the default for partition function folding (-p). -d ignores dangling ends altogether. 
Note that by default pf and mfe folding treat dangling ends differently; use -d2 (or -d) in addition to -p to ensure that both algorithms use the same energy model. The -d2 option is available for RNAfold, RNAeval, and RNAinverse only. input Input parameters 2 noGU Do not allow GU pairs (-noGU) Boolean 0 ($value)? " -noGU" : "" ( "" , " -noGU" )[ value ] noCloseGU Do not allow GU pairs at the end of helices (-noCloseGU) Boolean 0 ($value)? " -noCloseGU" : "" ( "" , " -noCloseGU" )[ value ] nsp Non standard pairs (comma separated list) (-nsp) String (defined $value)? " -nsp $value" : "" ( "" , " -nsp " + str(value) )[ value is not None ] Allow other pairs in addition to the usual AU, GC, and GU pairs. pairs is a comma separated list of additionally allowed pairs. If the first character is a '-' then AB will imply that AB and BA are allowed pairs. e.g. RNAfold -nsp -GA will allow GA and AG pairs. Nonstandard pairs are given 0 stacking energy. parameter Energy parameter file (-P) EnergyParameterFile AbstractText (defined $value)? " -P $value" : "" ( "" , " -P " + str(value) )[ value is not None ] Read energy parameters from paramfile, instead of using the default parameter set. A sample parameterfile should accompany your distribution. See the RNAlib documentation for details on the file format. readseq String "readseq -f=19 -a $seq > $seq.tmp && (cp $seq $seq.orig && mv $seq.tmp $seq) ; " "readseq -f=19 -a "+ str(seq) + " > "+ str(seq) +".tmp && (cp "+ str(seq) +" "+ str(seq) +".orig && mv "+ str(seq) +".tmp "+ str(seq) +") ; " -10 psfiles Postscript file PostScript Binary "*.ps" "*.ps" Programs-5.1.1/bionj.xml0000644000175000001560000000642011441651470014002 0ustar bneronsis bionj BIONJ Neighbor Joining algorithm improved for molecular sequences O.
Gascuel Gascuel O., 1997, BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data, Molecular Biology and Evolution 14(7):685-695 http://www.lirmm.fr/~w3ifa/MAAS/BIONJ/ http://www.lirmm.fr/~w3ifa/MAAS/BIONJ/ phylogeny:distance bionj infile Distances matrix File PhylipDistanceMatrix AbstractText " $value" " "+str(value) 1 Enter a matrix in Phylip format. This algorithm is adapted to evolutive distances calculated from molecular data sequences (O. Gascuel, 1997, MBE 14(7), 685-695). If only one data matrix is given, then BIONJ returns one tree. When the input file contains several matrices given one after the other, as obtained when combining PHYLIP's SEQBOOT and DNADIST to perform a bootstrap, BIONJ returns the same number of trees, written one after the other in the output file; this file may be given to PHYLIP's CONSENSE to obtain the bootstrap tree. 5 Alpha 0.000000 0.330447 0.625670 1.032032 1.354086 Beta 0.330447 0.000000 0.375578 1.096290 0.677616 Gamma 0.625670 0.375578 0.000000 0.975798 0.861634 Delta 1.032032 1.096290 0.975798 0.000000 0.226703 Epsilon 1.354086 0.677616 0.861634 0.226703 0.000000 treefile_name Name of Tree File Filename treefile " $value" " " + str(value) 2 treefile Tree File Tree NEWICK $treefile_name str(treefile_name) Programs-5.1.1/Env/0000755000175000001560000000000012175673300012706 5ustar bneronsisPrograms-5.1.1/Env/imogene_env.xml0000644000175000001560000000024012173751335015722 0ustar bneronsis /local/gensoft/bin Programs-5.1.1/Env/ref_genomes_annotation.xml0000644000175000001560000000400212005517123020137 0ustar bneronsis null upload /iGenomes/Bos_taurus/UCSC/bosTau4/Annotation/Genes/genes.gtf /iGenomes/Caenorhabditis_elegans/UCSC/ce6/Annotation/Genes/genes.gtf /iGenomes/Canis_familiaris/UCSC/canFam2/Annotation/Genes/genes.gtf /iGenomes/Drosophila_melanogaster/UCSC/dm3/Annotation/Genes/genes.gtf /iGenomes/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.gtf 
/iGenomes/Gallus_gallus/UCSC/galGal3/Annotation/Genes/genes.gtf /iGenomes/Homo_sapiens/UCSC/hg18/Annotation/Genes/genes.gtf /iGenomes/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf /iGenomes/Mus_musculus/UCSC/mm9/Annotation/Genes/genes.gtf /iGenomes/Pan_troglodytes/UCSC/panTro2/Annotation/Genes/genes.gtf /iGenomes/Rattus_norvegicus/UCSC/rn4/Annotation/Genes/genes.gtf /iGenomes/Saccharomyces_cerevisiae/UCSC/sacCer2/Annotation/Genes/genes.gtf /iGenomes/Sus_scrofa/UCSC/susScr2/Annotation/Genes/genes.gtf Programs-5.1.1/Env/bowtie_db.xml0000644000175000001560000000377112006757213015375 0ustar bneronsis null upload /iGenomes/Mus_musculus/UCSC/mm9/Sequence/BowtieIndex/genome /iGenomes/Bos_taurus/UCSC/bosTau4/Sequence/BowtieIndex/genome /iGenomes/Caenorhabditis_elegans/UCSC/ce6/Sequence/BowtieIndex/genome /iGenomes/Canis_familiaris/UCSC/canFam2/Sequence/BowtieIndex/genome /iGenomes/Drosophila_melanogaster/UCSC/dm3/Sequence/BowtieIndex/genome /iGenomes/Equus_caballus/UCSC/equCab2/Sequence/BowtieIndex/genome /iGenomes/Gallus_gallus/UCSC/galGal3/Sequence/BowtieIndex/genome /iGenomes/Homo_sapiens/UCSC/hg18/Sequence/BowtieIndex/genome /iGenomes/Homo_sapiens/UCSC/hg19/Sequence/BowtieIndex/genome /iGenomes/Pan_troglodytes/UCSC/panTro2/Sequence/BowtieIndex/genome /iGenomes/Rattus_norvegicus/UCSC/rn4/Sequence/BowtieIndex/genome /iGenomes/Saccharomyces_cerevisiae/UCSC/sacCer2/Sequence/BowtieIndex/genome /iGenomes/Sus_scrofa/UCSC/susScr2/Sequence/BowtieIndex/genome Programs-5.1.1/Env/tophat_colorspace_db.xml0000644000175000001560000000407212006757213017610 0ustar bneronsis null upload /iGenomes/Mus_musculus/UCSC/mm9/Sequence/BowtieColorIndex/genome /iGenomes/Bos_taurus/UCSC/bosTau4/Sequence/BowtieColorIndex/genome /iGenomes/Caenorhabditis_elegans/UCSC/ce6/Sequence/BowtieColorIndex/genome /iGenomes/Canis_familiaris/UCSC/canFam2/Sequence/BowtieColorIndex/genome /iGenomes/Drosophila_melanogaster/UCSC/dm3/Sequence/BowtieColorIndex/genome 
/iGenomes/Equus_caballus/UCSC/equCab2/Sequence/BowtieColorIndex/genome /iGenomes/Gallus_gallus/UCSC/galGal3/Sequence/BowtieColorIndex/genome /iGenomes/Homo_sapiens/UCSC/hg18/Sequence/BowtieColorIndex/genome /iGenomes/Homo_sapiens/UCSC/hg19/Sequence/BowtieColorIndex/genome /iGenomes/Pan_troglodytes/UCSC/panTro2/Sequence/BowtieColorIndex/genome /iGenomes/Rattus_norvegicus/UCSC/rn4/Sequence/BowtieColorIndex/genome /iGenomes/Saccharomyces_cerevisiae/UCSC/sacCer2/Sequence/BowtieColorIndex/genome /iGenomes/Sus_scrofa/UCSC/susScr2/Sequence/BowtieColorIndex/genome Programs-5.1.1/Env/scangen_data.xml0000644000175000001560000000005411672707410016037 0ustar bneronsis/path/to/scangen/dir/Programs-5.1.1/Env/penncnv_data.xml0000644000175000001560000000021611672707410016070 0ustar bneronsis /path/to/penncnv/executables/dir/ /path/to/penncnv/shared/data/dir/ Programs-5.1.1/Env/pftools_data.xml0000644000175000001560000000015211672707410016106 0ustar bneronsis /path/to/Prosite/prosite.dat /path/to/fasta/databank/dir/ Programs-5.1.1/Env/nucdbs.xml0000644000175000001560000000614211672707410014712 0ustar bneronsis null embl embl_new genbank genbank_new gbbct gbpri gbmam gbrod gbvrt gbinv gbpln gbvrl gbphg gbest gbsts gbsyn gbpat gbuna gbgss gbhtg imgt borrelia ecoli genitalium pneumoniae hpylori bsubtilis tuberculosis ypestis yeast pfalciparum Programs-5.1.1/Env/nucdbs_blast.xml0000644000175000001560000000063611772052071016076 0ustar bneronsis null name of the 1rst bank (available for blast only) as indexed by formatdb Programs-5.1.1/Env/pratt_data.xml0000644000175000001560000000017511672707410015557 0ustar bneronsis /path/to/share/pratt/data/dir/ /path/to/databank/uniprot_sprot.dat Programs-5.1.1/Env/dssp_data.xml0000644000175000001560000000005711672707410015375 0ustar bneronsis/path/to/Pdb/databank/dir/Programs-5.1.1/Env/consensus_data.xml0000644000175000001560000000006511672707410016443 0ustar 
bneronsis/path/to/consensus/alphabet/dir/Programs-5.1.1/Env/fasta_data.xml0000644000175000001560000000004511672707410015517 0ustar bneronsis/path/to/fasta/databank/dir/Programs-5.1.1/Env/smile_data.xml0000644000175000001560000000006111672707410015530 0ustar bneronsis/path/to/smile/alphabet/dir/Programs-5.1.1/Env/tophat_transcriptome_color_index.xml0000644000175000001560000001041212006757213022263 0ustar bneronsis null '' raw "" gtf "" mm9 " -G /iGenomes/Mus_musculus/UCSC/mm9/Sequence/BowtieTranscriptomeColorIndex/genes.gff --transcriptome-index=/iGenomes/Mus_musculus/UCSC/mm9/Sequence/BowtieTranscriptomeColorIndex/genes" bosTau4 " -G /iGenomes/Bos_taurus/UCSC/bosTau4/Sequence/BowtieTranscriptomeColorIndex/genes.gff --transcriptome-index=/iGenomes/Bos_taurus/UCSC/bosTau4/Sequence/BowtieTranscriptomeColorIndex/genes" ce6 " -G /iGenomes/Caenorhabditis_elegans/UCSC/ce6/Sequence/BowtieTranscriptomeColorIndex/genes.gff --transcriptome-index=/iGenomes/Caenorhabditis_elegans/UCSC/ce6/Sequence/BowtieTranscriptomeColorIndex/genes" canFam2 " -G /iGenomes/Canis_familiaris/UCSC/canFam2/Sequence/BowtieTranscriptomeColorIndex/genes.gff --transcriptome-index=/iGenomes/Canis_familiaris/UCSC/canFam2/Sequence/BowtieTranscriptomeColorIndex/genes" dm3 " -G /iGenomes/Drosophila_melanogaster/UCSC/dm3/Sequence/BowtieTranscriptomeColorIndex/genes.gff --transcriptome-index=/iGenomes/Drosophila_melanogaster/UCSC/dm3/Sequence/BowtieTranscriptomeColorIndex/genes" equCab2 " -G /iGenomes/Equus_caballus/UCSC/equCab2/Sequence/BowtieTranscriptomeColorIndex/genes.gff --transcriptome-index=/iGenomes/Equus_caballus/UCSC/equCab2/Sequence/BowtieTranscriptomeColorIndex/genes" galGal3 " -G /iGenomes/Gallus_gallus/UCSC/galGal3/Sequence/BowtieTranscriptomeColorIndex/genes.gff --transcriptome-index=/iGenomes/Gallus_gallus/UCSC/galGal3/Sequence/BowtieTranscriptomeColorIndex/genes" hg_18 " -G /iGenomes/Homo_sapiens/UCSC/hg18/Sequence/BowtieTranscriptomeColorIndex/genes.gff 
--transcriptome-index=/iGenomes/Homo_sapiens/UCSC/hg18/Sequence/BowtieTranscriptomeColorIndex/genes" hg_19 " -G /iGenomes/Homo_sapiens/UCSC/hg19/Sequence/BowtieTranscriptomeColorIndex/genes.gff --transcriptome-index=/iGenomes/Homo_sapiens/UCSC/hg19/Sequence/BowtieTranscriptomeColorIndex/genes" panTro2 " -G /iGenomes/Pan_troglodytes/UCSC/panTro2/Sequence/BowtieTranscriptomeColorIndex/genes.gff --transcriptome-index=/iGenomes/Pan_troglodytes/UCSC/panTro2/Sequence/BowtieTranscriptomeColorIndex/genes" rn2 " -G /iGenomes/Rattus_norvegicus/UCSC/rn4/Sequence/BowtieTranscriptomeColorIndex/genes.gff --transcriptome-index=/iGenomes/Rattus_norvegicus/UCSC/rn4/Sequence/BowtieTranscriptomeColorIndex/genes" sacCer2 " -G /iGenomes/Saccharomyces_cerevisiae/UCSC/sacCer2/Sequence/BowtieTranscriptomeColorIndex/genes.gff --transcriptome-index=/iGenomes/Saccharomyces_cerevisiae/UCSC/sacCer2/Sequence/BowtieTranscriptomeColorIndex/genes" susScr2 " -G /iGenomes/Sus_scrofa/UCSC/susScr2/Sequence/BowtieTranscriptomeColorIndex/genes.gff --transcriptome-index=/iGenomes/Sus_scrofa/UCSC/susScr2/Sequence/BowtieTranscriptomeColorIndex/genes" Programs-5.1.1/Env/ViennaRNA_readseq.xml0000644000175000001560000000007011672707410016713 0ustar bneronsis/path/to/ViennaRNA/readseq/Programs-5.1.1/Env/goldendb.xml0000644000175000001560000000244111767572177015227 0ustar bneronsis null embl enzyme genbank genpept imgt prosite rdpii refseq uniprot Programs-5.1.1/Env/gruppi_data.xml0000644000175000001560000000006111672707410015725 0ustar bneronsis/path/to/the/needed/matrix.list Programs-5.1.1/Env/tophat_transcriptome_index.xml0000644000175000001560000001021012006757213021061 0ustar bneronsis null '' raw "" gtf "" mm9 " -G /iGenomes/Mus_musculus/UCSC/mm9/Sequence/BowtieTranscriptomeIndex/genes.gff --transcriptome-index=/iGenomes/Mus_musculus/UCSC/mm9/Sequence/BowtieTranscriptomeIndex/genes" bosTau4 " -G /iGenomes/Bos_taurus/UCSC/bosTau4/Sequence/BowtieTranscriptomeIndex/genes.gff 
--transcriptome-index=/iGenomes/Bos_taurus/UCSC/bosTau4/Sequence/BowtieTranscriptomeIndex/genes" ce6 " -G /iGenomes/Caenorhabditis_elegans/UCSC/ce6/Sequence/BowtieTranscriptomeIndex/genes.gff --transcriptome-index=/iGenomes/Caenorhabditis_elegans/UCSC/ce6/Sequence/BowtieTranscriptomeIndex/genes" canFam2 " -G /iGenomes/Canis_familiaris/UCSC/canFam2/Sequence/BowtieTranscriptomeIndex/genes.gff --transcriptome-index=/iGenomes/Canis_familiaris/UCSC/canFam2/Sequence/BowtieTranscriptomeIndex/genes" dm3 " -G /iGenomes/Drosophila_melanogaster/UCSC/dm3/Sequence/BowtieTranscriptomeIndex/genes.gff --transcriptome-index=/iGenomes/Drosophila_melanogaster/UCSC/dm3/Sequence/BowtieTranscriptomeIndex/genes" equCab2 " -G /iGenomes/Equus_caballus/UCSC/equCab2/Sequence/BowtieTranscriptomeIndex/genes.gff --transcriptome-index=/iGenomes/Equus_caballus/UCSC/equCab2/Sequence/BowtieTranscriptomeIndex/genes" galGal3 " -G /iGenomes/Gallus_gallus/UCSC/galGal3/Sequence/BowtieTranscriptomeIndex/genes.gff --transcriptome-index=/iGenomes/Gallus_gallus/UCSC/galGal3/Sequence/BowtieTranscriptomeIndex/genes" hg_18 " -G /iGenomes/Homo_sapiens/UCSC/hg18/Sequence/BowtieTranscriptomeIndex/genes.gff --transcriptome-index=/iGenomes/Homo_sapiens/UCSC/hg18/Sequence/BowtieTranscriptomeIndex/genes" hg_19 " -G /iGenomes/Homo_sapiens/UCSC/hg19/Sequence/BowtieTranscriptomeIndex/genes.gff --transcriptome-index=/iGenomes/Homo_sapiens/UCSC/hg19/Sequence/BowtieTranscriptomeIndex/genes" panTro2 " -G /iGenomes/Pan_troglodytes/UCSC/panTro2/Sequence/BowtieTranscriptomeIndex/genes.gff --transcriptome-index=/iGenomes/Pan_troglodytes/UCSC/panTro2/Sequence/BowtieTranscriptomeIndex/genes" rn2 " -G /iGenomes/Rattus_norvegicus/UCSC/rn4/Sequence/BowtieTranscriptomeIndex/genes.gff --transcriptome-index=/iGenomes/Rattus_norvegicus/UCSC/rn4/Sequence/BowtieTranscriptomeIndex/genes" sacCer2 " -G /iGenomes/Saccharomyces_cerevisiae/UCSC/sacCer2/Sequence/BowtieTranscriptomeIndex/genes.gff 
--transcriptome-index=/iGenomes/Saccharomyces_cerevisiae/UCSC/sacCer2/Sequence/BowtieTranscriptomeIndex/genes" susScr2 " -G /iGenomes/Sus_scrofa/UCSC/susScr2/Sequence/BowtieTranscriptomeIndex/genes.gff --transcriptome-index=/iGenomes/Sus_scrofa/UCSC/susScr2/Sequence/BowtieTranscriptomeIndex/genes" Programs-5.1.1/Env/combat_data.xml0000644000175000001560000000004711672707410015670 0ustar bneronsis/path/to/combat/matrixPrograms-5.1.1/Env/protdbs.xml0000644000175000001560000000154112117636214015105 0ustar bneronsis null uniprot uniprot_sprot uniprot_trembl nrprot nrprot_month sbase Programs-5.1.1/Env/blast2_env.xml0000644000175000001560000000020211430014317015447 0ustar bneronsis /path/to/the/databases/files /path/to/the/matrix/files Programs-5.1.1/pratt.xml0000644000175000001560000010452211767572177014056 0ustar bneronsis pratt 2.1 Pratt Pattern discovery in sets of unaligned protein sequences K. Sturzrehm, I. Jonassen http://www.ii.uib.no/~inge/Pratt.html ftp://ftp.ebi.ac.uk/pub/software/unix/pratt/ sequence:protein:pattern pratt seq Sequence File Sequence FASTA " fasta $value" " fasta "+str(value) 2 Fasta format. One file containing all the sequences. One sequence is specified by one line starting with '>' in position 1 and then the name of the sequence, and some lines containing the sequence in upper or lower case. The end of a sequence is identified by looking for either the start of a new sequence or the end of the file. >seq1 HLKSEDEMKASEDLKKHKKKGHHEAEIKPLAQSHATKHKIPVKYLEFIS SDLHAHKLRKRKLVKNMVL >seq2 VEVFISEDDLKRKLVKNMVLDKDNKQHMLPKAGDTVTNYTLLGLVLVLI WLIMQRKRKKKEDVSEENIQEIPKKELAASS conservation Pattern conservation parameters 3 CM Minimum number of Sequences to Match (-CM) Integer (defined $value)? " -CM $value":"" ("" , " -CM " + str(value))[ value is not None ] Value must be between 2 and 4 $value >= 2 and $value <= 4 value >= 2 and value <= 4 Set the minimum number of sequences to match a pattern. 
Pratt will only report patterns that match at least the chosen number of the sequences that you have input. Pratt will not allow you to choose a value higher than the number of sequences input. Cpct Minimum Percentage of Sequences to Match (-C%) Integer 100 (defined $value and $value != $vdef)? " -C% $value":"" ("" , " -C% " + str(value))[ value is not None and value != vdef] Set the minimum percentage of the input sequences that should match a pattern. If you set this to, say 80, Pratt will only report patterns matching at least 80 % of the sequences input. restrictions Pattern restrictions parameters 3 PP Position in sequence (-PP) Choice off off complete start (defined $value and $value ne $vdef)? " -PP $value":"" ("" , " -PP " + str(value))[ value is not None and value != vdef] You must give a file to define the area (PF) $value eq $vdef or ($value ne $vdef and defined $PF) value == vdef or (value != vdef and PF is not None) PF Restriction File name (if PP not off) (-PF) RestrictionPattern AbstractText $PP ne "off" PP != "off" " -PF $value" " -PF " + str(value) This file contains lines to restrict pattern searches to certain regions in a sequence, say ACE2_YEAST: >ACE2_YEAST (100,200) PL Maximum Pattern Length (-PL) Integer 50 (defined $value and $value != $vdef)? " -PL $value":"" ("" , " -PL " + str(value))[ value is not None and value != vdef] Allows you to set the maximum length of a pattern. The length of the pattern C-x(2,4)-[DE] is 1+4+1=6. PN Maximum number of Pattern Symbols (-PN) Integer 50 (defined $value and $value != $vdef)? " -PN $value":" " ("" , " -PN " + str(value))[ value is not None and value != vdef] Using this you can set the maximum number of symbols in a pattern. The pattern C-x(2,4)-[DE] has 2 symbols (C and [DE]). PX Maximum number of consecutive x's (-PX) Integer 5 (defined $value and $value != $vdef)? 
" -PX $value":"" ("" , " -PX " + str(value))[ value is not None and value != vdef] Using this option you can set the maximum length of a wildcard. x - 1 x(10) - 10 x(3,4) - 4 FN Maximum number of flexible spacers (-FN) Integer 2 (defined $value and $value != $vdef)? " -FN $value":"" ("" , " -FN " + str(value))[ value is not None and value != vdef] Using this option you can set the maximum number of flexible wildcards (matching a variable number of arbitrary sequence symbols). For instance x(2,4) is a flexible wildcard, and the pattern C-x(2,4)-[DE]-x(10)-F contains one flexible wildcard. FL Maximum Flexibility (-FL) Integer 2 (defined $value and $value != $vdef)? " -FL $value":"" ("" , " -FL " + str(value))[ value is not None and value != vdef] You can set the maximum flexibility of a flexible wildcard (matching a variable number of arbitrary sequence symbols). For instance x(2,4) and x(10,12) has flexibility 2, and x(10) has flexibility 0. Increasing FL will increase the time used by Pratt. FP Maximum Flexibility Product (-FP) Integer 10 (defined $value and $value != $vdef)? " -FP $value":"" ("" , " -FP " + str(value))[ value is not None and value != vdef] Using option FP you can set an upper limit on the product of a flexibilities for a pattern. This is related to the memory requirements of the search, and increasing the limit, increases the memory usage. Some patterns and the corresponding product of flexibilities. C-x(2,4)-[DE]-x(10)-F - (4-2+1)*(10-10+1)= 3 C-x(2,4)-[DE]-x(10-14)-F - (4-2+1)*(14-10+1)= 3*5= 15 BI Input Pattern Symbol File? (-BI) Boolean 0 ($value)? " -BI on":"" ("" , " -BI on")[ value ] Using the B options (BN,BI,BF) on the menu you can control which pattern symbols will be used during the initial pattern search and during the refinement phase. The pattern symbols that can be used, are read from a file if the BI option is set, otherwise a default set will be used. 
The default set has the single amino acid symbols as its first 20 elements, and it also contains a set of ambiguous symbols, each containing amino acids that share some physico-chemical properties. BF Input Pattern Symbol File name (if BI on) (-BF) PatternSymbol AbstractText $BI BI (defined $value) ? " -BF $value" : " -BF Pratt.sets.big" ( " -BF Pratt.sets.big" , " -BF " + str(value) )[ value is not None ] In the file, each symbol is given on a separate line containing the letters that the symbol should match. In the example, only patterns with the symbols C and [DE] would be considered. During the initial search, pattern symbols corresponding to the first BN lines can be used. The default file is: Pratt.sets.big C DE BN Number of Pattern Symbols Initial Search (-BN) Integer 20 (defined $value and $value != $vdef)? " -BN $value":"" ("" , " -BN " + str(value))[ value is not None and value != vdef] Increasing BN will slow down the search and increase the memory usage, but allow more ambiguous pattern symbols. scoring Pattern Scoring parameters 3 S Scoring (-S) Choice info info mdl tree dist ppv (defined $value and $value ne $vdef)? " -S $value":"" ("" , " -S " + str(value))[ value is not None and value != vdef] The S option allows you to control the scoring of patterns. There are five possible scoring schemes: info: patterns are scored by their information content as defined in (Jonassen et al, 1995). Note that a pattern's score is independent of which sequences it matches. mdl: patterns are scored by a scheme derived from the Minimum Description Length principle, which is related to the one above, but penalises patterns scoring few sequences vs. patterns scoring many. Parameters Z0 to Z3 are required when this scoring scheme is used. tree: a pattern is scored higher if it contains more information and/or if it matches more diverse sequences. The sequence diversity is calculated from a dendrogram which has to be input.
dist: similar to the tree scoring, except that a matrix of pairwise similarities between all input sequences is used instead of the tree. The matrix has to be input. ppv: a measure of Positive Predictive Value - it is assumed that the input sequences constitute a family, and are all contained in the Swiss-Prot database. PPV measures how certain one can be that a sequence belongs to the family given that it matches the pattern. For the last three scoring schemes (tree, dist, ppv), an input file is needed. treefile Tree File for Scoring equal to tree (-SF) Tree $S eq "tree" S == "tree" " -SF $value " " -SF " + str(value) distfile Distances File if Scoring equal to dist (-SF) PhylipDistanceMatrix AbstractText $S eq "dist" S == "dist" " -SF $value" " -SF " + str(value) A matrix of pairwise similarities between all input sequences is used instead of the tree. uniprotdb Swissprot file if Scoring equal to ppv (-SF) Sequence SWISSPROT $S eq "ppv" S == "ppv" (defined $value) ? " -SF $value" : " -SF " ( " -SF " , " -SF " + str(value) )[ value is not None ] Default: uniprot_sprot databank. mdl_param MDL parameters (Z0-Z3) (if MDL scoring) $S eq "mdl" S == "mdl" 3 Z0 Z0 Integer 10.00 (defined $value and $value != $vdef)? " -Z0 $value" : "" ( "" , " -Z0 " + str(value) )[ value is not None and value != vdef] Z1 Z1 Integer 10.00 (defined $value and $value != $vdef)? " -Z1 $value" : "" ( "" , " -Z1 " + str(value) )[ value is not None and value != vdef] Z2 Z2 Integer 3.00 (defined $value and $value != $vdef)? " -Z2 $value" : "" ( "" , " -Z2 " + str(value) )[ value is not None and value != vdef] Z3 Z3 Integer 10.00 (defined $value and $value != $vdef)? " -Z3 $value" : "" ( "" , " -Z3 " + str(value) )[ value is not None and value != vdef] search Search parameters 3 G Pattern Graph from: (-G) Choice seq seq al query (defined $value and $value ne $vdef)?
" -G $value":"" ("" , " -G " + str(value))[ value is not None and value != vdef] If G is set to al or query, another option GF is required allowing the user to give the name of a file containing a multiple sequence alignment (in Clustal W format), or a query sequence in FastA format (without annotation). Only patterns consistent with the alignment/matching the query sequence will be considered. GF_ali Alignment file (if G set to al) (-GF) Alignment CLUSTAL $G eq "al" G == "al" " -GF $value" " -GF " + str(value) Alignment file must be in CLUSTALW format GF_seq Query sequence file (if G set query) (-GF) Sequence FASTA $G eq "query" G == "query" " -GF $value" " -GF " + str(value) Query file must be in Fasta format E Search Greediness (-E) Integer 3 (defined $value and $value != $vdef)? " -E $value":"" ("" , " -E " + str(value))[ value is not None and value != vdef] Using the E parameter you can adjust the greediness of the search. Setting E to 0 (zero), the search will be exhaustive. Increasing E increases the greediness, and decreases the time used in the search. R Pattern Refinement (-R) Boolean 1 ($value) ? " -R off" : "" ( " " , " -R off" )[ value ] When the R option is switched on, patterns found during the initial pattern search are input to a refinement algorithm where more ambiguous pattern symbols can be added. For instance the pattern C-x(4)-D might be refined to C-x-[ILV]-x-D-x(3)-[DEF] RG Generalise ambiguous symbols (if Pattern Refinement on) (-RG) Boolean $R R 0 ($value)? " -RG on" : "" ( "" , " -RG on" )[ value ] If the RG option is switched on, then ambiguous symbols listed in the symbols file (or in the default symbol set -- see help for option B), are used. If RG is off, only the letters needed to match the input sequences are included in the ambiguous pattern positions. For example, if [ILV] is a listed allowed symbol, and [IL] is not, [IL] can be included in a pattern if RG is off, but if RG is on, the full symbol [ILV] will be included instead. 
output Output options 3 OP PROSITE Pattern Format (-OP) Boolean 1 ($value) ? " -OP off " : "" ( "" , " -OP off " )[ value ] When switched on, patterns will be output in PROSITE style (for instance C-x(2,4)-[DE]). When switched off, patterns are output in a simpler consensus pattern style (for instance Cxx--[DE] where x matches exactly one arbitrary sequence symbol and - matches zero or one arbitrary sequence symbol). ON Maximum number patterns (-ON) Integer 50 (defined $value and $value != $vdef)? " -ON $value":"" ("" , " -ON " + str(value))[ value is not None and value != vdef] Set the maximum number of patterns to be found by Pratt OA Maximum number Alignments (-OA) Integer 50 (defined $value and $value != $vdef)? " -OA $value":"" ("" , " -OA " + str(value))[ value is not None and value != vdef] Set the max. nr of patterns for which Pratt is to produce an alignment of the sequence segments matching it. M Print Patterns in sequences (-M) Boolean 1 ($value) ? " -M off " : "" ( " " , " -M off" )[ value ] If the M option is set, then Pratt will print out the location of the sequence segments matching each of the (maximum 52) best patterns. The patterns are given labels A, B,...Z,a,b,...z in order of decreasing pattern score. Each sequence is printed on a line, one character per K-tuple in the sequence. If pattern with label C matches the third K-tuple in a sequence C is printed out. If several patterns match in the same K-tuple, only the best will be printed. MR Ratio for printing (-MR) Integer $M M 10 (defined $value and $value != $vdef)? " -MR $value":"" ("" , " -MR " + str(value))[ value is not None and value != vdef] Sets the K value (ratio) used for printing the summary information about where in each sequence the pattern matches are found. MV Print vertically (-MV) Boolean $M M 0 ($value)? " -MV on " : "" ( "" , " -MV on " )[ value ] If set, the output is printed vertically instead of horizontally, vertical output can be better for large sequence sets. 
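The pattern measures that the Pratt restriction parameters limit (PL, PN, PX, FN, FL, FP) can be sketched in a few lines of Python. This is only an illustration of the definitions given in the help texts, checked against their worked examples; it is not Pratt's own code, and the function names parse_elements and pattern_measures are hypothetical. The sketch assumes PROSITE-style patterns whose elements are separated by "-" and whose wildcards are written x, x(n) or x(m,n).

```python
import re

# A wildcard element is x, x(n) or x(m,n); anything else (C, [DE], ...)
# is treated as one pattern symbol. Assumption: elements are "-" separated.
_WILDCARD = re.compile(r"^x(?:\((\d+)(?:,(\d+))?\))?$")


def parse_elements(pattern):
    """Split a pattern into (kind, min_length, max_length) tuples."""
    elements = []
    for token in pattern.split("-"):
        match = _WILDCARD.match(token)
        if match is None:
            elements.append(("symbol", 1, 1))
            continue
        low = int(match.group(1)) if match.group(1) else 1
        high = int(match.group(2)) if match.group(2) else low
        elements.append(("wildcard", low, high))
    return elements


def pattern_measures(pattern):
    """Return the quantities that PL, PN, PX, FN, FL and FP put limits on."""
    elements = parse_elements(pattern)
    wildcards = [(lo, hi) for kind, lo, hi in elements if kind == "wildcard"]
    flexible = [(lo, hi) for lo, hi in wildcards if hi > lo]
    product = 1
    for lo, hi in wildcards:
        product *= hi - lo + 1  # fixed wildcards contribute a factor of 1
    return {
        "PL": sum(hi for _, _, hi in elements),            # pattern length
        "PN": sum(1 for kind, _, _ in elements if kind == "symbol"),
        "PX": max((hi for _, hi in wildcards), default=0),  # longest wildcard
        "FN": len(flexible),                                # flexible spacers
        "FL": max((hi - lo for lo, hi in flexible), default=0),
        "FP": product,                                      # flexibility product
    }


if __name__ == "__main__":
    for example in ("C-x(2,4)-[DE]", "C-x(2,4)-[DE]-x(10,14)-F"):
        print(example, pattern_measures(example))
```

A pattern would pass the restrictions when each of these values is no larger than the corresponding option value, for example a flexibility product of 15 exceeds the default FP of 10.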
outfiles Output file Text "*.pat" "*.pat" report Report output file Text "report" "report" Programs-5.1.1/neighbor.xml0000644000175000001560000004665411443701343014510 0ustar bneronsis neighbor neighbor Neighbor-Joining and UPGMA methods http://bioweb2.pasteur.fr/docs/phylip/doc/neighbor.html This program implements the Neighbor-Joining method of Saitou and Nei (1987) and the UPGMA method of clustering. NEIGHBOR constructs a tree by successive clustering of lineages, setting branch lengths as the lineages join. The tree is not rearranged thereafter. The tree does not assume an evolutionary clock, so that it is in effect an unrooted tree. phylogeny:distance neighbor String "neighbor <neighbor.params" "neighbor <neighbor.params" 0 distance_method Distance method Choice neighbor neighbor upgma ($value eq "upgma") ? "N\\n" : "" ( "" , "N\n")[ value == "upgma" ] 1 neighbor.params infile Distances matrix File PhylipDistanceMatrix AbstractText $infile ne "infile" infile != "infile" "ln -s $infile infile && " "ln -s "+str(infile)+" infile && " Distance matrix file cannot be named `outfile' $infile ne "outfile" infile != "outfile" -10 Give a file containing a distance matrix obtained by distance matrix programs like protdist or dnadist 5 Alpha 0.000000 0.330447 0.625670 1.032032 1.354086 Beta 0.330447 0.000000 0.375578 1.096290 0.677616 Gamma 0.625670 0.375578 0.000000 0.975798 0.861634 Delta 1.032032 1.096290 0.975798 0.000000 0.226703 Epsilon 1.354086 0.677616 0.861634 0.226703 0.000000 jumble_opt Randomize options jumble Randomize (jumble) input order (J) Boolean 0 ($value) ? 
"J\\n$jumble_seed\\n" : "" ( "" , "J\n" + str( jumble_seed ) + "\n" )[ value ] you can't use "Jumble options" and "Bootstrap options" at the same time not( $multiple and $jumble) not( multiple and jumble) 20 neighbor.params jumble_seed Random number seed for jumble (must be odd) Integer $jumble jumble "" "" Random number seed must be odd $value > 0 and ($value % 2) != 0 value > 0 and (value % 2) != 0 bootstrap Bootstrap options multiple Analyze multiple data sets (M) Boolean 0 ($value) ? "M\\n$multiple_number\\n$multiple_seed\\n" : "" ("", "M\n"+str(multiple_number)+"\n"+str(multiple_seed)+"\n")[ value ] you can't use "Jumble options" and "Bootstrap options" at the same time not( $multiple and $jumble) not( multiple and jumble) 10 neighbor.params multiple_number How many data sets Integer $multiple multiple "" "" There must be no more than 1000 datasets for this server $value <= 1000 value <= 1000 Bad data sets number: it must be greater than 1 $value > 1 value > 1 multiple_seed Random number seed for multiple dataset (must be odd) Integer $multiple multiple "" "" Random number seed must be odd $value < 0 and ($value % 2) != 0 value > 0 and (value % 2) != 0 neighbor.params consense Compute a consensus tree Boolean $multiple and $print_treefile multiple and print_treefile 0 ($value)? 
" && cp infile neighbor.infile && cp neighbor.outtree intree && consense <consense.params && mv outtree consense.outtree && mv outfile consense.outfile" : "" ("" , " && cp infile neighbor.infile && cp neighbor.outtree intree && consense <consense.params && mv outtree consense.outtree && mv outfile consense.outfile")[ value ] 10 consense_confirm String $consense consense "Y\\n" "Y\n" 1000 consense.params consense_terminal_type String $consense consense "T\\n" "T\n" -2 consense.params consense_outfile Consense output file Text $consense consense "consense.outfile" "consense.outfile" consense_treefile Consense output tree Tree NEWICK $consense consense "consense.outtree" "consense.outtree" output Output options print_tree Print out tree (3) Boolean 1 ($value) ? "" : "3\\n" ( "3\n" , "" )[ value ] 1 Tells the program to print a semi-graphical picture of the tree in the outfile. neighbor.params print_treefile Write out trees onto tree file (4) Boolean 1 ($value) ? "" : "4\\n" ( "4\n" , "" )[ value ] 1 Tells the program to save the tree in a treefile (a standard representation of trees where the tree is specified by a nested pairs of parentheses, enclosing names and separated by commas). neighbor.params printdata Print out the data at start of run (1) Boolean 0 ($value) ? "1\\n" : "" ( "" , "1\n" )[ value ] 1 neighbor.params other_options Other options outgroup Outgroup species (default, use as outgroup species 1) (O) Integer $distance_method eq "neighbor" distance_method == "neighbor" 1 (defined $value and $value != $vdef) ? "O\\n$value\\n" : "" ( "" , "O\n" +str( value )+ "\n" )[ value is not None and value != vdef] Please enter a value greater than 0 $value > 0 value > 0 1 The O (Outgroup) option specifies which species is to have the root of the tree be on the line leading to it. 
For example, if the outgroup is a species "Mouse" then the root of the tree will be placed in the middle of the branch which is connected to this species, with Mouse branching off on one side of the root and the lineage leading to the rest of the tree on the other. This option is toggled on by choosing the number of the outgroup (species are numbered in the order in which they occur in the input file). Outgroup-rooting will not be attempted if it is a user-defined tree, even if you invoke the option. When it is used, the tree as printed out is still listed as being an unrooted tree, though the outgroup is connected to the bottommost node so that it is easy to visually convert the tree into rooted form. neighbor.params triangular Matrix format Choice square square "" "" lower "L\\n" "L\n" upper "R\\n" "R\n" 1 neighbor.params outfile Neighbor output file Text " && mv outfile neighbor.outfile" " && mv outfile neighbor.outfile" "neighbor.outfile" "neighbor.outfile" treefile Neighbor output tree file Tree NEWICK $print_treefile print_treefile " && mv outtree neighbor.outtree" " && mv outtree neighbor.outtree" "neighbor.outtree" "neighbor.outtree" confirm String "Y\\n" "Y\n" 1000 neighbor.params terminal_type String "0\\n" "0\n" -1 neighbor.params Programs-5.1.1/density.xml0000644000175000001560000003224712072525233014364 0ustar bneronsis density EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A.
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net density Draw a nucleic acid density plot http://bioweb2.pasteur.fr/docs/EMBOSS/density.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:composition density e_input Input section e_seqall seqall option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -seqall=" + str(value))[value is not None] 1 e_additional Additional section e_window Window length (value greater than or equal to 1) Integer 100 ("", " -window=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 2 e_display Graph type Choice none D Q none ("", " -display=" + str(value))[value is not None and value!=vdef] 3 e_output Output section e_graph Choose the e_graph output format Choice e_display !="none" png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 4 xy_goutfile Name of the output graph Filename e_display !="none" density_xygraph ("" , " -goutfile=" + str(value))[value is not None] 5 xy_outgraph_png Graph file Picture Binary e_display !="none" and e_graph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_display !="none" and e_graph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_display !="none" and e_graph == "ps" or e_graph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_display !="none" and e_graph == "meta" "*.meta" xy_outgraph_data Graph file Text e_display !="none" and e_graph == "data" "*.dat" e_outfile Name of the report file Filename e_display =="none" density.report ("" , " -outfile=" + str(value))[value is not None] 6 e_rformat_outfile Choose the report output format Choice e_display =="none" TABLE DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and 
value!=vdef] 7 e_outfile_out outfile_out option DensityReport Report e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 8 Programs-5.1.1/quicktree.xml0000644000175000001560000002117711652470410014700 0ustar bneronsis quicktree 1.1 QuickTree Rapid reconstruction of phylogenies by the Neighbor-Joining method Kevin Howe, Alex Bateman, Richard Durbin Kevin Howe, Alex Bateman and Richard Durbin (2002). QuickTree: building huge Neighbour-Joining trees of protein sequences. Bioinformatics 18(11):1546-1547. QuickTree is an efficient implementation of the Neighbor-Joining algorithm, capable of reconstructing phylogenies from huge alignments in time less than the age of the universe. http://www.sanger.ac.uk/Software/analysis/quicktree/ http://www.sanger.ac.uk/Software/analysis/quicktree/ phylogeny:distance quicktree inpufile Input file You must enter either a distance matrix or an alignment. aligfile Alignment file (-in a) Alignment STOCKHOLM not defined $distfile or (defined $aligfile and defined $distfile) distfile is None or (distfile and aligfile) (defined $value) ? " -in a $value" : "" ( "" , " -in a " + str(value) )[ value is not None ] You must enter either a distance matrix or an alignment. not $distfile distfile is None 10 distfile or Distance matrix (-in m) PhylipDistanceMatrix AbstractText not defined $aligfile aligfile is None (defined $value) ? " -in m $value" : "" ( "" , " -in m " + str(value) )[ value is not None ] You must enter either a distance matrix or an alignment. not defined $aligfile aligfile is None 10 Give a file containing a distance matrix obtained by distance matrix programs like protdist or dnadist, ...
5 Alpha 0.000000 0.330447 0.625670 1.032032 1.354086 Beta 0.330447 0.000000 0.375578 1.096290 0.677616 Gamma 0.625670 0.375578 0.000000 0.975798 0.861634 Delta 1.032032 1.096290 0.975798 0.000000 0.226703 Epsilon 1.354086 0.677616 0.861634 0.226703 0.000000 out Output (-out) Choice t m t (defined $value and $value ne $vdef) ? " -out $value" : "" ( "" , " -out " + str(value) )[ value is not None and value != vdef] 3 treeopt Tree output options $out ne "m" out != "m" upgma Use the UPGMA method to construct the tree (-upgma) Boolean 0 ($value) ? " -upgma" : "" ( "" , " -upgma" )[ value ] 3 Bootstrapping is not available for a matrix output aligopt Alignment options boot Calculate bootstrap values with n iterations (-boot) Integer not defined $distfile and $out ne "m" distfile is None and out != "m" (defined $value) ? " -boot $value" : "" ( "" , " -boot " + str(value))[ value is not None ] 3 Bootstrapping is not available for a matrix output kimura Use the Kimura translation for pairwise distances (-kimura) Boolean defined $aligfile aligfile is not None 0 ($value) ? " -kimura" : "" ( "" , " -kimura" )[ value ] 3 treefile Tree file Tree NEWICK $out ne "m" out != "m" "quicktree.out" "quicktree.out" distoutfile Distance matrix PhylipDistanceMatrix AbstractText $out eq "m" out == "m" "quicktree.out" "quicktree.out" Programs-5.1.1/smile.xml0000644000175000001560000003137611767572177014043 0ustar bneronsis smile 1.47 SMILE Inference of structured signals in multiple sequences L. Marsan, J. Allali Marsan L, Sagot MF (2001). Algorithms for extracting structured motifs using a suffix-tree with application to promoter and regulatory site consensus identification. J. of Computational Biology, 7:345-360. smile is a program that was primarily made to extract promoter sequences from sequences. The strength of this program is that it simultaneously infers several motifs (called boxes) that respect distance constraints. The user has to select criteria.
In the first step, extraction, all signals meeting these criteria are found. In the second step, they are all statistically evaluated to detect those that are exceptionally represented in the original sequences. http://www-igm.univ-mlv.fr/~marsan/smile_english.html http://www-igm.univ-mlv.fr/~marsan/smile_english.html sequence:nucleic:pattern sequence:protein:pattern smile String "smile smile.params" "smile smile.params" seq Sequences File Sequence FASTA "FASTA file\t\t\t$value\\n" "FASTA file\t\t\t" + str(value) + "\n" 1 This file must contain at least two sequences, as you cannot detect motifs which are common to several sequences in one sequence! smile.params >sequence1 atagtagtagcatcagcatcgatccggactcgtcgcgagcactgacacgatc aatcgatgatgcacgacgatcgactgatgctacgtcgacatcgctgctgtcc >sequence2 gatccggactcgtcagcatcagcagcgatagtagtagcatcagcatcgatcc ggactcgtcgcgagcactgacacgatcatgctacgtcgacatcgcatgctac >sequenceN gtcgacatcgccgagcactgacacgatcatgctacgtcgacatcgcatgctac aatcgatgatccggactcgtcgcggatgcacgacagcatcagcagatcgactg alphabet Alphabet Choice dna.alphabet dna.alphabet dnadeg.alphabet aa-barton.alphabet aa-smiths.alphabet "Alphabet file\t\t\t$value\\n" "Alphabet file\t\t\t" + str(value) + "\n" 2 smile.params how_many_N Maximum number of N in a motif for degenerate DNA or protein alphabet.
Integer $alphabet ne "dna.alphabet" alphabet != "dna.alphabet" 0 "Composition in *\t\t$value\\n" "Composition in *\t\t" + str(value)+"\n" 3 smile.params how_many_R Maximum number of purines (R) in a motif for degenerate DNA Integer $alphabet eq "dnadeg.alphabet" alphabet == "dnadeg.alphabet" 0 "Composition in AG\t\t$value\\n" "Composition in AG\t\t" + str(value)+"\n" 3 smile.params how_many_Y Maximum number of pyrimidines (Y) in a motif for degenerate DNA Integer $alphabet eq "dnadeg.alphabet" alphabet == "dnadeg.alphabet" 0 "Composition in CT\t\t$value\\n" "Composition in CT\t\t" + str(value)+"\n" 3 smile.params quorum Minimum percentage of sequences containing a motif (quorum) Integer 50 "Quorum\t\t\t\t$value\\n" "Quorum\t\t\t\t" + str(value)+"\n" 10 The percentage of sequences in which at least one occurrence of a motif must appear for the motif to be valid. 100 means that a motif must have occurrences in every sequence. smile.params minlen Total min length Integer "Total min length\t\t$value\\n" "Total min length\t\t" + str(value)+"\n" 11 The minimal length of the whole motif, i.e. the sum of minimal lengths of each box. Warning: the length of the gaps between boxes must not be taken into account. The total minimal length may differ from the sum of the boxes' minimal lengths: you can, for instance, infer motifs made of two boxes, each with a minimal length of 4, and a total minimal length of 10. smile.params maxlen Total max length Integer 0 (defined $value)? "Total max length\t\t$value\\n" : "" ( "" , "Total max length\t\t" + str(value)+"\n" )[ value is not None ] 12 Same explanation as "Total min length".
Except that a length of 0 means "infinity". smile.params subst Total substitutions Integer "Total substitutions\t\t$value\\n" "Total substitutions\t\t" + str(value)+"\n" Too many substitutions (number of substitutions must be smaller than min length) defined $subst and $subst <= $minlen subst is not None and subst <= minlen 13 Total maximum number of substitutions for the motif. smile.params boxes Integer "Boxes\t\t\t\t1\\n" "Boxes\t\t\t\t1\n" 20 The number of boxes that compose the motifs to infer. Fixed to 1 in this XML description. smile.params shuffling Number of shufflings Integer 100 (defined $value)? "Shufflings\t\t\t$value\\n":"" ( "" , "Shufflings\t\t\t"+ str(value)+"\n")[value is not None] 100 The number of shufflings of the original sequences to perform when evaluating the statistical significance of the motifs found. smile.params kmer Length of the words to conserve during shufflings Integer 2 (defined $value)? "Size k-mer\t\t\t$value\\n":"" ("" , "Size k-mer\t\t\t"+ str(value)+"\n")[value is not None] 100 Length of the words to conserve during shufflings (usually 2). smile.params result Result file Text "Output file\t\t\tsmile.result\\n" "Output file\t\t\tsmile.result\n" 1 smile.params "smile.result" "smile.result" shufflefiles Shuffle result file Text "smile.result.shuffle" "smile.result.shuffle" Programs-5.1.1/toppred.xml0000644000175000001560000003122611767572177014401 0ustar bneronsis toppred 0.01 TopPred Topology prediction of membrane proteins Heijne, Wallin, Claros, Deveaud, Schuerer von Heijne, G. (1992) Membrane Protein Structure Prediction: Hydrophobicity Analysis and the 'Positive Inside' Rule. J.Mol.Biol. 225, 487-494. Claros, M.G., and von Heijne, G. (1994) TopPred II: An Improved Software For Membrane Protein Structure Predictions. CABIOS 10, 685-686. Deveaud and Schuerer (Institut Pasteur): new implementation of the original toppred program, based on G. von Heijne's algorithm.
http://bioweb2.pasteur.fr/docs/toppred/toppred.pdf ftp://ftp.pasteur.fr/pub/gensoft/projects/toppred/ sequence:protein:2D_structure structure:2D_structure toppred Command String "toppred" "toppred" 0 query Sequence Protein Sequence FASTA 1,n " $value" " "+ str( value ) 10 graph_output Produce hydrophobicity graph image (-g) Boolean 1 topo_output Produce image of each topology (-t) Boolean 0 ($value) ? "" : " -t none" (" -t none","")[ value ] 7 control Control options scale Hydrophobicity scale (-H) Choice GES-scale KD-scale GVH-scale GES-scale (defined $value and $value ne $vdef) ? " -H $value" : "" ( "" , " -H " + str( value ) )[ value is not None and value != vdef ] 1 organism Organism: eukaryote (default is prokaryote) (-e) Boolean 0 ($value) ? " -e" : "" ( "" , " -e" )[ value ] 1 certain Certain cutoff (-c) Float 1.0 (defined $value and $value != $vdef) ? " -c $value" : "" ( "" , " -c " + str(value) ) [ value is not None and value!= vdef ] Certain cutoff must be greater than putative cutoff $certain > $putative certain > putative 2 putative Putative cutoff (-p) Float 0.6 (defined $value and $value != $vdef) ? " -p $value" : "" ( "" , " -p " + str(value) )[ value is not None and value != vdef ] Putative cutoff must be less than certain cutoff $certain > $putative certain > putative 2 core Core window size: (-n) Integer 10 (defined $value and $value != $vdef) ? " -n $value" : "" ( "" , " -n " + str(value) )[ value is not None and value != vdef ] 2 triangle Wedge window size: (-q) Integer 5 (defined $value and $value != $vdef) ? " -q $value" : "" ( "" , " -q " + str(value) )[ value is not None and value != vdef ] 2 loop_length Critical loop length (-s) Integer 60 (defined $value and $value != $vdef) ? " -s $value" : "" ( "" , " -s " + str(value) )[ value is not None and value != vdef ] 2 Segment_distance Critical transmembrane spacer (-d) Integer 2 (defined $value and $value != $vdef) ?
" -d $value" : "" ( "" , " -d " + str( value ) )[ value is not None and value!= vdef ] 2 output_options Output options outformat Output format (-O) Choice new new html old (defined $value and $value ne $vdef) ? " -O $value" : "" ( "" , " -O " + str(value) )[ value is not None and value != vdef ] 5 profile_format Hydrophobicity Profile file format (-g) Choice $graph_output graph_output png png ps ppm ($graph_output) ? " -g $value" : " -g none" ( " -g none" , " -g " + str(value) )[ graph_output ] 7 graphicfiles Graphic output files Picture Binary 0,n $graph_output graph_output *.$profile_format '*.' + profile_format hydrophobicity_files Hydrophobicity output files Text 1,n *.hydro* '*.hydro*' html_file Output file in html format ToppredHtmlReport Report outformat eq 'html' outformat == 'html' *.html '*.html' Programs-5.1.1/drawgram.xml0000644000175000001560000007200711724156742014517 0ustar bneronsis drawgram drawgram Plots a cladogram- or phenogram-like rooted tree http://bioweb2.pasteur.fr/docs/phylip/doc/drawgram.html DRAWGRAM interactively plots a cladogram- or phenogram-like rooted tree diagram, with many options including orientation of tree and branches, style of tree, label sizes and angles, tree depth, margin sizes, stem lengths, and placement of nodes in the tree. Particularly if you can use your computer to preview the plot, you can very effectively adjust the details of the plotting to get just the kind of plot you want. phylogeny:display display:tree drawgram String "drawgram <drawgram.params" "drawgram <drawgram.params" 0 treefile Tree File (intree) Tree NEWICK "ln -s $treefile intree && " "ln -s "+ str( treefile ) +" intree && " -10 Tree in Newick format. (A,(B,(H,(D,(J,(((G,E),(F,I)),C)))))); screen_type String "0\\n" "0\n" -1 drawgram.params options Drawgram options plotter Which plotter or printer will the tree be drawn on (P) Choice L L M J W K H D B E C O T P X F A Z V R (defined $value and $value ne $vdef) ? 
"P\\n$value\\n" : "" ("" , "P\n" + str(value) +"\n")[ value is not None and value != vdef ] 2 drawgram.params xbitmap_options Bitmap options $plotter =~ /^[XW]$/ plotter in [ "X" , "W" ] xres X resolution (in pixels) Integer 500 "$value\\n" str( value ) + "\n" X resolution cannot exceed 2500 pixels $value <= 2500 value <= 2500 3 drawgram.params yres Y resolution (in pixels) Integer 500 "$value\\n" str( value ) + "\n" Y resolution cannot exceed 2500 pixels $value <= 2500 value <= 2500 4 drawgram.params laserjet_options Laserjet options $plotter eq "J" plotter == "J" laserjet_resolution Laserjet resolution Choice 3 1 2 3 "$value\\n" str(value) +"\n" 3 drawgram.params pcx_options Paintbrush options $plotter eq "P" plotter == "P" pcx_resolution Paintbrush PCX resolution Choice 3 1 2 3 "$value\\n" str(value) + "\n" 3 drawgram.params ps_options PostScript options $plotter eq "L" plotter == "L" font Font (F) Choice Times-Roman Courier Helvetica Helvetica-Bold Helvetica-BoldOblique Helvetica-Oblique Hershey Times Times-Bold Times-BoldItalic Times-Italic Times-Roman (defined $value and $value ne $vdef) ? "F\\n$value\\n" : "" ("", "F\n"+str(value)+"\n")[value is not None and value != vdef] 5 drawgram.params pov_options POVRAY options $plotter eq "V" plotter == "V" pov_validate String "Y\\n" "Y\n" 2000 drawgram.params vrml_options VRML options $plotter eq "Z" plotter == "Z" vrml_validate String "Y\\n" "Y\n" 2000 drawgram.params ray_options Rayshade options $plotter eq "R" plotter == "R" ray_validate String "Y\\n" "Y\n" 2000 drawgram.params screen String "V\\nN\\n" "V\nN\n" 1 drawgram.params grows Tree grows... (H) Choice Horizontally Vertically Horizontally (defined $value and $value ne $vdef) ? 
"H\\n" : "" ( "" , "H\n" )[ value is not None and value != vdef ] 5 drawgram.params tree_style Tree style (S) Choice C P "S\\nP\\n" "S\nP\n" C "" "" S "S\\nS\\n" "S\nS\n" E "S\\nE\\n" "S\nE\n" V "S\\nV\\n" "S\nV\n" O "S\\nO\\n" "S\nO\n" 5 drawgram.params branch_lengths Use branch lengths (B) Boolean 1 ($value) ? "" : "B\\n" ("B\n" , "")[ value ] 5 drawgram.params horizontal_margins Horizontal margins (M) Float 1.65 (defined $value and $value != $vdef) ? "M\\n$value\\n$vertical_margins\\n" : "" ("" , "M\n" + str( value ) + "\n" + str( vertical_margins ) + "\n")[ value is not None and value != vdef ] 10 drawgram.params vertical_margins Vertical margins Float 2.16 "" "" 9 drawgram.params scale Scale of branch length (R) Float (defined $value) ? "R\\n$value\\n" : "" ("" , "R\n" +str( value ) +"\n")[ value is not None ] 5 Default value: Automatically rescaled drawgram.params depth Depth/Breadth of tree (D) Float 1.00 (defined $value and $value != $vdef) ? "D\\n$value\\n" : "" ( "" , "D\n" + str( value ) + "\n" )[ value is not None and value != vdef ] 5 drawgram.params stem Stem-length/tree-depth (T) Float 0.05 (defined $value and $value != $vdef) ? "T\\n$value\\n" : "" ("" , "T\n" + str( value ) + "\n")[ value is not None and value != vdef ] You should enter a value between 0.0 and 1.0. $value >= 0.0 and $value < 1.0 value >= 0.0 and value < 1.0 5 Enter the stem length as fraction of tree depth (a value between 0.0 and 1.0). drawgram.params character_height Character ht / tip space (C) Float 0.3333 (defined $value and $value != $vdef) ? "C\\n$value\\n" : "" ("" , "C\n" + str( value ) +"\n")[ value is not None and value != vdef ] 5 Enter character height as fraction of tip spacing. drawgram.params ancestral Ancestral nodes (A) Choice I I W C N V (defined $value and $value ne $vdef) ? 
"A\\n$value\\n" : "" ("" , "A\n" + str( value )+ "\n")[ value is not None and value != vdef ] 5 drawgram.params plotfile Graphic tree file Picture Binary $plotter !~ /^[LMWX]$/ plotter not in [ "L" , "M" , "W", "X" ] "plotfile" "plotfile" psfile Graphic tree file ( postscript format ) PostScript Binary $plotter eq "L" plotter == "L" " && ln -s plotfile plotfile.ps" " && ln -s plotfile plotfile.ps" 10 plotfile.ps 'plotfile.ps' pictfile Graphic tree file ( pict format ) Picture Binary $plotter eq "M" plotter == "M" " && ln -s plotfile plotfile.pict" " && ln -s plotfile plotfile.pict" 10 plotfile.pict 'plotfile.pict' xbmfile Graphic tree file ( xbm format ) Picture Binary $plotter eq "X" plotter == "X" " && ln -s plotfile plotfile.xbm" " && ln -s plotfile plotfile.xbm" 10 plotfile.xbm 'plotfile.xbm' bmpfile Graphic tree file ( bmp format ) Picture Binary $plotter eq "W" plotter == "W" " && ln -s plotfile plotfile.bmp" " && ln -s plotfile plotfile.bmp" 10 plotfile.bmp 'plotfile.bmp' confirm String "Y\\n" "Y\n" 1000 drawgram.params Programs-5.1.1/union.xml0000644000175000001560000001743412072525233014036 0ustar bneronsis union EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net union Concatenate multiple sequences into a single sequence http://bioweb2.pasteur.fr/docs/EMBOSS/union.html http://emboss.sourceforge.net/docs/themes sequence:edit union e_input Input section e_feature Use feature information Boolean 0 ("", " -feature")[ bool(value) ] 1 e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 2 e_advanced Advanced section e_source Create source features Boolean 0 ("", " -source")[ bool(value) ] 3 e_findoverlap Look for overlaps when joining Boolean 0 ("", " -findoverlap")[ bool(value) ] 4 e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename union.e_outseq ("" , " -outseq=" + str(value))[value is not None] 5 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 6 e_outseq_out outseq_out option Sequence e_outseq e_overlapfile Name of the output file (e_overlapfile) Filename outfile.overlaps ("" , " -overlapfile=" + str(value))[value is not None] 7 e_overlapfile_out overlapfile_out option SequenceOverlap AbstractText e_overlapfile auto Turn off any prompting String " -auto -stdout" 8 Programs-5.1.1/newcpgreport.xml0000644000175000001560000002021712072525233015416 0ustar bneronsis newcpgreport EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net newcpgreport Identify CpG islands in nucleotide sequence(s) http://bioweb2.pasteur.fr/docs/EMBOSS/newcpgreport.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:cpg_islands newcpgreport e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_window Window size (value greater than or equal to 1) Integer 100 ("", " -window=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 2 e_shift Shift increment (value greater than or equal to 1) Integer 1 ("", " -shift=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 3 e_minlen Minimum length (value greater than or equal to 1) Integer 200 ("", " -minlen=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 4 e_minoe Minimum observed/expected (value from 0. to 10.) Float 0.6 ("", " -minoe=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0. is required value >= 0. Value less than or equal to 10. is required value <= 10. 5 e_minpc Minimum percentage (value from 0. to 100.) Float 50. ("", " -minpc=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0. is required value >= 0. Value less than or equal to 100. is required value <= 100. 
6 e_output Output section e_outfile Name of the report file Filename newcpgreport.e_outfile ("" , " -outfile=" + str(value))[value is not None] 7 e_outfile_out outfile_out option NewcpgreportReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 8 Programs-5.1.1/drawtree.xml0000644000175000001560000007222411724156742014531 0ustar bneronsis drawtree drawtree Plots an unrooted tree diagram http://bioweb2.pasteur.fr/docs/phylip/doc/drawtree.html DRAWTREE interactively plots an unrooted tree diagram, with many options including orientation of tree and branches, label sizes and angles, margin sizes. Particularly if you can use your computer screen to preview the plot, you can very effectively adjust the details of the plotting to get just the kind of plot you want. phylogeny:display display:tree drawtree String "drawtree <drawtree.params" "drawtree <drawtree.params" 0 treefile Tree File (intree) Tree NEWICK "ln -s $treefile intree && " "ln -s "+str( treefile ) + " intree && " -10 Tree in Newick format. (A,(B,(H,(D,(J,(((G,E),(F,I)),C)))))); screen_type String "0\\n" "0\n" -1 drawtree.params options Drawtree options plotter Which plotter or printer will the tree be drawn on Choice L L M J W K H D B E C O T P X F A Z V R (defined $value and $value ne $vdef) ? 
"P\\n$value\\n" : "" ("" , "P\n" + str(value) + "\n")[ value is not None and value != vdef ] 2 drawtree.params bitmap_options Bitmap options $plotter =~ /^[XW]$/ plotter in [ "X" , "W" ] xres X resolution (in pixels) Integer 500 "$value\\n" str( value ) + "\n" X resolution cannot exceed 2500 pixels $value <= 2500 value <= 2500 3 drawtree.params xyres Y resolution (in pixels) Integer 500 "$value\\n" str( value ) + "\n" Y resolution cannot exceed 2500 pixels $value <= 2500 value <= 2500 4 drawtree.params laserjet_options Laserjet options $plotter eq "J" plotter == "J" laserjet_resolution Laserjet resolution Choice 3 1 2 3 "$value\\n" str(value) + "\n" 3 drawtree.params pcx_options Paintbrush options $plotter eq "P" plotter == "P" pcx_resolution Paintbrush PCX resolution Choice 3 1 2 3 "$value\\n" str(value) + "\n" 3 drawtree.params pov_options POVRAY options $plotter eq "V" plotter == "V" pov_validate String "Y\\n" "Y\n" 2000 drawtree.params ps_options PostScript options $plotter eq "L" plotter == "L" font Font (F) Choice Times-Roman Courier Helvetica Helvetica-Bold Helvetica-BoldOblique Helvetica-Oblique Hershey Times Times-Bold Times-BoldItalic Times-Italic Times-Roman (defined $value and $value ne $vdef) ? "F\\n$value\\n" : "" ("", "F\n"+str(value)+"\n")[value is not None and value != vdef] 5 drawtree.params vrml_options VRML options $plotter eq "Z" plotter == "Z" vrml_validate String "Y\\n" "Y\n" 2000 drawtree.params ray_options Rayshade options $plotter eq "R" plotter == "R" ray_validate String "Y\\n" "Y\n" 2000 drawtree.params preview String "V\\nN\\n" "V\nN\n" 1 drawtree.params branch_lengths Use branch lengths (B) Boolean 1 ($value) ? "" : "B\\n" ("B\n" , "")[ value ] 5 drawtree.params angle Angle of labels (L) Choice M A "L\\nA\\n" "L\nA\n" R "L\\nR\\n" "L\nR\n" F "" "" M "" "" 5 drawtree.params fixed_angle Fixed angle: Are the labels to be plotted vertically (90), horizontally (0), or downwards (-90) (L)? 
Float $angle eq "" or $angle eq "F" angle == "" or angle == "F" 0.0 (defined $value and $value != $vdef) ? "L\\nF\\n$value\\n" : "" ( "" , "L\nF\n"+ str( value ) +"\n")[ value is not None and value != vdef ] The value must be between -90.0 and 90.0 $value >= -90.0 and $value <= 90.0 value >= -90.0 and value <= 90.0 7 drawtree.params rotation Rotation of tree (in degrees from -360 to 360) (R) Float 0.0 (defined $value and $value != $vdef) ? "R\\n$value\\n" : "" ("" , "R\n" +str( value ) +"\n" )[ value is not None and value != vdef] -360 360 5 drawtree.params arc Angle of arc for tree (in degrees from 0 to 360) (A) Float 0.0 (defined $value and $value != $vdef) ? "A\\n$value\\n" : "" ( "" , "A\n" + str( value ) +"\n" )[ value is not None and value != vdef] 0 360 5 drawtree.params iterate Iterate to improve tree (I) Choice E E "" "" B "I\\n" "I\n" N "I\\nI\\n" "I\nI\n" 5 drawtree.params scale Scale of branch length (S) Float (defined $value) ? "S\\n$value\\n" : "" ( "" , "S\n"+ str( value )+ "\n")[ value is not None ] 5 Default value: Automatically rescaled drawtree.params label_overlap Try to avoid label overlap (D) Boolean $iterate ne "N" iterate != "N" 0 ($value) ? "D\\n" : "" ( "" , "D\n" )[ value ] 5 drawtree.params horizontal_margins Horizontal margins (M) Float 1.73 (defined $value and $value != $vdef) ? "M\\n$value\\n$vertical_margins\\n" : "" ("" , "M\n" + str( value ) + "\n" + str(vertical_margins ) + "\n")[ value is not None and value != vdef ] 10 drawtree.params vertical_margins Vertical margins (M) Float 2.24 "" "" 9 drawtree.params character_height Relative character height (C) Float 0.3333 (defined $value and $value != $vdef) ?
"C\\n$value\\n" : "" ("", "C\n" + str( value ) + "\n" )[ value is not None and value != vdef ] 5 drawtree.params plotfile Graphic tree file Picture Binary $plotter !~ /^[LMWX]$/ plotter not in [ "L" , "M" , "W", "X" ] "plotfile" "plotfile" psfile Graphic tree file ( postscript format ) PostScript Binary $plotter eq "L" plotter == "L" " && ln -s plotfile plotfile.ps" " && ln -s plotfile plotfile.ps" 10 "plotfile.ps" "plotfile.ps" pictfile Graphic tree file ( pict format ) Picture Binary $plotter eq "M" plotter == "M" " && ln -s plotfile plotfile.pict" " && ln -s plotfile plotfile.pict" 10 "plotfile.pict" "plotfile.pict" xbmfile Graphic tree file ( xbm format ) Picture Binary $plotter eq "X" plotter == "X" " && ln -s plotfile plotfile.xbm" " && ln -s plotfile plotfile.xbm" 10 "plotfile.xbm" "plotfile.xbm" bmpfile Graphic tree file ( bmp format ) Picture Binary $plotter eq "W" plotter == "W" " && ln -s plotfile plotfile.bmp" " && ln -s plotfile plotfile.bmp" 10 "plotfile.bmp" "plotfile.bmp" confirm String "Y\\n" "Y\n" 1000 drawtree.params Programs-5.1.1/marscan.xml0000644000175000001560000001626112072525233014327 0ustar bneronsis marscan EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net marscan Finds matrix/scaffold recognition (MRS) signatures in DNA sequences http://bioweb2.pasteur.fr/docs/EMBOSS/marscan.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:gene_finding sequence:nucleic:motifs marscan e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_output Output section e_outfile Name of the report file Filename marscan.report ("" , " -outfile=" + str(value))[value is not None] 2 File for output of MAR/SAR recognition signature (MRS) regions. This contains details of the MRS in normal GFF format. The MRS consists of two recognition sites, one of 8 bp and one of 16 bp on either sense strand of the genomic DNA, within 200 bp of each other. e_rformat_outfile Choose the report output format Choice GFF DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 3 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 4 Programs-5.1.1/wordfinder.xml0000644000175000001560000006053712072525233015053 0ustar bneronsis wordfinder EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net wordfinder Match large sequences against one or more other sequences http://bioweb2.pasteur.fr/docs/EMBOSS/wordfinder.html http://emboss.sourceforge.net/docs/themes alignment:pairwise:local wordfinder e_input Input section e_asequence asequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 2,n ("", " -asequence=" + str(value))[value is not None] 1 e_bsequence bsequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -bsequence=" + str(value))[value is not None] 2 e_datafile Matrix file Choice mobyle_null mobyle_null EBLOSUM30 EBLOSUM35 EBLOSUM40 EBLOSUM45 EBLOSUM50 EBLOSUM55 EBLOSUM60 EBLOSUM62 EBLOSUM62-12 EBLOSUM65 EBLOSUM70 EBLOSUM75 EBLOSUM80 EBLOSUM85 EBLOSUM90 EBLOSUMN EDNAFULL EDNAMAT EDNASIMPLE EPAM10 EPAM100 EPAM110 EPAM120 EPAM130 EPAM140 EPAM150 EPAM160 EPAM170 EPAM180 EPAM190 EPAM20 EPAM200 EPAM210 EPAM220 EPAM230 EPAM240 EPAM250 EPAM260 EPAM270 EPAM280 EPAM290 EPAM30 EPAM300 EPAM310 EPAM320 EPAM330 EPAM340 EPAM350 EPAM360 EPAM370 EPAM380 EPAM390 EPAM40 EPAM400 EPAM410 EPAM420 EPAM430 EPAM440 EPAM450 EPAM460 EPAM470 EPAM480 EPAM490 EPAM50 EPAM500 EPAM60 EPAM70 EPAM80 EPAM90 SSSUB ("", " -datafile=" + str(value))[value is not None and value!=vdef] 3 This is the scoring matrix file used when comparing sequences. By default it is the file 'EBLOSUM62' (for proteins) or the file 'EDNAFULL' (for nucleic sequences). These files are found in the 'data' directory of the EMBOSS installation. 
e_required Required section e_gapopen Gap opening penalty (value from 0.0 to 1000.0) Float ("", " -gapopen=" + str(value))[value is not None] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 1000.0 is required value <= 1000.0 4 10.0 for any sequence type e_gapextend Gap extension penalty (value from 0.0 to 10.0) Float ("", " -gapextend=" + str(value))[value is not None] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 10.0 is required value <= 10.0 5 0.5 for any sequence type e_additional Additional section e_width Alignment width (value greater than or equal to 1) Integer 16 ("", " -width=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 6 e_wordlen Word length for initial matching (value greater than or equal to 3) Integer 6 ("", " -wordlen=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 3 is required value >= 3 7 e_limitmatch Maximum match score (zero for no limit) (value greater than or equal to 0) Integer 0 ("", " -limitmatch=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 8 e_limitalign Maximum alignment length (zero for no limit) (value greater than or equal to 0) Integer 0 ("", " -limitalign=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 9 e_lowmatch Minimum match score (zero for no limit) (value greater than or equal to 0) Integer 0 ("", " -lowmatch=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 10 e_lowalign Minimum alignment length (zero for no limit) (value greater than or equal to 0) Integer 0 ("", " -lowalign=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 11 e_output Output section e_outfile Name of the output alignment file Filename wordfinder.align 
("" , " -outfile=" + str(value))[value is not None] 12 e_aformat_outfile Choose the alignment output format Choice SIMPLE FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH ("", " -aformat=" + str(value))[value is not None and value!=vdef] 13 e_outfile_out outfile_out option Alignment e_aformat_outfile in ['FASTA', 'MSF'] e_outfile e_outfile_out2 outfile_out2 option Text e_aformat_outfile in ['PAIR', 'MARKX0', 'MARKX1', 'MARKX2', 'MARKX3', 'MARKX10', 'SRS', 'SRSPAIR', 'SCORE', 'UNKNOWN', 'MULTIPLE', 'SIMPLE', 'MATCH'] e_outfile e_errorfile errorfile option Filename wordfinder.e_errorfile ("" , " -errorfile=" + str(value))[value is not None] 14 Error file to be written to e_errorfile_out errorfile_out option WordfinderError AbstractText e_errorfile auto Turn off any prompting String " -auto -stdout" 15 Programs-5.1.1/needleall.xml0000644000175000001560000006202212072525233014624 0ustar bneronsis needleall EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net needleall Many-to-many pairwise alignments of two sequence sets http://bioweb2.pasteur.fr/docs/EMBOSS/needleall.html http://emboss.sourceforge.net/docs/themes alignment:pairwise:global needleall e_input Input section e_asequence asequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 2,n ("", " -asequence=" + str(value))[value is not None] 1 e_bsequence bsequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -bsequence=" + str(value))[value is not None] 2 e_datafile Matrix file Choice mobyle_null mobyle_null EBLOSUM30 EBLOSUM35 EBLOSUM40 EBLOSUM45 EBLOSUM50 EBLOSUM55 EBLOSUM60 EBLOSUM62 EBLOSUM62-12 EBLOSUM65 EBLOSUM70 EBLOSUM75 EBLOSUM80 EBLOSUM85 EBLOSUM90 EBLOSUMN EDNAFULL EDNAMAT EDNASIMPLE EPAM10 EPAM100 EPAM110 EPAM120 EPAM130 EPAM140 EPAM150 EPAM160 EPAM170 EPAM180 EPAM190 EPAM20 EPAM200 EPAM210 EPAM220 EPAM230 EPAM240 EPAM250 EPAM260 EPAM270 EPAM280 EPAM290 EPAM30 EPAM300 EPAM310 EPAM320 EPAM330 EPAM340 EPAM350 EPAM360 EPAM370 EPAM380 EPAM390 EPAM40 EPAM400 EPAM410 EPAM420 EPAM430 EPAM440 EPAM450 EPAM460 EPAM470 EPAM480 EPAM490 EPAM50 EPAM500 EPAM60 EPAM70 EPAM80 EPAM90 SSSUB ("", " -datafile=" + str(value))[value is not None and value!=vdef] 3 This is the scoring matrix file used when comparing sequences. By default it is the file 'EBLOSUM62' (for proteins) or the file 'EDNAFULL' (for nucleic sequences). These files are found in the 'data' directory of the EMBOSS installation. e_required Required section e_gapopen Gap opening penalty (Floating point number from 1.0 to 100.0) Float ("", " -gapopen=" + str(value))[value is not None] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 100.0 is required value <= 100.0 4 The gap open penalty is the score taken away when a gap is created. The best value depends on the choice of comparison matrix. 
The default value assumes you are using the EBLOSUM62 matrix for protein sequences, and the EDNAFULL matrix for nucleotide sequences. e_gapextend Gap extension penalty (Floating point number from 0.0 to 10.0) Float ("", " -gapextend=" + str(value))[value is not None] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 10.0 is required value <= 10.0 5 The gap extension penalty is added to the standard gap penalty for each base or residue in the gap. This is how long gaps are penalized. Usually you will expect a few long gaps rather than many short gaps, so the gap extension penalty should be lower than the gap penalty. An exception is where one or both sequences are single reads with possible sequencing errors, in which case you would expect many single-base gaps. You can get this result by setting the gap open penalty to zero (or very low) and using the gap extension penalty to control gap scoring. e_additional Additional section e_endweight Apply end gap penalties. Boolean 0 ("", " -endweight")[ bool(value) ] 6 e_endopen End gap opening penalty (Floating point number from 1.0 to 100.0) Float ("", " -endopen=" + str(value))[value is not None] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 100.0 is required value <= 100.0 7 The end gap open penalty is the score taken away when an end gap is created. The best value depends on the choice of comparison matrix. The default value assumes you are using the EBLOSUM62 matrix for protein sequences, and the EDNAFULL matrix for nucleotide sequences. e_endextend End gap extension penalty (Floating point number from 0.0 to 10.0) Float ("", " -endextend=" + str(value))[value is not None] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 10.0 is required value <= 10.0 8 The end gap extension penalty is added to the end gap penalty for each base or residue in the end gap.
e_minscore Minimum alignment score (Floating point number from -10.0 to 100.0) Float 0 ("", " -minscore=" + str(value))[value is not None and value!=vdef] Value greater than or equal to -10.0 is required value >= -10.0 Value less than or equal to 100.0 is required value <= 100.0 9 Minimum alignment score to report an alignment. e_output Output section e_brief Brief identity and similarity Boolean 1 (" -nobrief", "")[ bool(value) ] 10 Brief identity and similarity e_outfile Name of the output alignment file Filename needleall.align ("" , " -outfile=" + str(value))[value is not None] 11 e_aformat_outfile Choose the alignment output format Choice SCORE FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH ("", " -aformat=" + str(value))[value is not None and value!=vdef] 11 e_outfile_out outfile_out option Alignment e_aformat_outfile in ['FASTA', 'MSF'] e_outfile e_outfile_out2 outfile_out2 option Text e_aformat_outfile in ['PAIR', 'MARKX0', 'MARKX1', 'MARKX2', 'MARKX3', 'MARKX10', 'SRS', 'SRSPAIR', 'SCORE', 'UNKNOWN', 'MULTIPLE', 'SIMPLE', 'MATCH'] e_outfile e_errorfile errorfile option Filename needleall.e_errorfile ("" , " -errorfile=" + str(value))[value is not None] 12 Error file to be written to e_errorfile_out errorfile_out option NeedleallError Report e_errorfile auto Turn off any prompting String " -auto -stdout" 13 Programs-5.1.1/dssp.xml0000644000175000001560000001407211672710655013663 0ustar bneronsis dssp DSSP Definition of secondary structure of proteins given a set of 3D coordinates W.Kabsch, C. Sander Kabsch,W. and Sander,C. (1983) Biopolymers 22, 2577-2637. http://swift.cmbi.ru.nl/gv/dssp/ ftp://ftp.cmbi.ru.nl/pub/molbio/software/ sequence:protein:2D_structure structure:2D_structure dssp pdbfile PDB File AbstractText _3DStructure PDB (defined $value) ? 
" $value" : " -- " ( " -- " , " " + str(value) )[ value is not None ] You must enter either the PDB data or the PDB id not defined $pdbid and defined $pdbfile pdbfile is not None and pdbid is None 10 pdbid or you can instead enter a PDB id. String (defined $value) ? "cat pdb$value.ent | " : "" ( "" , "cat pdb" + str( value ).lower() + ".ent | " )[ value is not None ] You must enter either the PDB data or the PDB id defined $pdbid and not defined $pdbfile pdbid is not None and pdbfile is None -1 output Output parameters surface Disables the calculation of accessible surface (-na) Boolean 0 ($value) ? " -na " : "" ( "" , " -na " )[ value ] 1 classic Classic (pre-July 1995) format (-c) Boolean 0 ($value) ? " -c " : "" ( "" , " -c " )[ value ] 1 disulfide Adds information about disulfide bonds to output file (-ssa) Boolean 0 ($value) ? " -ssa " : "" ( "" , " -ssa " )[ value ] 1 sidechains2X Renames residues with incomplete sidechains to 'X' (-x) Boolean 0 ($value) ? " -x " : "" ( "" , " -x " )[ value ] 1 altLoc Keeps an additional AltLoc indicator at the line ends (-alt2) Boolean 0 ($value) ? " -alt2 " : "" ( "" , " -alt2 " )[ value ] 1 outfile Standard output DsspReport Report "dssp.out" "dssp.out" Programs-5.1.1/pasteseq.xml0000644000175000001560000001625412072525233014532 0ustar bneronsis pasteseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net pasteseq Insert one sequence into another http://bioweb2.pasteur.fr/docs/EMBOSS/pasteseq.html http://emboss.sourceforge.net/docs/themes sequence:edit pasteseq e_input Input section e_asequence asequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -asequence=" + str(value))[value is not None] 1 e_bsequence Sequence to insert Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -bsequence=" + str(value))[value is not None] 2 e_required Required section e_pos Position to insert after (value greater than or equal to 0) Integer ("", " -pos=" + str(value))[value is not None] Value greater than or equal to 0 is required value >= 0 3 The position in the main input sequence to insert after. To insert before the start use the position 0. e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename pasteseq.e_outseq ("" , " -outseq=" + str(value))[value is not None] 4 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 5 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 6 Programs-5.1.1/codonw.xml0000644000175000001560000005660511767572177014225 0ustar bneronsis codonw 1.4.4 codonw Correspondence Analysis of Codon Usage J. 
Peden http://codonw.sourceforge.net/Readme.html http://codonw.sourceforge.net/Tutorial.html http://codonw.sourceforge.net/ http://sourceforge.net/projects/codonw/files/ sequence:nucleic:codon_usage codonw String "codonw -silent -nomenu -nowarn" "codonw -silent -nomenu -nowarn" 0 outfiles String " $seqfile.indices $seqfile.bulk" " " + str(seqfile) + ".indices " + str(seqfile) + ".bulk" 2000 results_files Results files Text "*.indices" "*.bulk" "*.indices" "*.bulk" seqfile Sequences File Sequence FASTA " $value" " " + str(value) 1000 defaults Defaults settings gc Genetic codes (-code) Choice 0 0 1 2 3 4 5 6 7 (defined $value and $value ne $vdef) ? " -code $value" : "" ( "" , " -code " + str(value) )[value is not None and value != vdef] 2 fop_values Fop/CBI codons (-f_type) Choice 0 0 1 2 3 4 5 6 7 (defined $value and $value ne $vdef) ? " -f_type $value" : "" ( "" , " -f_type " + str(value) )[ value is not None and value != vdef] 2 cai_values CAI fitness values (-c_type) Choice 0 0 1 2 (defined $value and $value ne $vdef) ? " -c_type $value" : "" ( "" , " -c_type " + str(value) )[value is not None and value != vdef] 2 output_type Output Computer readable (-machine) Boolean 0 ($value) ? " -machine" : "" ( "" , " -machine" )[ value ] 2 genes Concatenate all genes instead of individual genes (-totals) Boolean 0 ($value) ? " -totals" : "" ( "" , " -totals" )[ value ] 3 CU_options Codon usage indices 4 all_indices All indices (-all_indices) Boolean 0 ($value) ? " -all_indices" : "" ( "" , " -all_indices" )[ value ] sp_indices Special indices not $all_indices not all_indices CAI Codon Adaptation Index (-cai) Boolean 0 ($value) ? " -cai" : "" ( "" , " -cai" )[ value ] cai_file User input file of CAI values (-cai_file) CaiValues AbstractText $CAI CAI (defined $value) ? " -cai_file $value" : "" ( "" , " -cai_file " + str(value) )[value is not None] 2 Fop Frequency of OPtimal codons index (-fop) Boolean 0 ($value) ? 
" -fop" : "" ( "" , " -fop" )[ value ] fop_file User input file of Fop values (-fop_file) FopValues AbstractText $Fop Fop (defined $value) ? " -fop_file $value" : "" ( "" , " -fop_file " + str(value) )[ value is not None ] 2 CBI Codon bias index (-cbi) Boolean 0 ($value) ? " -cbi" : "" ( "" , " -cbi" )[ value ] cbi_file User input file of CBI values (-cbi_file) CbiValues AbstractText $CBI CBI (defined $value) ? " -cbi_file $value" : "" ( "" , " -cbi_file " + str(value) )[ value is not None ] 2 ENc Effective Number of Codons (-enc) Boolean 0 ($value) ? " -enc" : "" ( "" , " -enc" )[ value ] GC GC content of gene (-gc) Boolean 0 ($value) ? " -gc" : "" ( "" , " -gc" )[ value ] GC3s GC of silent 3rd codon posit. (-gc3s) Boolean 0 ($value) ? " -gc3s" : "" ( "" , " -gc3s" )[ value ] silent_bc Base composition at synonymous third codon positions (-sil_base) Boolean 0 ($value) ? " -sil_base" : "" ( "" , " -sil_base" )[ value ] L_sym Number of synonymous codons (-L_sym) Boolean 0 ( $value) ? " -L_sym" : "" ( "" , " -L_sym" )[ value ] L_aa Total Number of synonymous and non-synonymous codons (-L_aa) Boolean 0 ($value) ? " -L_aa" : "" ( "" , " -L_aa" )[ value ] AA_options Amino acid indices Hydro Hydrophobicity of protein (-hyd) Boolean 0 ($value) ? " -hyd" : "" ( "" , " -hyd" )[ value ] 4 Aromo Aromaticity of protein (-aro) Boolean 0 ($value) ? " -aro" : "" ( "" , " -aro" )[ value ] 4 bulk_output_option Bulk output option Choice -cu -cu -aau -raau -cutab -cutot -rscu -fasta -reader -transl -base -dinuc -noblk (defined $value and $value ne $vdef) ? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef] 4 COA_option Correspondence analysis options (available for several sequences) Choice Null Null -coa_cu -coa_rscu -coa_aa (defined $value) ? " $value" : "" ( "" , " %s" %value)[value is not None] 3 coa_advanced Advanced COA options $COA_option COA_option coa_expert Generate detailed (expert) statistics on COA (-coa_expert) Boolean 0 ($value) ? 
" -coa_expert" : "" ( "" , " -coa_expert" )[ value ] 3 coa_axes Select number of axis to record (-coa_axes) Integer 4 (defined $value and $value != $vdef) ? " -coa_axes $value" : "" ( "" , " -coa_axes " + str(value) )[value is not None and value!=vdef] 3 coa_num Select number of genes to use to identify optimal codons (-coa_num) Integer (defined $value) ? " -coa_num $value" : "" ( "" , " -coa_num " + str(value) )[ value is not None ] 3 Values can be whole numbers or a percentage (5 or 10%). coa_files Coa file Text "*.coa" "coa_raw" "*.coa" "coa_raw" Programs-5.1.1/trimseq.xml0000644000175000001560000002325612072525233014371 0ustar bneronsis trimseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net trimseq Remove unwanted characters from start and end of sequence(s) http://bioweb2.pasteur.fr/docs/EMBOSS/trimseq.html http://emboss.sourceforge.net/docs/themes sequence:edit trimseq e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_additional Additional section e_window Window size (value greater than or equal to 1) Integer 1 ("", " -window=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 2 This determines the size of the region that is considered when deciding whether the percentage of ambiguity is greater than the threshold. A value of 5 means that a region of 5 letters in the sequence is shifted along the sequence from the ends and trimming is done only if there is a greater or equal percentage of ambiguity than the threshold percentage. 
e_percent Percent threshold of ambiguity in window Float 100.0 ("", " -percent=" + str(value))[value is not None and value!=vdef] 3 This is the threshold of the percentage ambiguity in the window required in order to trim a sequence. e_strict Trim off all ambiguity codes, not just n or x Boolean 0 ("", " -strict")[ bool(value) ] 4 In nucleic sequences, trim off not only N's and X's, but also the nucleotide IUPAC ambiguity codes M, R, W, S, Y, K, V, H, D and B. In protein sequences, trim off not only X's but also B and Z. e_star Trim off asterisks Boolean 0 ("", " -star")[ bool(value) ] 5 In protein sequences, trim off not only X's, but also the *'s e_advanced Advanced section e_left Trim at the start Boolean 1 (" -noleft", "")[ bool(value) ] 6 e_right Trim at the end Boolean 1 (" -noright", "")[ bool(value) ] 7 e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename trimseq.e_outseq ("" , " -outseq=" + str(value))[value is not None] 8 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 9 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 10 Programs-5.1.1/charge.xml0000644000175000001560000002403412072525233014131 0ustar bneronsis charge EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net charge Draw a protein charge plot http://bioweb2.pasteur.fr/docs/EMBOSS/charge.html http://emboss.sourceforge.net/docs/themes sequence:protein:composition charge e_input Input section e_seqall seqall option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -seqall=" + str(value))[value is not None] 1 e_aadata Amino acids properties and molecular weight data file Protein AminoAcidProperties AbstractText ("", " -aadata=" + str(value))[value is not None ] 2 e_additional Additional section e_window Window length (value greater than or equal to 1) Integer 5 ("", " -window=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 3 e_output Output section e_plot Produce graphic Boolean 0 ("", " -plot")[ bool(value) ] 4 e_graph Choose the e_graph output format Choice e_plot png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 5 xy_goutfile Name of the output graph Filename e_plot charge_xygraph ("" , " -goutfile=" + str(value))[value is not None] 6 xy_outgraph_png Graph file Picture Binary e_plot and e_graph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_plot and e_graph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_plot and e_graph == "ps" or e_graph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_plot and e_graph == "meta" "*.meta" xy_outgraph_data Graph file Text e_plot and e_graph == "data" "*.dat" e_outfile Name of the output file (e_outfile) Filename not e_plot charge.e_outfile ("" , " -outfile=" + str(value))[value is not None] 7 e_outfile_out outfile_out option ChargeReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 8 Programs-5.1.1/html4blast.xml0000644000175000001560000001317111441651470014760 0ustar bneronsis html4blast 1.6 html4blast HTML blast 
results formatter Nicolas Joly ftp://ftp.pasteur.fr/pub/gensoft/projects/html4blast/ database:search:display html4blast String "html4blast" "html4blast" 1 input Blast input file BlastTextReport Report " $value" " %s" % str(value) 10 links Database links Choice golden none " -n" " -n" golden "" "" srs " -s" " -s" extern " -e" " -e" 2 graph Graphical alignment summary (-g) Boolean 1 ($value) ? " -g" : "" ("" , " -g")[ value ] 3 hspline Draw one HSP per graphic line (-l) Boolean $graph graph 0 ($value) ? " -l" : "" ("" , " -l")[ value ] 4 queryimagename Generate query based images names (-q) Boolean $graph graph 0 ($value) ? " -q" : "" ("" , " -q")[ value ] 5 outfile Outfile name (-o) Filename blast.html (defined $value and $value ne $vdef) ? " -o $value" : "" ("", " -o " + str(value))[value is not None and value != vdef] 6 output Output file BlastHtmlReport Report $outfile outfile image Picture Binary $graph graph "*.png" "*.gif" "*.png" "*.gif" Programs-5.1.1/saxs_merge.xml0000644000175000001560000013066212133263665015050 0ustar bneronsis saxs_merge r17125 SAXS Merge Statistical merge of SAXS curves Yannick G. Spill, Seung Joong Kim, Dina Schneidman-Duhovny, Daniel Russel, Ben Webb, Andrej Sali and Michael Nilges protein:data:processing saxs_merge.py verbosity String " -v -v -v" " -v -v -v " 1 required Required parameters stop stop after the given step Choice merging cleanup fitting rescaling classification merging (defined $value and $value ne $vdef)? " --stop=$value " : "" ( "" , " --stop="+value )[value is not None and value != vdef ] 200 postpone_cleanup Cleanup step comes after rescaling step (default is False) Boolean 0 ( $value)? " --postpone_cleanup ": "" ( "" , " --postpone_cleanup ")[ value ] nb_expe Number of times each profile has been recorded Integer 10 the number of recordings must be > 0 $value > 0 value > 0 11 SAXS SAXS profile MultipleIntensity MultipleAbstractText (defined value)? 
"$value=$nb_expe_1" : "" " ".join([ "%s=%d"%(v, nb_expe) for v in value.split(" ")]) 10 advanced Advanced parameters advanced_general General general_header First line of output files is a header Boolean 0 ($value)? " --header "":"" ( "" , " --header ")[ value ] allfiles Output data files for parsed input files as well. Boolean 0 ($value)? " --allfiles "":"" ( "" , " --allfiles ")[ value ] outlevel Set the output level Choice normal sparse normal full (defined $value and $value neq $vdef) " --outlevel=$value"":"" ( "" , " --outlevel="+value)[ value is not None and value != vdef]
  • 'sparse' is for q,I,err columns
  • 'normal' adds eorigin, eoriname and eextrapol (default)
  • 'full' outputs all flags
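The Python expressions in these descriptions build command-line fragments by indexing a two-element tuple with a boolean. The following is a minimal sketch of that idiom for the outlevel option; the helper name `outlevel_fragment` is hypothetical and only wraps the expression quoted above, with `str(value)` added as a precaution.

```python
def outlevel_fragment(value, vdef="normal"):
    """Return the --outlevel fragment, or "" when the value is unset
    or equal to the default (vdef)."""
    # A bool is an int in Python: False selects index 0, True selects index 1.
    return ("", " --outlevel=" + str(value))[value is not None and value != vdef]


print(repr(outlevel_fragment("normal")))  # default value: nothing is emitted
print(repr(outlevel_fragment("full")))    # non-default value: the flag is emitted
print(repr(outlevel_fragment(None)))      # unset value: nothing is emitted
```

Both tuple elements are evaluated before the index is applied, so wrapping the value in `str()` keeps the expression from raising a TypeError when the value is None.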
advanced_cleanup Cleanup (Step 1) aalpha type I error (default 1e-4) Float 1e-4 must be >= 0 and <=1 $value>= 0 && $value <=1 value>= 0 and value <=1 (defined $value and value != $vdef )" --aalpha=$value":"" ( "" , " --aalpha="+str(value))[ value is not None and value != vdef] Discard or keep SAXS curves' points based on their SNR. Points with an error of zero are discarded as well advanced_fitting Fitting (Step 2) bmean bmean Choice Full Flat Simple Generalized Full (defined $value and value neq $vdef)? " --bmean=$value ": "" ("" , " --bmean="+str(value))[value is not None and value != vdef]
Defines the most complex mean function that will be tried during model comparison. One of:
  • Flat (the offset parameter A is optimized)
  • Simple (optimizes A, G and Rg)
  • Generalized (optimizes G, Rg and d)
  • Full (optimizes G, Rg, d and s)
If "Don't perform model comparison" is set, only this model is fitted.
bnocomp perform model comparison Boolean 1 (defined $value)? "": " --bnocomp " ( " --bnocomp " ,"" )[ value is not None and value != vdef ] berror berror: Compute error bars on all parameters even in case where model comparison was disabled. Boolean 1 (defined $value)? " --berror " : "" ( "" , " --berror ")[ value ] blimit_fitting Integer 80 " --blimit_fitting=$value " " --blimit_fitting=%d "%(value) blimit_hessian set the maximum number of points used in the Hessian calculation. Integer 80 " --blimit_hessian=$value " " --blimit_hessian=%d "%(value)
To save resources, set the maximum number of points used in the Hessian calculation (model comparison, and the baverage and berror options). The dataset is subsampled if it contains more than NUM points. If NUM=-1 (default), all points are used.
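The limit described above can be pictured with a short sketch. How saxs_merge actually chooses the retained points is not stated here, so the evenly spaced selection below is an assumption, and the function name `subsample` is hypothetical.

```python
def subsample(points, limit):
    """Return at most `limit` points, evenly spread over the input.
    A limit of -1 keeps every point (assumed selection scheme)."""
    n = len(points)
    if limit == -1 or n <= limit:
        return list(points)
    # Pick `limit` indices evenly spaced between 0 and n - 1.
    step = (n - 1) / (limit - 1) if limit > 1 else 0
    return [points[round(i * step)] for i in range(limit)]


# Ten points reduced to four: the first and last points are kept.
print(subsample(list(range(10)), 4))
```

Keeping both end points preserves the q range of the curve, which is the reason for spacing the indices between 0 and n - 1 rather than taking every k-th point.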
advanced_rescaling Rescaling (Step 3) cmodel Which rescaling model to use to calculate gamma. Choice normal normal normal-offset lognormal (defined $value and $value ne $vdef)? " --cmodel=$value ": "" ( "" , " --cmodel="+str(value))[value is not None and value != vdef]
  • 'normal-offset' for a normal model with offset,
  • 'normal' (default) for a normal model with zero offset,
  • and 'lognormal' for a lognormal model.
Find the most probable scaling factor of all curves with respect to the first curve.
advanced_classification Classification (Step 4) dalpha type I error (default 0.05) Float 0.05 must be >= 0 and <=1 $value>= 0 && $value <=1 value>= 0 and value <=1 (defined $value and $value != $vdef) " --dalpha=$value " : "" ( "" , " --dalpha="+str(value) )[ value is not None and value != vdef] Classify the mean curves by comparing them using a two-sided two-sample student t test advanced_merging Merging (Step 5) emean Which most complex mean function to try for model comparison. Choice Full Flat Simple Generalized Full (defined $value and value neq $vdef)? " --emean=$value ": "" ("" , " --emean="+str(value))[value is not None and value != vdef]
Which mean parameters are optimized. One of:
  • Flat (the offset parameter A is optimized)
  • Simple (optimizes A, G and Rg)
  • Generalized (default, optimizes G, Rg and d)
  • Full (optimizes G, Rg, d and s)
enocomp perform model comparison Boolean 1 (defined $value)? "": " --enocomp " ( " --enocomp " , "")[ value is not None and value != vdef ] Perform model comparison, which allows to choose a mean function that does not overfit the data. If ecomp is given, eoptimize is taken to be the most complex model. (Default: don't perform it.) eerror Compute error bars on all parameters even in case where model comparison was disabled. Boolean 1 (defined $value)? " --eerror " : "" ( "" , " --eerror ")[ value ] elimit_fitting Integer 80 " --elimit_fitting=$value " " --elimit_fitting=%d "%(value) elimit_hessian Set the maximum number of points used in the Hessian calculation Integer 80 " --elimit_hessian=$value " " --elimit_hessian=%d "% value enoextrapolate Don't extrapolate at all, even at low angle. Boolean 0 (value?) " --enoextrapolate ": "" ( "" , " --enoextrapolate ")[value]
Find the most probable scaling factor of all curves with respect to the first curve.
expert Expert parameters advanced_general General use_npoints Skip next option and use q values of the first datafile instead. Boolean 0 npoints Number of points to output for the mean function. Integer 200 ($use_npoints and $value > 0)? " --npoints=$value ": " --npoints=-1 " ( " --npoints=-1 " , " --npoints="+str(value) )[use_npoints and value > 0] lambdamin lower bound for lambda parameter in steps 2 and 5 Float 0.005 (defined $value and $value != $vdef)?" --lambdamin=$value ":"" ( "" , " --lambdamin="+str(value))[value is not None and value != vdef] expert_cleanup Cleanup (Step 1) acutof when a value after CUT is discarded, the rest of the curve is discarded as well (default is 0.1) Float 0.1 (defined $value and $value != $vdef)? " --acutoff=$value " : "" ( "" , " --acutoff="+str(value))[ value is not None and value != vdef] Discard or keep SAXS curves' points based on their SNR. Points with an error of zero are discarded as well expert_fitting Fitting (Step 2) baverage baverage: Average over all possible parameters instead of just taking the most probable set of parameters. Boolean 0 (defined $value and $value != $vdef)? " --baverage ": "" ( "" , " --baverage ")[ value is not None and value!= vdef ] bd Initial value for d Float 4 must be greater than 0 $value > 0 value > 0 (defined $value and $value != $vdef)? " --bd=$value":"" ( "", " --bd="+str(value))[ value is not None and value != vdef ] bs Initial value for s Float 0 must be greater or equal than 0 $value >= 0 value >= 0 (defined $value and $value != $vdef)? " --bs=$value":"" ( "", " --bs="+str(value))[ value is not None and value != vdef ] Estimate the mean function and the noise level of each SAXS curve. expert_rescaling Rescaling (Step 3) creference Define which input curve the other curves will be recaled to. 
Choice last last first (defined $value and $value != vdef) " --creference=$value" : "" ( "" , " --creference="+str(value))[value is not None and value != vdef] cnpoints Number of points to use to compute gamma (default 200) Integer 200 (defined $value and $value != vdef)? " --cnpoints=$value ":"" ( "", " --cnpoints="+str(value))[value is not None and value != vdef] Find the most probable scaling factor of all curves wrt the first expert_merging Merging (Step 5) eaverage Average over all possible parameters instead of just taking the most probable set of parameters. Boolean 0 (defined $value and $value != $vdef)? " --eaverage ": "" ( "" , " --eaverage ")[ value is not None and value!= vdef ] eextrapolate Extrapolate NUM percent outside of the curve's bounds. Integer 0 (defined $value and $value != $vdef)? " --eextrapolate=$value":"" ( "", " --eextrapolate="+str(value))[value is not None and value != vdef] Example: if NUM=50 and the highest acceptable data point is at q=0.3, the mean will be estimated up to q=0.45. Default is 0 (just extrapolate at low angle). Collect compatible data and produce best estimate of mean function. 
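The eextrapolate example above (NUM=50, last acceptable point at q=0.3, mean estimated up to q=0.45) is consistent with the bound growing in proportion to the last q value. The sketch below works through that arithmetic under this assumption; the function name `extrapolated_qmax` is hypothetical and is not part of saxs_merge.

```python
def extrapolated_qmax(q_last, num_percent):
    """Upper q bound when extrapolating num_percent percent beyond the
    last acceptable data point (assumes the bound scales with q_last)."""
    return q_last * (1 + num_percent / 100)


# NUM=50 with the last point at q=0.3 gives a bound of about 0.45.
print(round(extrapolated_qmax(0.3, 50), 6))
# NUM=0 (the default) leaves the upper bound unchanged.
print(extrapolated_qmax(0.3, 0))
```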
sparse_output Output outlevel eq "sparse" outlevel == "sparse" sparse_summary Report Report "summary.txt" "summary.txt" sparse_merge Merged data SAXSMerge AbstractText "data_merged.dat" "data_merged.dat" selection of input data 3 cols q i err sparse_mean Interpolation SAXSIntepolation AbstractText "mean_merged.dat" "mean_merged.dat" interpolation 3 cols q i err normal_output Output outlevel eq "normal" outlevel == "normal" normal_summary Report Report "summary.txt" "summary.txt" normal_merge Merged data SAXSMerge AbstractText "data_merged.dat" "data_merged.dat" selection of input data q i err eorigin eoriname eextrapol normal_mean Interpolation SAXSIntepolation AbstractText "mean_merged.dat" "mean_merged.dat" interpolation q i err eorigin eoriname eextrapol full_output Output outlevel eq "full" outlevel == "full" full_summary Report Report "summary.txt" "summary.txt" full_merge Merged data SAXSMerge AbstractText "data_merged.dat" "data_merged.dat" q i err eorigin eoriname eextrapol full_mean Interpolation SAXSIntepolation AbstractText "mean_merged.dat" "mean_merged.dat" q i err eorigin eoriname eextrapol all_files_output Output allfiles allfiles all_file_data Merged data SAXSMerge AbstractText "data_*" "data_*" all_files_mean Interpolation SAXSIntepolation AbstractText "mean_*" "mean_*"
Programs-5.1.1/gff2ps.xml0000644000175000001560000004631711767572177014122 0ustar bneronsis gff2ps gff2ps Produces PostScript graphical output from GFF-files Josep Francesc ABRIL FERRANDO, Roderic GUIGO SERRA http://genome.imim.es/software/gfftools/GFF2PS.html display:feature_table gff2ps gff_file GFF file Feature AbstractText GFF " $value" " "+str(value) 100 output_options Output Options 1 page_size Modify page size (-s) String a4 (defined $value and $value ne $vdef)? " -s $value" : "" ( "" , " -s "+str(value) )[ value is not None and value != vdef] orientation Switches page orientation to Portrait (default is Landscape) (-p) Boolean 0 ($value)? " -p" : "" ( "" , " -p" )[ value ] split Sets how many pages are needed to split your output (-P) Integer 1 (defined $value and $value != $vdef)? " -P $value" : "" ( "" , " -P " +str(value) )[ value is not None and value != vdef] zoom_first Zoom first nucleotide (default is sequence origin) (-S) Integer (defined $value)? " -S $value" : "" ( "" , " -S " + str(value) )[ value is not None ] zoom_last Zoom last nucleotide (default is sequence length) (-E) Integer (defined $value)? " -E $value" : "" ( "" , " -E " + str(value) )[ value is not None ] blocks Sets blocks per page (-B) Integer 1 (defined $value and $value != $vdef)? " -B $value" : "" ( "" , " -B " + str(value) )[ value is not None and value != vdef] nuc_per_line Sets nucleotides per line (default is the largest sequence position from input gff-files) (-N) Integer (defined $value)? " -N $value" : "" ( "" , " -N " + str(value) )[ value is not None ] blocks_from_left_to_right Blocks from left to right and from top to bottom (default is top to bottom first) (-b) Boolean 0 ($value)? " -b" : "" ( "" , " -b" )[ value ] no_headers Switch off Header (Title area) (-L) Boolean 0 ($value)? " -L" : "" ( "" , " -L" )[ value ] set_title Defining title (default is input gff filename) (-T) String (defined $value)? 
" -T $value" : "" ( "" , " -T " + str(value) )[ value is not None ] no_page_nb Does not show page numbering (-l) Boolean 0 ($value)? " -l" : "" ( "" , " -l" )[ value ] no_date Does not show date (-O) Boolean 0 ($value)? " -O" : "" ( "" , " -O" )[ value ] no_time Does not show time (-o) Boolean 0 ($value)? " -o" : "" ( "" , " -o" )[ value ] no_copyright Switch off CopyRight line on plot (-a) Boolean 0 ($value)? " -a" : "" ( "" , " -a" )[ value ] color_options Color Options 3 fg_color_name Sets color for FOREGROUND (-G) String black (defined $value and $value ne $vdef)? " -G $value" : "" ( "" , " -G " + str(value) )[ value is not None and value != vdef] bg_color_name Sets color for BACKGROUND (-g) String white (defined $value and $value ne $vdef)? " -g $value" : "" ( "" , " -g " + str(value) )[ value is not None and value != vdef] f0_color_name Sets color for frame 0 (-0) String blue (defined $value and $value ne $vdef)? " -0 $value" : "" ( "" , " -0 " + str(value) )[ value is not None and value != vdef] f1_color_name Sets color for frame 1 (-1) String red (defined $value and $value ne $vdef)? " -1 $value" : "" ( "" , " -1 " + str(value) )[ value is not None and value != vdef] f2_color_name Sets color for frame 2 (-2) String green (defined $value and $value ne $vdef)? " -2 $value" : "" ( "" , " -2 " + str(value) )[ value is not None and value != vdef] f_color_name Sets color for frame . (-3) String orange (defined $value and $value ne $vdef)? " -3 $value" : "" ( "" , " -3 " + str(value) )[ value is not None and value != vdef] tickmark_options Tickmark Options 2 major_tickmarks Number of major tickmarks per line (-M) Integer 10 (defined $value and $value != $vdef)? " -M $value" : "" ( "" , " -M " + str(value) )[ value is not None and value != vdef] major_tickmarks_scale Major tickmarks scale in nucleotides (-K) Integer (defined $value)? 
" -K $value" : "" ( "" , " -K " + str(value) )[ value is not None ] Default is nucleotide length for lines divided by major tickmarks number (see option -T). minor_tickmarks Number of minor tickmarks between major tickmarks (-m) Integer 10 (defined $value and $value != $vdef)? " -m $value" : "" ( "" , " -m " + str(value) )[ value is not None and value != vdef] minor_tickmarks_scale Minor tickmarks scale in nucleotides (-k) Integer (defined $value)? " -k $value" : "" ( "" , " -k " + str(value) )[ value is not None ] Default is major tickmarks size divided by minor tickmarks number (see option -t). element_options Display elements Options 4 no_forward_strand Switch off displaying forward-strand(Watson) elements (-w) Boolean 0 ($value)? " -w" : "" ( "" , " -w" )[ value ] no_reverse_strand Switch off displaying reverse-strand(Crick) elements (-c) Boolean 0 ($value)? " -c" : "" ( "" , " -c" )[ value ] no_strand_independent Switch off displaying strand-independent elements (-i) Boolean 0 ($value)? " -i" : "" ( "" , " -i" )[ value ] no_label Switch off labels for element positions (-n) Boolean 0 ($value)? " -n" : "" ( "" , " -n" )[ value ] other_options Other Options 5 default_custom_file Create a new default customfile (-D) Boolean 0 ($value)? " -D gff2psrc" : "" ( "" , " -D gff2psrc" )[ value ] user_custom_file Your custom rc file (-C) GffCustomRc AbstractText (defined $value)? " -C $value" : "" ( "" , " -C " + str(value) )[ value is not None ] outfile Postscript output file PostScript Binary "; ln -s gff2ps.out gff2ps.ps" "; ln -s gff2ps.out gff2ps.ps" 1000 "gff2ps.ps" "gff2ps.ps" custom_file Custom output file Text $default_custom_file default_custom_file "gff2psrc" "gff2psrc" Programs-5.1.1/fastdnaml.xml0000644000175000001560000006752111724156742014671 0ustar bneronsis fastdnaml 1.2.2 fastDNAml Construction of phylogenetic trees of DNA sequences using maximum likelihood Olsen, Matsuda, Hagstrom, Overbeek Olsen, G. J., Matsuda, H., Hagstrom, R., and Overbeek, R. 1994. 
fastDNAml: A tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Comput. Appl. Biosci. 10: 41-48. Felsenstein, J. 1981. Evolutionary trees from DNA sequences: A maximum likelihood approach. J. Mol. Evol. 17: 368-376. http://iubio.bio.indiana.edu/soft/molbio/evolve/fastdnaml/fastDNAml.html phylogeny:likelihood fastdnaml String ($bootstrap) ? "cat > $infile.tmp;" : "fastDNAml" ("fastDNAml", "cat > %s.tmp;" % infile )[bootstrap] 1000 clean_tmp String " && clean_checkpoints" " && clean_checkpoints" 1100 infile Sequence Alignment File DNA Alignment PHYLIPI "cat $value | " "cat " + str(value) + " | " 1 The input to fastDNAml is similar to that used by DNAML (and the other PHYLIP programs). At least 3 sequences are required. inputopt Input Options frequencies Use empirical base frequencies derived from the sequence data ? Boolean 1 (not $value and not ($fA and $fC and $fG and $fT )) ? "": "frequencies $fA $fC $fG $fT | " ( "" ,"frequencies %s %s %s %s | " % (str(fA), str(fC), str(fG), str(fT)) )[ not ( value and fA is None and fC is None and fG is None and fT is None) ] 2 user_frequencies User bases frequencies (instead of empirical frequencies) not $frequencies not frequencies fA A frequency Float fC C frequency Float fG G frequency Float fT T frequency Float transition Transition/transversion ratio Float 2.0 (defined $value and $value!=$vdef) ? " transition $value | " : "" ( "" , " transition " + str(value) + " | " )[ value is not None and value != vdef] 10 This option can be used before a global or treefile option with auxiliary data. jumble Randomize input order of sequences (jumble) Boolean not $bootstrap not bootstrap 0 ($value) ? "jumble | " : "" ( "" , "jumble | " )[ value ] 10 Note that fastDNAml explores a very small number of alternative tree topologies relative to a typical parsimony program. There is a very real chance that the search procedure will not find the tree topology with the highest likelihood. 
Altering the order of taxon addition and comparing the trees found is a fairly efficient method for testing convergence. Typically, it would be nice to find the same best tree at least twice (if not three times), as opposed to simply performing some fixed number of jumbles and hoping that at least one of them will be the optimum. global_opp Global rearrangements Boolean defined $transition or $bootstrap or $jumble transition is not None or bootstrap or jumble 0 ($value) ? " global " : "" ( "" , " global ")[ value ] 11 The G (global) option has been generalized to permit crossing any number of branches during tree rearrangements. In addition, it is possible to modify the extent of rearrangement explored during the sequential addition phase of tree building. The G U (global and user tree) option combination instructs the program to find the best of the user trees, and then look for rearrangements that are better still. If a rearrangement distance is specified, the input must contain a transition option. The Global option can be used to force branch swapping on user trees, (combination of Global and User Tree(s) options). final_arrgt Number of branches to cross in rearrangements of the completed tree Integer (defined $transition or $bootstrap or $jumble) and $global_opp (transition is not None or bootstrap or jumble) and global_opp (defined $value) ? " $final_arrgt " : "" ( "" , " %s " % str(final_arrgt))[ value is not None] 12 partial_arrgt Number of branches to cross in testing rearrangements during the sequential addition phase of tree inference Integer (defined $transition or $bootstrap or $jumble) and $global_opp (transition is not None or bootstrap or jumble) and global_opp (defined $value) ? 
"$partial_arrgt " : "" ( "" , " %s " % str(partial_arrgt))[ value is not None] 13 pipe_arrgt Number of branches to cross in testing rearrangements during the sequential addition phase of tree inference Integer (defined $transition or $bootstrap or $jumble) and $global_opp (transition is not None or bootstrap or jumble) and global_opp " | " " | " 14 quickadd Decreases the time in initially placing a new sequence in the growing tree (quickadd) Boolean 0 ($value)? "quickadd | " : "" ( "" , " quickadd | " )[ value ] 2 This option greatly decreases the time in initially placing a new sequence in the growing tree (but does not change the time required to subsequently test rearrangements). The overall time savings seems to be about 30%, based on a very limited number of test cases. Its downside, if any, is unknown. This will probably become default program behavior in the near future. If the analysis is run with a global option of 'G 0 0', so that no rearrangements are permitted, the tree is build very approximately, but very quickly. This may be of greatest interest if the question is, 'Where does this one new sequence fit into this known tree?' The known tree is provided with the restart option, below. PHYLIP DNAML does not include anything comparable to the quickadd option. outgroup Use the specified sequence number as outgroup Integer (defined $value) ? "outgroup $value | " : "" ( "" , "outgroup " + str(value) + " | " )[ value is not None ] 2 treeopt User input Tree Options not $bootstrap not bootstrap This options allows you to enter your own trees and instructs the program to evaluate them. user_tree User tree - tree(s) file Tree NEWICK (defined $value) ? "usertree $value |" : "" ( "" , " usertree %s |" % str(value) )[ value is not None ] 2 The trees must be in Newick format, and terminated with a semicolon. (The program also accepts a pseudo_newick format, which is a valid prolog fact.) The tree reader in this program is more powerful than that in PHYLIP 3.3. 
In particular, material enclosed in square brackets, [ like this ], is ignored as comments; taxa names can be wrapped in single quotation marks to support the inclusion of characters that would otherwise end the name (i.e., '(', ')', ':', ';', '[', ']', ',' and ' '); names of internal nodes are properly ignored; and exponential notation (such as 1.0E-6) for branch lengths is supported. user_lengths User trees to be read with branch lengths Boolean not $bootstrap not bootstrap 0 ($value) ? "userlengths |" : "" ( "" , "userlengths |" )[ value ] 2 Causes user trees to be read with branch lengths (and it is an error to omit any of them). Without the L option, branch lengths in user trees are not required, and are ignored if present. boot Bootstrap bootstrap Generates a re-sample of the input data (bootstrap) Boolean 0 ($value) ? " fastdnaml " : "" ( "" , " fastdnaml " )[ value ] 1001 Tree files will be summarized in one '.tree' file as well as output files in one '.out' file bootopt Bootstrap options $bootstrap bootstrap nboots Number of different bootstrap samples Integer 1 (defined $value and $value != $vdef) ? " -boots $value" : "" ( "" , " -boots " + str(value) )[ value is not None and value != vdef] More than 1000 samples is not possible on this server $value <= 1000 value <= 1000 1002 bootstrap_seed Random number seed for first bootstrap Integer (defined $value) ? " -seed $value" : "" ( "" , " -seed " + str(value) )[ value is not None ] 1002 Warning: For a given random number seed, the sample will always be the same. bootstrap_maxjumble Maximum attempts at replicating inferred tree (max jumble) Integer $bootstrap bootstrap 10 (defined $value and $value != $vdef) ? " -jumble $value" : "" ( "" , " -jumble " + str(value) )[ value is not None and value != vdef] 1002 in_bootfile String " $infile.tmp" " %s.tmp" % str(infile) 1003 output_opt Output and Results Options outfile Output File Filename not $bootstrap not bootstrap (defined $value and $value ne $vdef) ? 
" > $outfile" : "" ( "" , " > " + str(outfile) )[ value is not None ] 1010 outputfile Output(s) file Report not $bootstrap and not defined $outfile not bootstrap and outfile is None "fastdnaml.out" "fastdnaml.out" outputfile_name Output(s) file Report not $bootstrap and defined $outfile not bootstrap and outfile is not None "$outfile" str(outfile) treefile Save tree in treefile Boolean not $bootstrap not bootstrap 1 ($value) ? "" : "treefile | " ( "treefile | ", "" )[ value ] 2 printdata Print the input alignment at start of run (printdata) Boolean 0 ($value) ? "printdata | " : "" ( "" , "printdata | " )[ value ] 2 categopt Categories and Weights Options categories Rate categories file (user-specified) PhylipCategoriesRates AbstractText (defined $value) ? "categories $value |" : "" ( "" , "categories "+ str(value) + " |" )[ value is not None ] 2 The data must have the format specified for PHYLIP dnaml 3.3. The first line must be the letter C, followed by the number of categories (a number in the range 1 through 35), and then a blank-separated list of the rates for each category. (The list can take more than one line; the program reads until it finds the specified number of rate values.) The next line should be the word Categories followed by one rate category character per sequence position. The categories 1 - 35 are represented by the series 1, 2, 3, ..., 8, 9, A, B, C, ..., Y, Z. These latter data can be on one or more lines. For example: C 12 0.0625 0.125 0.25 0.5 1 2 4 8 16 32 64 128 Categories 5111136343678975AAA8949995566778888889AAAAAA9239898629AAAAA9 Category 'numbers' are ordered: 1, 2, 3, ..., 9, A, B, ..., Y, Z. Category zero (undefined rate) is permitted at sites with a zero in a user-supplied weighting mask. weights Weights file (user-specified column weighting information) PhylipWeight AbstractText (defined $value) ? 
"weights $value |" : "" ( "" , "weights " + str(value) + " |" )[ value is not None ] 2 example: Weights 111111111111001100000100011111100000000000000110000110000000 In case of bootstrap, only positions that have nonzero weights are used in computing the bootstrap sample. treefiles Tree file Tree NEWICK not $bootstrap not bootstrap "_treefile.[0-9]*" "_treefile.[0-9]*" bootstrap_report Bootstrap output report Report $bootstrap bootstrap "$infile.tmp.out" "%s.tmp.out" % str(infile) bootstrap_tree Bootstrap tree file Tree NEWICK $bootstrap bootstrap "$infile.tmp.tree" "%s.tmp.tree" % str(infile) bootstrap_aln Bootstrap alignment file Alignment $bootstrap bootstrap "$infile.tmp" "%s.tmp" % str(infile) Programs-5.1.1/rnaheat.xml0000644000175000001560000002737611767572177014361 0ustar bneronsis rnaheat 1.7 RNAheat Calculate specific heat of RNAs Hofacker, Stadler I.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker, P. Schuster (1994) Fast Folding and Comparison of RNA Secondary Structures. Monatshefte f. Chemie 125: 167-188 J.S. McCaskill (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structures, Biopolymers 29: 1105-1119 D. Adams (1979) The hitchhiker's guide to the galaxy, Pan Books, London RNAheat reads RNA sequences from stdin and calculates their specific heat in the temperature range t1 to t2, from the partition function by numeric differentiation. The result is written in file as a list of pairs of temperature in C and specific heat in Kcal/(Mol*K). sequence:nucleic:2D_structure structure:2D_structure rnaheat String "RNAheat" "RNAheat" seq RNA Sequences File DNA Sequence FASTA " < $value" " < " + str(value) 1000 control Control options 2 temp_min Lowest temperature (-Tmin) Integer 0 (defined $value and $value != $vdef)? " -Tmin $value" : "" ( "" , " -Tmin " + str(value) )[ value is not None and value != vdef] temp_max Highest temperature (-Tmax) Integer 100 (defined $value and $value != $vdef)? 
" -Tmax $value" : "" ( "" , " -Tmax " + str(value) )[ value is not None and value != vdef] stepsize Calculate partition function every stepsize degrees Celsius (-h) Integer 1 (defined $value and $value != $vdef)? " -h $value" : "" ( "" , " -h " + str(value) )[ value is not None and value != vdef] ipoints Produces a smoother curve by increasing ipoints (-m) Integer 2 (defined $value and $value != $vdef)? " -m $value" : "" ( "" , " -m " + str(value) )[ value is not None and value != vdef] The program fits a parabola to 2*ipoints+1 data points to calculate 2nd derivatives. Increasing this parameter produces a smoother curve. tetraloops Do not include special stabilizing energies for certain tetraloops (-4) Boolean 0 ($value)? " -4" : "" ( "" , " -4" )[ value ] dangling How to treat dangling end energies for bases adjacent to helices in free ends and multiloops (-d) Choice -d1 -d1 -d -d2 (defined $value and $value ne $vdef)? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef] How to treat 'dangling end' energies for bases adjacent to helices in free ends and multiloops: Normally only unpaired bases can participate in at most one dangling end. With -d2 this check is ignored, this is the default for partition function folding (-p). -d ignores dangling ends altogether. Note that by default pf and mfe folding treat dangling ends differently, use -d2 (or -d) in addition to -p to ensure that both algorithms use the same energy model. The -d2 option is available for RNAfold, RNAeval, and RNAinverse only. input Input parameters 2 noGU Do not allow GU pairs (-noGU) Boolean 0 ($value)? " -noGU" : "" ( "" , " -noGU" )[ value ] noCloseGU Do not allow GU pairs at the end of helices (-noCloseGU) Boolean 0 ($value)? " -noCloseGU" : "" ( "" , " -noCloseGU" )[ value ] nsp Non standard pairs (comma separated list) (-nsp) String (defined $value)? 
" -nsp $value" : "" ( "" , " -nsp " + str(value) )[ value is not None ] Allow other pairs in addition to the usual AU, GC, and GU pairs. pairs is a comma separated list of additionally allowed pairs. If the first character is a '-' then AB will imply that AB and BA are allowed pairs. e.g. RNAfold -nsp -GA will allow GA and AG pairs. Nonstandard pairs are given 0 stacking energy. parameter Parameter file (-P) EnergyParameterFile AbstractText (defined $value)? " -P $value" : "" ( "" , " -P " + str(value) )[ value is not None ] Read energy parameters from paramfile, instead of using the default parameter set. A sample parameterfile should accompany your distribution. See the RNAlib documentation for details on the file format. readseq String "readseq -f=19 -a $seq > $seq.tmp && (cp $seq $seq.orig && mv $seq.tmp $seq) ; " "readseq -f=19 -a " + str(seq) + " > " + str(seq) + ".tmp && (cp " + str(seq) + " " + str(seq) + ".orig && mv " + str(seq) + ".tmp " + str(seq) + ") ; " -10 Programs-5.1.1/getorf.xml0000644000175000001560000003504312072525233014170 0ustar bneronsis getorf EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net getorf Finds and extracts open reading frames (ORFs) http://bioweb2.pasteur.fr/docs/EMBOSS/getorf.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:gene_finding getorf e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_additional Additional section e_table Genetic codes Choice 0 0 1 2 3 4 5 6 9 10 11 12 13 14 15 16 21 22 23 ("", " -table=" + str(value))[value is not None and value!=vdef] 2 e_minsize Minimum nucleotide size of orf to report Integer 30 ("", " -minsize=" + str(value))[value is not None and value!=vdef] 3 e_maxsize Maximum nucleotide size of orf to report Integer 1000000 ("", " -maxsize=" + str(value))[value is not None and value!=vdef] 4 e_find Type of sequence to output Choice 0 0 1 2 3 4 5 6 ("", " -find=" + str(value))[value is not None and value!=vdef] 5 This is a small menu of possible output options. The first four options are to select either the protein translation or the original nucleic acid sequence of the open reading frame. There are two possible definitions of an open reading frame: it can either be a region that is free of STOP codons or a region that begins with a START codon and ends with a STOP codon. The last three options are probably only of interest to people who wish to investigate the statistical properties of the regions around potential START or STOP codons. The last option assumes that ORF lengths are calculated between two STOP codons. e_advanced Advanced section e_methionine Change initial start codons to methionine Boolean 1 (" -nomethionine", "")[ bool(value) ] 6 START codons at the beginning of protein products will usually code for Methionine, despite what the codon will code for when it is internal to a protein. 
This qualifier sets all such START codons to code for Methionine by default. e_circular Is the sequence circular Boolean 0 ("", " -circular")[ bool(value) ] 7 e_reverse Find orfs in the reverse sequence Boolean 1 (" -noreverse", "")[ bool(value) ] 8 Set this to be false if you do not wish to find ORFs in the reverse complement of the sequence. e_flanking Number of flanking nucleotides to report Integer 100 ("", " -flanking=" + str(value))[value is not None and value!=vdef] 9 If you have chosen one of the options of the type of sequence to find that gives the flanking sequence around a STOP or START codon, this allows you to set the number of nucleotides either side of that codon to output. If the region of flanking nucleotides crosses the start or end of the sequence, no output is given for this codon. e_output Output section e_outseq Name of the output sequence file (e_outseq) Protein Filename outseq.orf ("" , " -outseq=" + str(value))[value is not None] 10 e_osformat_outseq Choose the sequence output format Protein Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 11 e_outseq_out outseq_out option Protein Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 12 Programs-5.1.1/textsearch.xml0000644000175000001560000002222512072525233015052 0ustar bneronsis textsearch EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net textsearch Search the textual description of sequence(s) http://bioweb2.pasteur.fr/docs/EMBOSS/textsearch.html http://emboss.sourceforge.net/docs/themes display:information textsearch e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_pattern Enter a pattern to search for String ("", " -pattern=" + str(value))[value is not None] 2 The search pattern is a regular expression. Use a | to indicate OR. For example: human|mouse will find text with either 'human' OR 'mouse' in the text e_additional Additional section e_casesensitive Do a case-sensitive search Boolean 0 ("", " -casesensitive")[ bool(value) ] 3 e_output Output section e_html Format output as an html table Boolean 0 ("", " -html")[ bool(value) ] 4 e_only Display the specified columns Boolean 0 ("", " -only")[ bool(value) ] 5 This is a way of shortening the command line if you only want a few things to be displayed. 
Instead of specifying: '-nohead -noname -nousa -noacc -nodesc' to get only the name output, you can specify '-only -name' e_heading Display column headings Boolean not e_only 0 ("", " -heading")[ bool(value) ] 6 e_usa Display the usa of the sequence Boolean not e_only 0 ("", " -usa")[ bool(value) ] 7 e_accession Display 'accession' column Boolean not e_only 0 ("", " -accession")[ bool(value) ] 8 e_name Display 'name' column Boolean not e_only 0 ("", " -name")[ bool(value) ] 9 e_description Display 'description' column Boolean not e_only 0 ("", " -description")[ bool(value) ] 10 e_outfile Name of the output file (e_outfile) Filename textsearch.e_outfile ("" , " -outfile=" + str(value))[value is not None] 11 e_outfile_out outfile_out option TextsearchReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 12 Programs-5.1.1/pepinfo.xml0000644000175000001560000002553712072525233014351 0ustar bneronsis pepinfo EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net pepinfo Plot amino acid properties of a protein sequence in parallel. 
http://bioweb2.pasteur.fr/docs/EMBOSS/pepinfo.html http://emboss.sourceforge.net/docs/themes sequence:protein:composition pepinfo e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_aaproperties Amino acid chemical classes data file Protein AminoAcidClassification AbstractText ("", " -aaproperties=" + str(value))[value is not None ] 2 e_aahydropathy Amino acid hydropathy values data file Protein AminoAcidHydropathy AbstractText ("", " -aahydropathy=" + str(value))[value is not None ] 3 e_additional Additional section e_hwindow Window size for hydropathy averaging (value greater than or equal to 1) Integer 9 ("", " -hwindow=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 4 e_output Output section e_graph Choose the e_graph output format Choice png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 5 xy_goutfile Name of the output graph Filename pepinfo_xygraph ("" , " -goutfile=" + str(value))[value is not None] 6 xy_outgraph_png Graph file Picture Binary e_graph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_graph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_graph == "ps" or e_graph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_graph == "meta" "*.meta" xy_outgraph_data Graph file Text e_graph == "data" "*.dat" e_outfile Name of the output file (e_outfile) Filename pepinfo.e_outfile ("" , " -outfile=" + str(value))[value is not None] 7 e_outfile_out outfile_out option PepinfoReport Report e_outfile e_generalplot Plot histogram of general properties Boolean 1 (" -nogeneralplot", "")[ bool(value) ] 8 e_hydropathyplot Plot graphs of hydropathy Boolean 1 (" -nohydropathyplot", "")[ bool(value) ] 9 auto Turn off any prompting String " -auto -stdout" 10 
Programs-5.1.1/forest2consense.xml0000644000175000001560000002374112124542142016022 0ustar bneronsis forest2consense forest2consense Consensus tree "list processor" forest2consense processes two files containing lists of trees and, for each possible combination of a tree from file 1 and a tree from file 2, computes a consensus tree using PHYLIP's consense package. Tree leaf IDs are renamed using nw_rename. Eric Deveaud phylogeny:tree_analyser forest2consense.py infile1 First series of trees in file (intree) Tree " $value " " %s " % value 2 Input is a tree file which contains a series of trees in the Newick standard form (A,(B,(H,(D,(J,(((G,E),(F,I)),C)))))); (A,(B,(D,((J,H),(((G,E),(F,I)),C))))); (A,(B,(D,(H,(J,(((G,E),(F,I)),C)))))); (A,(B,(E,(G,((F,I),((J,(H,D)),C)))))); (A,(B,(E,(G,((F,I),(((J,H),D),C)))))); (A,(B,(E,((F,I),(G,((J,(H,D)),C)))))); (A,(B,(E,((F,I),(G,(((J,H),D),C)))))); (A,(B,(E,((G,(F,I)),((J,(H,D)),C))))); (A,(B,(E,((G,(F,I)),(((J,H),D),C))))); infile2 Second series of trees in file (intree) Tree " $value " " %s " % value 3 Input is a tree file which contains a series of trees in the Newick standard form (A,(B,(H,(D,(J,(((G,E),(F,I)),C)))))); (A,(B,(D,((J,H),(((G,E),(F,I)),C))))); (A,(B,(D,(H,(J,(((G,E),(F,I)),C)))))); (A,(B,(E,(G,((F,I),((J,(H,D)),C)))))); (A,(B,(E,(G,((F,I),(((J,H),D),C)))))); (A,(B,(E,((F,I),(G,((J,(H,D)),C)))))); (A,(B,(E,((F,I),(G,(((J,H),D),C)))))); (A,(B,(E,((G,(F,I)),((J,(H,D)),C))))); (A,(B,(E,((G,(F,I)),(((J,H),D),C))))); map_file Map file the map file contains the mapping between short IDs that may have been used for tree generation and the real entry names ID_Mapping AbstractText " -m $value" " -m " + str(value) 1 type Consensus type Choice MRE MRE "" "" S "C\\n" "C\n" MR "C\\nC\\n" "C\nC\n" ML "C\\nC\\nC\\n" "C\nC\nC\n" params output Output options print_tree Print out tree (3) Boolean 1 ($value) ? 
"" : "3\\n" ( "3\n" , "" )[ value ] 1 Tells the program to print a semi-graphical picture of the tree in the outfile. params print_treefile Write out trees onto tree file (4) Boolean 1 ($value) ? "" : "4\\n" ( "4\n" , "" )[ value ] 1 Tells the program to save the tree in a treefile (a standard representation of trees where the tree is specified by a nested pairs of parentheses, enclosing names and separated by commas). params printdata Print out the data at start of run (1) Boolean 0 ($value) ? "1\\n" : "" ( "" , "1\n" )[ value ] 1 params other_options Other options outgroup Outgroup species (O) Integer 1 (defined $value and $value != $vdef) ? "O\\n$value\\n" : "" ( "" , "O\n"+ str( value ) +"\n" )[ value is not None and value != vdef ] Please enter a value greater than 0 $value > 0 value > 0 1 params rooted Trees to be treated as rooted (R) Boolean 0 ($value) ? "R\\n" : "" ( "" , "R\n" )[ value ] 1 params treefile Consense tree files Tree NEWICK $print_treefile print_treefile 50 "$infile1-$infile2-*.outtree" "%s-%s-*.outtree" % (infile1,infile2) confirm String "Y\\n" "Y\n" 1000 params outfile Consense output files Text 40 "$infile1-$infile2-*.outfile" "%s-%s-*.outfile" % (infile1,infile2) terminal_type String "T\\n" "T\n" -1 params Programs-5.1.1/clustalw-multialign.xml0000644000175000001560000012777612073003734016720 0ustar bneronsis clustalw-multialign Clustalw: Multiple alignment Do full multiple alignment alignment:multiple clustalw -align input Data Input sequences_input Sequences File ( a file containing several sequences ) (-infile) not $alignment_input or ($sequences_input and $alignment_input) not alignment_input or (sequences_input and alignment_input) Sequence FASTA NBRF EMBL GCG GDE SWISSPROT 2,n " -infile=$value" " -infile=" + str( value ) Can not handle both Sequence and Alignment at the same time not $alignment_input not alignment_input 1 alignment_input Aligned sequences not $sequences_input or ($sequences_input and $alignment_input) not 
sequences_input or (sequences_input and alignment_input) Protein DNA Alignment CLUSTAL FASTA 1 " -infile=$value" " -infile=" + str( value ) Can not handle both Sequence and Alignment at the same time not $sequences_input not sequences_input When the sequences are aligned (all sequences have the same length and at least one sequence has at least one gap) general General settings 2 quicktree Toggle Slow/Fast pairwise alignments (-quicktree) Choice slow slow fast ($value eq "fast") ? " -quicktree" : "" ( "" , " -quicktree")[ value == "fast"] slow: by dynamic programming (slow but accurate) fast: method of Wilbur and Lipman (extremely fast but approximate) typeseq Protein or DNA (-type) Choice auto auto protein dna (defined $value) ? " -type=$value" : "" ("", " -type="+str(value))[value is not None] multalign Multiple Alignments parameters 3 Multiple alignments are carried out in 3 stages : 1) all sequences are compared to each other (pairwise alignments); 2) a dendrogram (like a phylogenetic tree) is constructed, describing the approximate groupings of the sequences by similarity (stored in a file). 3) the final multiple alignment is carried out, using the dendrogram as a guide. Pairwise alignment parameters control the speed/sensitivity of the initial alignments. Multiple alignment parameters control the gaps in the final multiple alignments. gapopen Gap opening penalty (-gapopen) Float 10.00 (defined $value and $value != $vdef) ? " -gapopen=$value" : "" ( "" , " -gapopen=" + str( value ))[ value is not None and value != vdef ] gapext Gap extension penalty (-gapext) Float 0.20 (defined $value and $value != $vdef) ? " -gapext=$value" : "" ( "" , " -gapext=" + str( value ))[ value is not None and value != vdef ] endgaps No end gap separation penalty (-endgaps) Boolean 0 ($value) ? 
" -endgaps" : "" ( "" ," -endgaps" )[ value ] End gap separation treats end gaps just like internal gaps for the purposes of avoiding gaps that are too close (set by GAP SEPARATION DISTANCE above). If you turn this off, end gaps will be ignored for this purpose. This is useful when you wish to align fragments where the end gaps are not biologically meaningful. gapdist Gap separation penalty range (-gapdist) Integer 8 (defined $value and $value != $vdef) ? " -gapdist=$value" : "" ( "" , " -gapdist=" + str( value ))[ value is not None and value != vdef] Gap separation distance tries to decrease the chances of gaps being too close to each other. Gaps that are less than this distance apart are penalised more than other gaps. This does not prevent close gaps; it makes them less frequent, promoting a block-like appearance of the alignment. maxdiv Delay divergent sequences : % ident. for delay (-maxdiv) Integer 30 (defined $value and $value != $vdef) ? " -maxdiv=$value" : "" ( "" , " -maxdiv=" + str( value ))[ value is not None and value != vdef ] Delays the alignment of the most distantly related sequences until after the most closely related sequences have been aligned. The setting shows the percent identity level required to delay the addition of a sequence; sequences that are less identical than this level to any other sequences will be aligned later. newtree File for new guide tree (-newtree) Filename (defined $value) ? " -newtree=$value" : "" ( "" , " -newtree=" + str( value ))[value is not None] newtreefile Output tree Tree NEWICK defined $newtree newtree is not None $newtree newtree usetree File for old guide tree (-usetree) Tree NEWICK (defined $value) ? 
" -usetree=$value" : "" ( "" ," -usetree=" + str( value ))[value is not None] You can give a previously computed tree (.dnd file) - on the same data multalign_prot Protein parameters $typeseq eq "protein" typeseq == "protein" matrix Protein weight matrix (-matrix) Choice gonnet gonnet blosum pam id (defined $value and $value ne $vdef) ? " -matrix=$value" : "" ("", " -matrix="+str(value))[value is not None and value!=vdef] There are three 'in-built' series of weight matrices offered. Each consists of several matrices which work differently at different evolutionary distances. To see the exact details, read the documentation. Crudely, we store several matrices in memory, spanning the full range of amino acid distance (from almost identical sequences to highly divergent ones). For very similar sequences, it is best to use a strict weight matrix which only gives a high score to identities and the most favoured conservative substitutions. For more divergent sequences, it is appropriate to use 'softer' matrices which give a high score to many other frequent substitutions. BLOSUM (Henikoff). These matrices appear to be the best available for carrying out data base similarity (homology searches). The matrices used are: Blosum80, 62, 40 and 30. The Gonnet Pam 250 matrix has been reported as the best single matrix for alignment, if you only choose one matrix. Our experience with profile database searches is that the Gonnet series is unambiguously superior to the Blosum series at high divergence. However, we did not get the series to perform systematically better than the Blosum series in Clustal W (communication of the authors). PAM (Dayhoff). These have been extremely widely used since the late '70s. We use the PAM 120, 160, 250 and 350 matrices. negative Negative values in matrix ? (-negative) Boolean 0 ($value) ? " -negative" : "" ( "" , " -negative" )[ value ] pgap Residue specific gaps off (-nopgap) Boolean 1 ($value) ? 
" -nopgap" : "" ( "" , " -nopgap" )[ value ] Residue specific penalties are amino acid specific gap penalties that reduce or increase the gap opening penalties at each position in the alignment or sequence. As an example, positions that are rich in glycine are more likely to have an adjacent gap than positions that are rich in valine. Table of residue specific gap modification factors: A 1.13 M 1.29 C 1.13 N 0.63 D 0.96 P 0.74 E 1.31 Q 1.07 F 1.20 R 0.72 G 0.61 S 0.76 H 1.00 T 0.89 I 1.32 V 1.25 K 0.96 Y 1.00 L 1.21 W 1.23 The values are normalised around a mean value of 1.0 for H. The lower the value, the greater the chance of having an adjacent gap. These are derived from the original table of relative frequencies of gaps adjacent to each residue (12) by subtraction from 2.0. hgap Hydrophilic gaps off (-nohgap) Boolean 1 ($value) ? " -nohgap" : "" ( "" , " -nohgap" )[ value ] Hydrophilic gap penalties are used to increase the chances of a gap within a run (5 or more residues) of hydrophilic amino acids; these are likely to be loop or random coil regions where gaps are more common. The residues that are 'considered' to be hydrophilic are set by menu item 3. hgapresidues Hydrophilic residues list (-hgapresidues) MultipleChoice R N D Q E G K P S A R N D C Q E G H I L K M F P S T W Y V ($value and $value ne $vdef) ? " -hgapresidues=\\"$value\\"" : "" ( '' , ' -hgapresidues="%s"' % str(value) )[ value and value != vdef ] multalign_dna DNA parameters $typeseq eq "dna" typeseq == "dna" dnamatrix DNA weight matrix (-dnamatrix) Choice iub iub clustalw (defined $value and $value ne $vdef) ? " -dnamatrix=$value" : "" ("", " -dnamatrix=" + str(value))[value is not None and value!=vdef] 1) IUB. This is the default scoring matrix used by BESTFIT for the comparison of nucleic acid sequences. X's and N's are treated as matches to any IUB ambiguity symbol. All matches score 1.9; all mismatches for IUB symbols score 0. 2) CLUSTALW(1.6). 
The previous system used by ClustalW, in which matches score 1.0 and mismatches score 0. All matches for IUB symbols also score 0. transweight Transitions weight (between 0 and 1) (-transweight) Float 0.5 (defined $value and $value != $vdef) ? " -transweight=$value" : "" ( "" , " -transweight=" + str( value ) )[ value is not None and value != vdef ] A weight of zero means that the transitions are scored as mismatches; a weight of 1 gives transitions the full match score. For distantly related DNA sequences, the weight should be near to zero; for closely related sequences it can be useful to assign a higher score. fastpw Fast Pairwise Alignments parameters $quicktree eq "fast" quicktree == "fast" 4 These similarity scores are calculated from fast, approximate, global alignments, which are controlled by 4 parameters. 2 techniques are used to make these alignments very fast: 1) only exactly matching fragments (k-tuples) are considered; 2) only the 'best' diagonals (the ones with most k-tuple matches) are used. ktuple Word size (-ktuple) Integer 1 (defined $value and $value != $vdef) ? " -ktuple=$value" : "" ( "" , " -ktuple=" + str( value ) )[value is not None and value != vdef ] 2 K-TUPLE SIZE: This is the size of exactly matching fragment that is used. INCREASE for speed (max= 2 for proteins; 4 for DNA), DECREASE for sensitivity. For longer sequences (e.g. >1000 residues) you may need to increase the default. topdiags Number of best diagonals (-topdiags) Integer 5 (defined $value and $value != $vdef) ? " -topdiags=$value" : "" ( "" , " -topdiags=" + str( value ))[value is not None and value != vdef ] The number of k-tuple matches on each diagonal (in an imaginary dot-matrix plot) is calculated. Only the best ones (with most matches) are used in the alignment. This parameter specifies how many. Decrease for speed; increase for sensitivity. window Window around best diags (-window) Integer 5 (defined $value and $value != $vdef) ? 
" -window=$value" : "" ( "" , " -window=" + str( value ) )[ value is not None and value != vdef ] WINDOW SIZE: This is the number of diagonals around each of the 'best' diagonals that will be used. Decrease for speed; increase for sensitivity pairgap Gap penalty (-pairgap) Float 3 (defined $value and $value != $vdef) ? " -pairgap=$value" : "" ( "" , " -pairgap=" + str( value ))[ value is not None and value != vdef ] This is a penalty for each gap in the fast alignments. It has little effect on the speed or sensitivity except for extreme values. score Percent or absolute score ? (-score) Choice percent percent absolute (defined $value and $value ne $vdef) ? " -score=$value" : "" ( "" , " -score=" +str( value ) )[value is not None and value != vdef] slowpw Slow Pairwise Alignments parameters $quicktree eq "slow" quicktree == "slow" 4 These parameters do not have any effect on the speed of the alignments. They are used to give initial alignments which are then rescored to give percent identity scores. These % scores are the ones which are displayed on the screen. The scores are converted to distances for the trees. pwgapopen Gap opening penalty (-pwgapopen) Float 10.00 (defined $value and $value != $vdef) ? " -pwgapopen=$value" : "" ( "" , " -pwgapopen=" + str( value ) )[ value is not None and value != vdef ] pwgapext Gap extension penalty (-pwgapext) Float 0.10 (defined $value and $value != $vdef) ? " -pwgapext=$value" : "" ( "" , " -pwgapext=" + str( value ) )[ value is not None and value != vdef ] slowpw_prot Protein parameters $typeseq eq "protein" typeseq == "protein" pwmatrix Protein weight matrix (-pwmatrix) Choice gonnet blosum gonnet pam id (defined $value and $value ne $vdef) ? " -pwmatrix=$value" : "" ( "" , " -pwmatrix=" + str(value) )[value is not None and value != vdef ] The scoring table which describes the similarity of each amino acid to each other. For DNA, an identity matrix is used. BLOSUM (Henikoff). 
These matrices appear to be the best available for carrying out data base similarity (homology searches). The matrices used are: Blosum80, 62, 40 and 30. The Gonnet Pam 250 matrix has been reported as the best single matrix for alignment, if you only choose one matrix. Our experience with profile database searches is that the Gonnet series is unambiguously superior to the Blosum series at high divergence. However, we did not get the series to perform systematically better than the Blosum series in Clustal W (communication of the authors). PAM (Dayhoff). These have been extremely widely used since the late '70s. We use the PAM 120, 160, 250 and 350 matrices. slowpw_dna DNA parameters $typeseq eq "dna" typeseq == "dna" pwdnamatrix DNA weight matrix (-pwdnamatrix) Choice iub iub clustalw (defined $value and $value ne $vdef) ? " -pwdnamatrix=$value" : "" ( "" , " -pwdnamatrix=" + str(value) )[ value is not None and value != vdef ] For DNA, a single matrix (not a series) is used. Two hard-coded matrices are available: 1) IUB. This is the default scoring matrix used by BESTFIT for the comparison of nucleic acid sequences. X's and N's are treated as matches to any IUB ambiguity symbol. All matches score 1.9; all mismatches for IUB symbols score 0. 2) CLUSTALW(1.6). The previous system used by ClustalW, in which matches score 1.0 and mismatches score 0. All matches for IUB symbols also score 0. outputparam Output parameters 5 outputformat Output format (-output) Choice null null FASTA GCG GDE PHYLIPI PIR NEXUS (defined $value ) ? " -output=$value" : "" ( "" , " -output=" + str( value) )[ value is not None ] seqnos Output sequence numbers in the output file (for clustalw output only) (-seqnos) Boolean not defined $outputformat outputformat is None 0 (defined $value and $value != $vdef) ? " -seqnos=on" : "" ( "" , " -seqnos=on")[ value is not None and value != vdef] outorder Result order (-outorder) Choice aligned input aligned (defined $value and $value ne $vdef) ? 
" -outorder=$value" : "" ( "" , " -outorder=" + str(value))[ value is not None and value != vdef ] outfile Sequence alignment file name (-outfile) Filename (defined $value) ? " -outfile=$value" : "" ( "" , " -outfile=" + str( value))[ value is not None ] clustalaligfile Alignment file Alignment CLUSTAL not defined $outputformat outputformat is None (defined $outfile)? "$outfile":"*.aln" ("*.aln", str(outfile))[outfile is not None] In the conservation line output in the clustal format alignment file, three characters are used: '*' indicates positions which have a single, fully conserved residue. ':' indicates that one of the following 'strong' groups is fully conserved (STA,NEQK,NHQK,NDEQ,QHRK,MILV,MILF,HY,FYW). '.' indicates that one of the following 'weaker' groups is fully conserved (CSA,ATV,SAG,STNK,STPA,SGND,SNDEQK,NDEQHK,NEQHRK,FVLIM,HFY). These are all the positively scoring groups that occur in the Gonnet Pam250 matrix. The strong and weak groups are defined as strong score >0.5 and weak score =<0.5 respectively. aligfile Alignment file Alignment $outputformat =~ /^(NEXUS|GCG|PHYLIPI|FASTA)$/ outputformat in [ "NEXUS", "GCG", "PHYLIPI","FASTA"] (defined $outfile)? "$outfile":"*.fasta *.nxs *.phy *.msf" { "OUTFILE":outfile, "FASTA":"*.fasta", "NEXUS": "*.nxs", "PHYLIPI": "*.phy" , 'GCG': '*.msf' }[( "OUTFILE", outputformat)[outfile is None]] seqfile Sequences file Sequence NBRF GDE $outputformat =~ /^(GDE|PIR)$/ outputformat in [ 'GDE', 'PIR' ] (defined $outfile)? "$outfile":"*.gde *.pir" { "OUTFILE":outfile, 'GDE':'*.gde', 'PIR':'*.pir}[( "OUTFILE", outputformat)[outfile is None]] dndfile Tree file Tree NEWICK not defined $newtree newtree is None "*.dnd" "*.dnd" gde_lower Upper case (for GDE output only) (-case) Boolean $outputformat eq "GDE" outputformat == "GDE" 0 ($value) ? 
" -case=upper" : "" ( "" , " -case=upper" )[ value ] 2 Programs-5.1.1/epestfind.xml0000644000175000001560000003623412072525233014666 0ustar bneronsis epestfind EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net epestfind Finds PEST motifs as potential proteolytic cleavage sites http://bioweb2.pasteur.fr/docs/EMBOSS/epestfind.html http://emboss.sourceforge.net/docs/themes sequence:protein:motifs epestfind e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 Protein sequence USA to be analysed. e_mwdata Molecular weights data file MolecularWeights AbstractText ("", " -mwdata=" + str(value))[value is not None ] 2 e_required Required section e_window Window length (value greater than or equal to 2) Integer 10 ("", " -window=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 2 is required value >= 2 3 Minimal distance between positively charged amino acids. e_order Sort order of results Choice 3 1 2 3 ("", " -order=" + str(value))[value is not None and value!=vdef] 4 Name of the output file which holds the results of the analysis. Results may be sorted by length, position and score. e_additional Additional section e_threshold Threshold score (value from -55.0 to +55.0) Float +5.0 ("", " -threshold=" + str(value))[value is not None and value!=vdef] Value greater than or equal to -55.0 is required value >= -55.0 Value less than or equal to +55.0 is required value <= +55.0 5 Threshold value to discriminate weak from potential PEST motifs. Valid PEST motifs are discriminated into 'poor' and 'potential' motifs depending on this threshold score. 
By default, the threshold is set to +5.0 based on experimental data. Alterations are not recommended since significance is a matter of biology, not mathematics. e_advanced Advanced section e_mono Use monoisotopic weights Boolean 0 ("", " -mono")[ bool(value) ] 6 e_potential Display potential pest motifs Boolean 1 (" -nopotential", "")[ bool(value) ] 7 Decide whether potential PEST motifs should be printed. e_poor Display poor pest motifs Boolean 1 (" -nopoor", "")[ bool(value) ] 8 Decide whether poor PEST motifs should be printed. e_invalid Display invalid pest motifs Boolean 0 ("", " -invalid")[ bool(value) ] 9 Decide whether invalid PEST motifs should be printed. e_map Display pest motifs map Boolean 1 (" -nomap", "")[ bool(value) ] 10 Decide whether PEST motifs should be mapped to sequence. e_output Output section e_outfile Name of the output file (e_outfile) Filename epestfind.e_outfile ("" , " -outfile=" + str(value))[value is not None] 11 Name of file to which results will be written. e_outfile_out outfile_out option EpestfindReport Report e_outfile e_graph Choose the e_graph output format Choice png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 12 xy_goutfile Name of the output graph Filename epestfind_xygraph ("" , " -goutfile=" + str(value))[value is not None] 13 xy_outgraph_png Graph file Picture Binary e_graph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_graph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_graph == "ps" or e_graph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_graph == "meta" "*.meta" xy_outgraph_data Graph file Text e_graph == "data" "*.dat" auto Turn off any prompting String " -auto -stdout" 14 Programs-5.1.1/netchop.xml0000644000175000001560000001141611643535773014355 0ustar bneronsis netchop 3.1 netChop predicts cleavage sites for the human proteasome.
http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?netchop Morten Nielsen, mniel@cbs.dtu.dk The role of the proteasome in generating cytotoxic T cell epitopes: Insights obtained from improved predictions of proteasomal cleavage. M. Nielsen, C. Lundegaard, S. Brunak, O. Lund, and C. Kesmir. Immunogenetics., 57(1-2):33-41, 2005. http://www.cbs.dtu.dk/services/NetChop/ sequence:protein:motifs sequence:protein:pattern sequence:protein:profiles netchop String "netChop " "netChop " sequence Input Sequence Sequence FASTA " $value" " " + str( value ) 50 >gi|33331470|gb|AAQ10915.1| 55 ISERILSTY A1 MAGRSGDNDEELLKAVRIIKILYKSNPYPEPKGSRQARKNRRRRWRARQRQIDSISERILSTYL GRSTEPVPLQLPPLERLHLDCREDCGTSGTQQSQGVETGVGRPQISVESPVILGSRTKN Method Prediction method (-v). Choice 0 0 1 (defined $value and $value ne $vdef)? " -v $value": "" ( "" , " -v "+ value )[ value is not None and value != vdef ] 10 netchop has been trained using a novel sequence encoding scheme, and an improved neural network training strategy. The netchop 3.0 version has two different network methods that can be used for prediction. Cterm-3.0 and 20S-3.0. threshold Use value as threshold for cleavage sites (-t). Float 0.5 (defined $value and $value ne $vdef)? " -t $value": "" ( "" , " -t "+str(value) )[value is not None and value != vdef] 20 short_output Use short format for output (-s). Boolean 0 ($value)? " -s": "" ( ""," -s ")[ bool( value ) ] 30 results netChop report. Report NetChop "netchop.out" "netchop.out" Programs-5.1.1/sig.xml0000644000175000001560000001262312006241163013455 0ustar bneronsis sig 1.0 sig Multiple Prosite motifs searching Eric Deveaud sig is a program to search multiple occurences of multiple motifs in a set of sequences. ftp://ftp.pasteur.fr/pub/gensoft/projects/sig/ sequence:nucleic:pattern sequence:protein:pattern sig seqfile Protein Sequences Sequence FASTA " $value" " "+str(value) 100 patterns Pattern File (-f) SigPattern AbstractText (defined $value) ? 
" -f $value" : "" ("", " -f "+str(value))[value is not None] 2
File format:
  • one pattern per line.
  • A pattern consists of motif definitions separated by distance constraints.
  • The format is strictly the following: motif_1 (min,max) motif_2 ... (min,max) motif_n.
The sig motif syntax follows the syntax used in the PROSITE database:

Pattern syntax

  1. The standard IUPAC one-letter codes for the amino acids are used in PROSITE.
  2. The symbol `x' is used for a position where any amino acid is accepted.
  3. Ambiguities are indicated by listing the acceptable amino acids for a given position, between square brackets `[ ]'. For example: [ALT] stands for Ala or Leu or Thr.
  4. Ambiguities are also indicated by listing between a pair of curly brackets `{ }' the amino acids that are not accepted at a given position. For example: {AM} stands for any amino acid except Ala and Met.
  5. Each element in a pattern is separated from its neighbor by a `-'.
  6. Repetition of an element of the pattern can be indicated by following that element with a numerical value or, if it is a gap ('x'), by a numerical range between parentheses.
    Examples:
    x(3) corresponds to x-x-x
    x(2,4) corresponds to x-x or x-x-x or x-x-x-x
    A(3) corresponds to A-A-A
    Note: You can only use a range with 'x', i.e. A(2,4) is not a valid pattern element.
  7. When a pattern is restricted to either the N- or C-terminal of a sequence, that pattern either starts with a `<' symbol or respectively ends with a `>' symbol. In some rare cases (e.g. PS00267 or PS00539), '>' can also occur inside square brackets for the C-terminal element. 'F-[GSTV]-P-R-L-[G>]' means that either 'F-[GSTV]-P-R-L-G' or 'F-[GSTV]-P-R-L>' are considered.
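The seven rules above map fairly directly onto regular expressions. The following is a minimal sketch of that mapping in Python, not the implementation used by sig; the function name `prosite_to_regex` is hypothetical, and the sketch assumes each motif is written on one line with `-` between elements, exactly as in the rules.

```python
import re


def prosite_to_regex(pattern):
    """Translate one PROSITE-style motif into a Python regular expression.

    Hypothetical helper written for illustration; it is not part of sig.
    """
    parts = []
    for element in pattern.strip().split("-"):      # rule 5: '-' separates elements
        prefix = suffix = ""
        if element.startswith("<"):                  # rule 7: N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):                    # rule 7: C-terminal anchor
            suffix, element = "$", element[:-1]
        # Rule 6: optional repetition such as x(3), x(2,4) or A(3).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError("empty pattern element in %r" % pattern)
        core, low, high = match.groups()
        is_gap = core in ("x", "X")
        if is_gap:                                   # rule 2: any amino acid
            body = "."
        elif core.startswith("[") and core.endswith("]"):
            residues = core[1:-1]                    # rule 3: accepted residues
            if ">" in residues:                      # rule 7: '>' inside brackets
                body = "(?:[" + residues.replace(">", "") + "]|$)"
            else:
                body = "[" + residues + "]"
        elif core.startswith("{") and core.endswith("}"):
            body = "[^" + core[1:-1] + "]"           # rule 4: rejected residues
        else:
            body = re.escape(core)                   # rule 1: one-letter code
        if high is not None:
            if not is_gap:                           # a range is only valid with 'x'
                raise ValueError("range not allowed on %r" % core)
            body += "{%s,%s}" % (low, high)
        elif low is not None:
            body += "{%s}" % low
        parts.append(prefix + body + suffix)
    return "".join(parts)
```

For instance, `re.search(prosite_to_regex("[RK]-x-V-x-[FW]"), sequence)` would look for the first motif used in the sig examples. Accepting an upper-case `X` as a gap is an assumption made here because the sig examples use both cases.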
The sig pattern (motifs with distance constraints):
  • [RK]-x-V-x-[FW] (0,) F-x-x-[RK]-x-[RK]: no distance constraint is set between motif [RK]-x-V-x-[FW] and motif F-x-x-[RK]-x-[RK].
  • [RK]-x-V-x-[FW] (5,15) F-X-X-[RK]-X-[RK]: motif [RK]-x-V-x-[FW] and motif F-X-X-[RK]-X-[RK] must be separated by a gap whose length is at least 5 and at most 15.
[RK]-x-V-x-[FW] (0,) F-x-x-[RK]-x-[RK]
[RK]-x-V-x-[FW] (5,15) F-X-X-[RK]-X-[RK]
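A pattern line of this form can be split into its motifs and distance constraints with a few lines of Python. This is only a sketch under stated assumptions: the function name `parse_sig_pattern` is hypothetical, tokens are assumed to be separated by whitespace, a constraint is assumed to contain no internal space, and an omitted maximum, as in `(0,)`, is represented as `None` to mean "no upper bound".

```python
import re


def parse_sig_pattern(line):
    """Split 'motif_1 (min,max) motif_2 ...' into motifs and constraints.

    Hypothetical helper for illustration; not taken from the sig sources.
    Returns (motifs, constraints), where constraints[i] is the (min, max)
    pair between motifs[i] and motifs[i + 1]; max is None when omitted.
    """
    tokens = line.split()
    motifs = tokens[0::2]            # even positions: motif definitions
    constraints = []
    for token in tokens[1::2]:       # odd positions: (min,max) constraints
        match = re.fullmatch(r"\((\d+),(\d*)\)", token)
        if match is None:
            raise ValueError("malformed distance constraint: %r" % token)
        low = int(match.group(1))
        high = int(match.group(2)) if match.group(2) else None
        constraints.append((low, high))
    # n motifs must be joined by exactly n - 1 constraints.
    if len(motifs) != len(constraints) + 1:
        raise ValueError("pattern must alternate motifs and constraints")
    return motifs, constraints
```

Each returned motif is still in PROSITE syntax, so it can be handed to whatever motif matcher is in use.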
overlapping Allows motifs from pattern to be overlapping (-i) Boolean 0 ($value) ? " -i" : "" ("", " -i")[ value ] 2 reverse Searches motifs in ordered and reverse search order, conserving the distance constraints (-r) Boolean 0 ($value) ? " -r" : "" ("", " -r")[ value ] 2
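The Python command-line expressions in these descriptions, such as `("", " -f "+str(value))[value is not None]`, index a two-element tuple with a boolean: `False` selects element 0 and `True` selects element 1. The sketch below spells the idiom out; the function names are hypothetical and exist only for illustration.

```python
def optional_flag(flag, value):
    """Return ' <flag> <value>' when a value is set, else an empty string."""
    # (when_false, when_true)[condition]: a bool indexes the tuple as 0 or 1.
    return ("", " " + flag + " " + str(value))[value is not None]


def boolean_flag(flag, value):
    """Return ' <flag>' when the Boolean parameter is true, else ''."""
    # bool() guards against None, which cannot be used as a tuple index.
    return ("", " " + flag)[bool(value)]
```

Both tuple elements are evaluated before one is selected, which is harmless here because `str(None)` is valid. Wrapping the condition in `bool(...)`, as the epestfind description does, avoids a `TypeError` when the value is `None`.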
Programs-5.1.1/prettyplot.xml0000644000175000001560000007556111672413202015136 0ustar bneronsis prettyplot EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net prettyplot Draw a sequence alignment with pretty formatting http://bioweb2.pasteur.fr/docs/EMBOSS/prettyplot.html http://emboss.sourceforge.net/docs/themes alignment:multiple:display prettyplot e_input Input section e_sequences sequences option Alignment FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH 1,n ("", " -sequences=" + str(value))[value is not None] 1 e_matrixfile Matrix file Choice mobyle_null mobyle_null EBLOSUM30 EBLOSUM35 EBLOSUM40 EBLOSUM45 EBLOSUM50 EBLOSUM55 EBLOSUM60 EBLOSUM62 EBLOSUM62-12 EBLOSUM65 EBLOSUM70 EBLOSUM75 EBLOSUM80 EBLOSUM85 EBLOSUM90 EBLOSUMN EDNAFULL EDNAMAT EDNASIMPLE EPAM10 EPAM100 EPAM110 EPAM120 EPAM130 EPAM140 EPAM150 EPAM160 EPAM170 EPAM180 EPAM190 EPAM20 EPAM200 EPAM210 EPAM220 EPAM230 EPAM240 EPAM250 EPAM260 EPAM270 EPAM280 EPAM290 EPAM30 EPAM300 EPAM310 EPAM320 EPAM330 EPAM340 EPAM350 EPAM360 EPAM370 EPAM380 EPAM390 EPAM40 EPAM400 EPAM410 EPAM420 EPAM430 EPAM440 EPAM450 EPAM460 EPAM470 EPAM480 EPAM490 EPAM50 EPAM500 EPAM60 EPAM70 EPAM80 EPAM90 SSSUB ("", " -matrixfile=" + str(value))[value is not None and value!=vdef] 2 This is the scoring matrix file used when comparing sequences. By default it is the file 'EBLOSUM62' (for proteins) or the file 'EDNAFULL' (for nucleic sequences). These files are found in the 'data' directory of the EMBOSS installation. 
e_additional Additional section e_residuesperline Number of residues to be displayed on each line Integer 50 ("", " -residuesperline=" + str(value))[value is not None and value!=vdef] 3 The number of residues to be displayed on each line e_resbreak Residues before a space (value greater than or equal to 1) Integer e_residuesperline ("", " -resbreak=" + str(value))[value is not None] Value greater than or equal to 1 is required value >= 1 4 Same as -residuesperline to give no breaks e_ccolours Colour residues by their consensus value. Boolean 1 (" -noccolours", "")[ bool(value) ] 5 e_cidentity Colour to display identical residues (red) String RED ("", " -cidentity=" + str(value))[value is not None and value!=vdef] 6 e_csimilarity Colour to display similar residues (green) String GREEN ("", " -csimilarity=" + str(value))[value is not None and value!=vdef] 7 e_cother Colour to display other residues (black) String BLACK ("", " -cother=" + str(value))[value is not None and value!=vdef] 8 e_docolour Colour residues by table oily, amide etc. Boolean 0 ("", " -docolour")[ bool(value) ] 9 e_shade Shading String ("", " -shade=" + str(value).upper())[value is not None] 10 Set to BPLW for normal shading (black, pale, light, white) so for pair = 1.5,1.0,0.5 and shade = BPLW Residues score Colour 1.5 or over... BLACK (B) 1.0 to 1.5 ... BROWN (P) 0.5 to 1.0 ... WHEAT (L) under 0.5 .... WHITE (W) The only four letters allowed are BPLW, in any order. e_pair Values to represent identical similar related (value greater than or equal to 0.0) String 1.5,1.0,0.5 ("", " -pair=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.0 is required len(value) >= 0.0 3 values separated by commas or spaces are required ( len( value.split(',') ) == 3 ) or ( len( value.split(' ') ) == 3 ) 10 e_identity Only match those which are identical in all sequences. 
(value greater than or equal to 0) Integer 0 ("", " -identity=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 11 e_box Display prettyboxes Boolean 1 (" -nobox", "")[ bool(value) ] 12 e_boxcol Colour the background in the boxes Boolean 0 ("", " -boxcol")[ bool(value) ] 13 e_boxuse Colour to be used for background. (grey) String GREY ("", " -boxuse=" + str(value))[value is not None and value!=vdef] 14 e_name Display the sequence names Boolean 1 (" -noname", "")[ bool(value) ] 15 e_maxnamelen Margin size for the sequence name. Integer 10 ("", " -maxnamelen=" + str(value))[value is not None and value!=vdef] 16 e_number Display the residue number Boolean 1 (" -nonumber", "")[ bool(value) ] 17 e_listoptions Display the date and options used Boolean 1 (" -nolistoptions", "")[ bool(value) ] 18 e_plurality Plurality check value (totweight/2) Float ("", " -plurality=" + str(value))[value is not None] 19 Half the total sequence weighting e_consensussection Consensus section e_consensus Display the consensus Boolean 0 ("", " -consensus")[ bool(value) ] 20 e_collision Allow collisions in calculating consensus Boolean 1 (" -nocollision", "")[ bool(value) ] 21 e_alternative Use alternative collisions routine Choice 0 0 1 2 3 ("", " -alternative=" + str(value))[value is not None and value!=vdef] 22 Values are 0:Normal collision check. (default) 1:Compares identical scores with the max score found. So if any other residue matches the identical score then a collision has occurred. 2:If another residue has a greater than or equal to matching score and these do not match then a collision has occurred. 3:Checks all those not in the current consensus.If any of these give a top score for matching or identical scores then a collision has occured. 
e_showscore Print residue scores Integer -1 ("", " -showscore=" + str(value))[value is not None and value!=vdef] 23 e_portrait Set page to portrait Boolean 0 ("", " -portrait")[ bool(value) ] 24 e_output Output section e_graph Choose the e_graph output format Choice png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 25 e_goutfile Name of the output graph Filename prettyplot_graph ("" , " -goutfile=" + str(value))[value is not None] 26 outgraph_png Graph file Picture Binary e_graph == "png" "*.png" outgraph_gif Graph file Picture Binary e_graph == "gif" "*.gif" outgraph_ps Graph file PostScript Binary e_graph == "ps" or e_graph == "cps" "*.ps" outgraph_meta Graph file Picture Binary e_graph == "meta" "*.meta" outgraph_data Graph file Text e_graph == "data" "*.dat" auto Turn off any prompting String " -auto -stdout" 27 Programs-5.1.1/clustalO-sequence.xml0000644000175000001560000007074512104230615016277 0ustar bneronsis clustalO-sequence Clustal-Omega: Multiple alignment Add new sequences to an existing alignment. Use this interface to add new sequences to an existing alignment. The profile is converted into a HMM and the un-aligned sequences will be multiply aligned (using the HMM background information) to form a profile; this constructed profile is aligned with the input profile; the columns in each profile (the original one and the one created from the un-aligned sequences) will be kept fixed and the alignment of the two profiles will be written out. The un/aligned sequences must contain at least two sequences. 
alignment:multiple clustalo input Data Input sequences_input Unaligned set of sequences Protein Sequence FASTA SWISSPROT CODATA NBRF 2,n " --infile=$value" " --infile=" + str( value ) alignment_input Profile (Aligned sequences) Protein Alignment FASTA CLUSTAL STOCKHOLM MSF 1 " --profile1=$value" " --profile1=" + str( value ) seqtype type of sequences Choice auto auto Protein RNA DNA (defined $value and $value ne $vdef)? " --seqtype=$value" : "" ("", " --seqtype="+str(value))[value is not None and value != vdef] Since version 1.1.0 the Clustal-Omega alignment engine can process DNA/RNA. Clustal-Omega tries to guess the sequence type (protein, DNA/RNA), but this can be over-ruled with this flag. dealign Dealign input sequences $alignment_input bool( alignment_input ) Boolean 0 (defined $value and $value)? " --dealign " : "" ( "" , " --dealign ")[ value is not None and value !=vdef ] When the sequences are aligned (all sequences have the same length and at least one sequence has at least one gap), then the alignment is turned into a HMM, the sequences are de-aligned and the now un-aligned sequences are aligned using the HMM as an External Profile for External Profile Alignment (EPA). If no EPA is desired, turn on this option. Clustal-Omega reads the file of aligned sequences. It de-aligns the sequences and then re-aligns them. No HMM is produced in the process, no pseudo-count information is transferred. Consequently, the output must be the same as for unaligned output. clustering Clustering In order to produce a multiple alignment Clustal-Omega requires a guide tree which defines the order in which sequences/profiles are aligned. A guide tree in turn is constructed, based on a distance matrix. Conventionally, this distance matrix is comprised of all the pair-wise distances of the sequences.
The distance measure Clustal-Omega uses for pair-wise distances of un-aligned sequences is the k-tuple measure [4], which was also implemented in Clustal 1.83 and ClustalW2 [5,6]. If the sequences inputted via -i are aligned Clustal-Omega uses the Kimura-corrected pairwise aligned identities [7]. The computational effort (time/memory) to calculate and store a full distance matrix grows quadratically with the number of sequences. Clustal-Omega can improve this scalability to N*log(N) by employing a fast clustering algorithm called mBed [2]; this option is automatically invoked (default). If a full distance matrix evaluation is desired, then the --full flag has to be set. The mBed mode calculates a reduced set of pair-wise distances. These distances are used in a k-means algorithm, that clusters at most 100 sequences. For each cluster a full distance matrix is calculated. No full distance matrix (of all input sequences) is calculated in mBed mode. If there are less than 100 sequences in the input, then in effect a full distance matrix is calculated in mBed mode, however, no distance matrix can be outputted (see below). Clustal-Omega uses Muscle's [8] fast UPGMA implementation to construct its guide trees from the distance matrix. By default, the distance matrix is used internally to construct the guide tree and is then discarded. By specifying --distmat-out the internal distance matrix can be written to file. This is only possible in --full mode. The guide trees by default are used internally to guide the multiple alignment and are then discarded. By specifying the --guidetree-out option these internal guide trees can be written out to file. Conversely, the distance calculation and/or guide tree building stage can be skipped, by reading in a pre-calculated distance matrix and/or pre-calculated guide tree. These options are invoked by specifying the --distmat-in and/or --guidetree-in flags, respectively. 
However, distance matrix reading is disabled in the current version. By default, distance matrix and guide tree files are not over-written, if a file with the specified name already exists. In this case Clustal-Omega aborts during the command-line processing stage. In mBed mode a full distance matrix cannot be outputted, distance matrix output is only possible in --full mode. mBed or --full distance mode do not affect the ability to write out guide-trees. Guide trees can be iterated to refine the alignment (see section ITERATION). Clustal-Omega takes the alignment, that was produced initially and constructs a new distance matrix from this alignment. The distance measure used at this stage is the Kimura distance [7]. By default, Clustal-Omega constructs a reduced distance matrix at this stage using the mBed algorithm, which will then be used to create an improved (iterated) new guide tree. To turn off mBed-like clustering at this stage the --full-iter flag has to be set. While Kimura distances in general are much faster to calculate than k-tuple distances, time and memory requirements still scale quadratically with the number of sequences and --full-iter clustering should only be considered for smaller cases ( << 10,000 sequences). [2] Blackshields G, Sievers F, Shi W, Wilm A, Higgins DG. Sequence embedding for fast construction of guide trees for multiple sequence alignment. Algorithms Mol Biol. 2010 May 14;5:21. [4] Wilbur and Lipman, 1983; PMID 6572363 [5] Thompson JD, Higgins DG, Gibson TJ. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673-4680. [6] Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG. (2007). Clustal W and Clustal X version 2.0. Bioinformatics, 23, 2947-2948. [7] Kimura M (1980). 
"A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences". Journal of Molecular Evolution 16: 111–120. distmat_out Pairwise distance matrix output file Filename (defined $value and $value)? " --distmat-out=$value ":"" ( "" , " --distmat-out="+str(value))[ value is not None ] the full option must be set $full full guidetree_in Guide tree input file (--guidetree-in) Tree NEWICK (defined $value )? " --guidetree-in= $value" : "" ( "" , " --guidetree-in="+str(value))[ value is not None ] guidetree_out Guide tree output file (--guidetree-out) Filename (defined $value and $value)? " --guidetree-out=$value ":"" ( "" , " --guidetree-out="+str(value))[ value is not None ] full Use full distance matrix for guide-tree calculation (slow; mBed is default) (--full) Boolean 0 (defined $full and $ full)? " --full ": "" ( "" , " --full ")[ value is not None and value ] full_iter Use full distance matrix for guide-tree calculation during iteration (mBed is default) (--full-iter) Boolean 0 (defined $full and $ full)? " --full-iter ": "" ( "" , " --full-iter ")[ value is not None and value ] output_format Alignment Output output_format alignment output format Choice fasta fasta clustal msf phylip stockholm vienna (defined $value and $value ne $vdef)? " --outfmt=$value" : "" ( "" , " --outfmt=" + value )[ value is not None and value != vdef ] iteration Iteration By default, Clustal-Omega calculates (or reads in) a guide tree and performs a multiple alignment in the order specified by this guide tree. This alignment is then outputted. Clustal-Omega can 'iterate' its guide tree. The hope is that the (Kimura) distances, that can be derived from the initial alignment, will give rise to a better guide tree, and by extension, to a better alignment. A similar rationale applies to HMM-iteration. MSAs in general are very 'vulnerable' at their early stages. 
Sequences that are aligned at an early stage remain fixed for the rest of the MSA. Another way of putting this is: 'once a gap, always a gap'. This behaviour can be mitigated by HMM iteration. An initial alignment is created and turned into a HMM. This HMM can help in a new round of MSA to 'anticipate' where residues should align. This is using the HMM as an External Profile and carrying out iterative EPA. In practice, individual sequences and profiles are aligned to the External HMM, derived after the initial alignment. Pseudo-count information is then transferred to the (internal) HMM, corresponding to the individual sequence/profile. The now somewhat 'softened' sequences/profiles are then in turn aligned in the order specified by the guide tree. Pseudo-count transfer is reduced with the size of the profile. Individual sequences attain the greatest pseudo-count transfer, larger profiles less so. Pseudo-count transfer to profiles larger than, say, 10 is negligible. The effect of HMM iteration is more pronounced in larger test sets (that is, with more sequences). Both, HMM- and guide tree-iteration come at a cost of increasing the run-time. One round of guide tree iteration adds on (roughly) the time it took to construct the initial alignment. If, for example, the initial alignment took 1min, then it will take (roughly) 2min to iterate the guide tree once, 3min to iterate the guide tree twice, and so on. HMM-iteration is more costly, as each round of iteration adds three times the time required for the alignment stage. For example, if the initial alignment took 1min, then each additional round of HMM iteration will add on 3min; so 4 iterations will take 13min (=1min+4*3min). The factor of 3 stems from the fact that at every stage both intermediate profiles have to be aligned with the background HMM, and finally the (softened) HMMs have to be aligned as well. All times are quoted for single processors. By default, guide tree iteration and HMM-iteration are coupled. 
This means, at each iteration step both, guide tree and HMM, are re-calculated. This is invoked by setting the --iter flag. For example, if --iter=1, then first an initial alignment is produced (without external HMM background information and using k-tuple distances to calculate the guide tree). This initial alignment is then used to re-calculate a new guide tree (using Kimura distances) and to create a HMM. The new guide tree and the HMM are then used to produce a new MSA. Iteration of guide tree and HMM can be de-coupled. This means that the number of guide tree iterations and HMM iterations can be different. This can be done by combining the --iter flag with the --max-guidetree-iterations and/or the --max-hmm-iterations flag. The number of guide tree iterations is the minimum of --iter and --max-guidetree-iterations, while the number of HMM iterations is the minimum of --iter and --max-hmm-iterations. If, for example, HMM iteration should be performed 5 times but guide tree iteration should be performed only 3 times, then one should set --iter=5 and --max-guidetree-iterations=3. All three flags can be specified at the same time (however, this makes no sense). It is not sufficient just to specify --max-guidetree-iterations and --max-hmm-iterations but not --iter. If any iteration is desired --iter has to be set. iterations Number of (combined guide-tree/HMM) iterations (--iter) Integer (defined $value)? " --iter=$value ": "" ( "" , " --iter="+str(value) )[ value is not None ] if iterations= 2. Clustal-Omega reads the input file, creates a UPGMA guide tree built from k-tuple distances, and performs an initial alignment. This initial alignment is converted into a HMM and a new guide tree is built from the Kimura distances of the initial alignment. The un-aligned sequences are then aligned (for the second time but this time) using pseudo-count information from the HMM created after the initial alignment (and using the new guide tree). 
This second alignment is then again converted into a HMM and a new guide tree is constructed. The un-aligned sequences are then aligned (for a third time), again using pseudo-count information of the HMM from the previous step and the most recent guide tree. The final alignment is written to screen. max_guidetree_iterations Maximum number guidetree iterations (--max-guidetree-iterations) Integer (defined $value)? " --max-guidetree-iterations=$value ": "" ( "" , " --max-guidetree-iterations="+str(value) )[ value is not None ] If iterations= 5 and the "Maximum number guidetree iterations" is set to 1. Clustal-Omega reads the input file, creates a UPGMA guide tree built from k-tuple distances, and performs an initial alignment. This initial alignment is converted into a HMM and a new guide tree is built from the Kimura distances of the initial alignment. The un-aligned sequences are then aligned (for the second time but this time) using pseudo-count information from the HMM created after the initial alignment (and using the new guide tree). For the last 4 iterations the guide tree is left unchanged and only HMM iteration is performed. This means that intermediate alignments are converted to HMMs, and these intermediate HMMs are used to guide the MSA during subsequent iteration stages. max_hmm_iterations Maximum number of HMM iterations (--max-hmm-iterations) Integer (defined $value)? " --max-hmm-iterations=$value ": "" ( "" , " --max-hmm-iterations="+str(value) )[ value is not None ] miscellaneous Miscellaneous auto Set options automatically (might overwrite some of your options) (--auto) Boolean 0 (defined $value and $value)? " --auto ": "" ( "" , " --auto ")[value is not None and value] Users may feel unsure which options are appropriate in certain situations even though using ClustalO without any special options should give you the desired results. The --auto flag tries to alleviate this problem and selects accuracy/speed flags according to the number of sequences. 
In all cases it uses mBed and may thereby overwrite the --full option. For more than 1,000 sequences the iteration is turned off as the effect of iteration is more noticeable for 'larger' problems. Otherwise iterations are set to 1 if not already set to a higher value by the user. Expert users may want to avoid this flag and exercise more fine-tuned control by selecting the appropriate options manually. verbosity 100 String " -v --force --log=clustalO_log" " -v --force --log=clustalO_log" alignment_output Multiple Sequence Alignment Protein Alignment FASTA CLUSTAL MSF PHILIPI STOCKHOLM FASTA "clustalO-sequence.out" "clustalO-sequence.out" guidetree_outfile Guide tree output file defined $guidetree_out guidetree_out is not None Tree NEWICK $guidetree_out guidetree_out distmat_outfile Pairwise distance matrix output file defined $distmat_out distmat_out is not None DistanceMatrix AbstractText $distmat_out distmat_out logfile Clustal omega log file ClustalOReport Report "clustalO_log" "clustalO_log" Programs-5.1.1/align_reorder.xml0000644000175000001560000000424111767601016015516 0ustar bneronsis align_reorder alignment entries reordering Reorders the entries of an MSA
This tool reorders the entries of an MSA according to a reference set of sequences.
Néron, B.
alignment:formatter align_reorder fasta_align alignment Alignment FASTA " -a $value" " -a " + str( value ) 2 fasta_sequences sequences Sequence FASTA " -s $value" " -s " + str( value ) 1 reordered_alignment reordered alignment Alignment FASTA "align_reorder.out" "align_reorder.out"
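The two Python expressions above (" -a " + str( value ) and " -s " + str( value )) each produce one fragment of the align_reorder command line. The following is a minimal sketch of how such fragments could be assembled. It assumes that the trailing numbers (2 and 1) are argument positions used for ordering. The helper build_command and the file names are hypothetical and are not part of the original description.

```python
def build_command(command, fragments):
    """Append each fragment to the command name, ordered by its position.

    `fragments` is a list of (position, text) pairs; the position is
    assumed to play the role of the numbers 2 and 1 in the description.
    """
    ordered = sorted(fragments, key=lambda pair: pair[0])
    return command + "".join(text for _, text in ordered)


# Hypothetical input file names, for illustration only.
alignment = "input.aln"
sequences = "reference.fasta"

fragments = [
    (2, " -a " + str(alignment)),   # fasta_align expression from the description
    (1, " -s " + str(sequences)),   # fasta_sequences expression from the description
]

cmd = build_command("align_reorder", fragments)
print(cmd)
```

Under these assumptions the -s fragment is emitted before the -a fragment, because its position is lower.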
Programs-5.1.1/ktreedist.xml0000644000175000001560000002336711767572177014731 0ustar bneronsis ktreedist Version 1.0 Ktreedist Calculation of the minimum branch length distance (K tree score) between phylogenetic trees Victor Soria-Carrasco, Jose Castresana Soria-Carrasco, V., Talavera, G., Igea, J., and Castresana, J. (2007). The K tree score: quantification of differences in the relative branch length and topology of phylogenetic trees. Bioinformatics 23, 2954-2956. http://bioweb2.pasteur.fr/docs/ktreedist/ http://molevol.cmima.csic.es/castresana/Ktreedist.html http://molevol.cmima.csic.es/castresana/Ktreedist.html phylogeny:tree_analyser ktreedist input Input The program is supposed to run with one reference tree and one or several comparison trees. If the reference file contains more than one tree, only the first one will be used. ref_tree Reference Tree Tree NEWICK NEXUS "-rt $value" " -rt "+str(value) 2 This is the file that contains the tree to which you want to compare the comparison tree/s. Only the NEWICK and NEXUS tree formats are accepted by ktreedist. The input tree must be written on one line. For a NEXUS tree, the tree itself must be written on one line #NEXUS begin trees; tree 'name' =(1:0.212481,8:0.297838,(9:0.222729,((6:0.201563,7:0.194547):0.282035,(4:1.146091,(3:1.008881,(10:0.384105,(2:0.235682,5:0.353432):0.323680):0.103875):0.413540):0.254687):0.095341):0.079254):0.000000; end; comp_tree Comparison Tree Tree NEWICK NEXUS "-ct $value " " -ct "+str(value)+" " This is the file that contains the tree or the set of trees you want to compare to the reference tree. They will be scaled to match the reference tree as closely as possible. Only the NEWICK and NEXUS tree formats are accepted by ktreedist. The input tree must be written on one line.
For a NEXUS tree, the tree itself must be written on one line (1:0.212481,8:0.297838,(9:0.222729,((6:0.201563,7:0.194547):0.282035,(4:1.146091,(3:1.008881,(10:0.384105,(2:0.235682,5:0.353432):0.323680):0.103875):0.413540):0.254687):0.095341):0.079254):0.000000; 3 options Options 4 output_res Output file for table of results (-t) Boolean 0 ($value) ? " -t" : "" ("", " -t")[ value ] A file containing a table of results is generated. output_part Output file for table of partitions (-p) Boolean 0 ($value) ? " -p" : "" ("", " -p")[ value ] A file containing a table of partitions for each comparison tree is generated. output_comp Output file for comparison tree/s after scaling (-s) Boolean 0 ($value) ? " -s" : "" ("", " -s")[ value ] A file containing the scaled comparison tree/s is generated. output_rf Output symmetric difference (Robinson-Foulds) (-r) Boolean 0 ($value) ? " -r" : "" ("", " -r")[ value ] The symmetric difference is defined as the number of partitions that are not shared between two trees, that is, the number of partitions of the first tree that are not present in the second tree plus the number of partitions of the second tree that are not present in the first tree. output_nbpf Output number of partitions in the comparison tree/s (-n) Boolean 0 ($value) ? " -n" : "" ("", " -n")[ value ] Knowing the number of partitions may be useful to detect trees with polytomies. output_all Equivalent to all options (-a) Boolean 0 ($value) ? " -a" : "" ("", " -a")[ value ] Equivalent to all options.
output_res_f Output file for table of results Text $output_res or $output_all output_res or output_all "*.tab" "*.tab" output_part_f Output file for table of partitions Text $output_part or $output_all output_part or output_all "*.part" "*.part" output_comp_f Output file for comparison tree/s after scaling Tree $output_comp or $output_all output_comp or output_all "*.scaled" "*.scaled" Programs-5.1.1/blast2taxonomy.xml0000644000175000001560000002604712125257722015700 0ustar bneronsis blast2taxonomy 2.0 blast2taxonomy Blast Taxonomy report C. Maufrais http://sourceforge.net/p/krona/home/krona/ Krona-2.0: Ondov BD, Bergman NH, and Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011 Sep 30; 12(1):385. database:search:display blast2taxonomy infile Blast output file BlastTextReport Report " $value" " " + str(value) 10 display Display options 1 single Report one branch per organism (-s) Boolean 0 ($value) ? " -s" : "" ("" , " -s") [value] All hits are displayed in the tree by default. acc Report accession number (-a) Boolean 0 ($value) ? " -a" : "" ("" , " -a") [value] node_name Lowest common ancestor name (-n). String ($value) ? " -n $value" : "" ("" , " -n " + str(value)) [value is not None] filterevalue Select Blast hits with an e-value lower than this value (-E). Float 10.0 ($value) ? " -E $value" : "" ("" , " -E " + str(value)) [value and value != vdef] evalue Report the score and e-value of each Blast hit (-e). Boolean 0 ($value) ? " -e" : "" ("" , " -e") [value] perlen Report ratio of Blast hit length per query length (-l). Boolean 0 ($value) ? " -l" : "" ("" , " -l") [value] output Output option htmlKronaOutput krona.2-0 representation of HSPs Boolean 1 Abundance is reported in an HTML file using the Krona specification and the Krona JavaScript library (-k). ('', ' -k blast_kronaView.html' )[value] 30 kronahtmloutfile HTML Output file(s) KronaHtmlReport Report htmlKronaOutput "blast_kronaView.html" xlsoutput Tabular output (-x) Boolean 0 ($value)?
" -x" : "" ("" , " -x") [value] htmloutput Html output (-w) Boolean 0 ($value)? " -w" : "" ("" , " -w") [value] dndoutput Taxonomy report in Newick format (-t) Boolean 0 ($value)? " -t" : "" ("" , " -t") [value] outputfile Output file name (-o) Filename (defined $value)? " -o $value" : "" ("" , " -o " + str(value)) [value is not None ] outfile_name Output file Blast2taxonomyReport Report not $htmloutput and $outputfile not htmloutput and outputfile $outputfile str(outputfile) outfile Output file Blast2taxonomyReport Report not ($htmloutput and $outputfile) not (htmloutput and outputfile) "blast2taxonomy.out" "blast2taxonomy.out" htmloutfile Html output file Blast2taxonomyHtmlReport Report $htmloutput htmloutput (defined $outputfile)? "$outputfile.html": "blast2taxonomy.html" ("blast2taxonomy.html", str(outputfile)+".html")[outputfile is not None] htmloutfilealn Alignment Html output file AlnHtmlReport Report $htmloutput htmloutput "alignment.html" "alignment.html" dndoutfile Newick tree file Tree NEWICK $dndoutput dndoutput (defined $outputfile)? "$outputfile.dnd": "blast2taxonomy.dnd" ("blast2taxonomy.dnd", str(outputfile)+".dnd")[outputfile is not None] Programs-5.1.1/entret.xml0000644000175000001560000001057112072525233014202 0ustar bneronsis entret EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net entret Retrieves sequence entries from flatfile databases and files http://bioweb2.pasteur.fr/docs/EMBOSS/entret.html http://emboss.sourceforge.net/docs/themes sequence:edit entret e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_advanced Advanced section e_firstonly Read one sequence and stop Boolean 0 ("", " -firstonly")[ bool(value) ] 2 e_output Output section e_outfile Name of the output file (e_outfile) Filename entret.e_outfile ("" , " -outfile=" + str(value))[value is not None] 3 e_outfile_out outfile_out option EntryFullText AbstractText e_outfile auto Turn off any prompting String " -auto -stdout" 4 Programs-5.1.1/hmmscan.xml0000644000175000001560000010141511767572177014350 0ustar bneronsis hmmscan HMMSCAN Search sequence(s) against Pfam, a profile HMM database hmmscan reads sequence(s) from seqfile and compares them against all the HMMs in the Pfam database, looking for significantly similar sequence matches. The output consists of three sections: a ranked list of the best scoring HMMs, a list of the best scoring domains in order of their occurrence in the sequence, and alignments for all the best scoring domains. A sequence score may be higher than a domain score for the same sequence if there is more than one domain in the sequence; the sequence score takes into account all the domains. All sequences scoring above the -E and -T cutoffs are shown in the first list, then every domain found in this list is shown in the second list of domain hits. If desired, E-value and score thresholds may also be applied to the domain list using the --domE and --domT options.
hmm:database:search database:search:hmm hmmscan seqfile Sequence file Sequence FASTA " $value" " "+str(value) 3 HMMDB HMM database Choice Pfam-A.hmm Pfam-A.hmm Pfam-B.hmm " $value" " "+str(value) 2 thresholds_report Options for reporting thresholds 1 E_value_cutoff E_value cutoff (-E) Float not defined $Bit_cutoff and $model_specific ne '--cut_ga' and $model_specific ne '--cut_nc' Bit_cutoff is None and model_specific != '--cut_ga' and model_specific != '--cut_nc' 10.0 (defined $value and $value != $vdef) ? " -E $value" : "" ( "" , " -E " + str(value) )[ value is not None and value != vdef] 1 In the per-target output, report target profiles with an E-value of <= value. The default is 10.0, meaning that on average, about 10 false positives will be reported per query, so you can see the top of the 'noise' and decide for yourself if it's really noise. Bit_cutoff Bit score cutoff (-T) Float $E_value_cutoff == 10.0 and $model_specific ne '--cut_ga' and $model_specific ne '--cut_nc' E_value_cutoff == 10.0 and model_specific != '--cut_ga' and model_specific != '--cut_nc' (defined $value)? " -T $value" : "" ( "" , " -T " + str(value) )[ value is not None ] 1 Instead of thresholding per-profile output on E-value, instead report target profiles with a bit score of >= value. domE E-value cutoff for the per-domain ranked hit list (--domE) Float not defined $domT and $model_specific ne '--cut_ga' and $model_specific ne '--cut_nc' domT is None and model_specific != '--cut_ga' and model_specific != '--cut_nc' 10.0 (defined $value and $value != $vdef) ? " --domE $value" : "" ( "" , " --domE " + str(value) )[ value is not None and value !=vdef ] In the per-domain output, for target profiles that have already satisfied the perprofile reporting threshold, report individual domains with a conditional E-value of <= value. The default is 10.0. 
A 'conditional' E-value means the expected number of additional false positive domains in the smaller search space of those comparisons that already satisfied the per-profile reporting threshold (and thus must have at least one homologous domain already). domT Bit score cutoff for the per-domain ranked hit list (--domT) Float $domE == 10.0 and $model_specific ne '--cut_ga' and $model_specific ne '--cut_nc' domE == 10.0 and model_specific != '--cut_ga' and model_specific != '--cut_nc' (defined $value) ? " --domT $value" : "" ( "" , " --domT " + str(value) )[ value is not None ] Instead of thresholding per-domain output on E-value, instead report domains with a bit score of >= value. thresholds_inclusion Options controlling inclusion (significance) thresholds. 1 'Inclusion' thresholds are stricter than reporting thresholds. Inclusion thresholds control which hits are considered to be reliable enough to be included in an output alignment or a subsequent search round. In hmmscan, which does not have any alignment output nor any iterative search steps, inclusion thresholds have little effect. They only affect what domains get marked as significant ('!') or questionable ('?') in domain output. incE Include sequences lower than this E-value threshold (--incE) Float not defined $incT and $model_specific ne '--cut_ga' incT is None and model_specific != '--cut_ga' 0.01 (defined $value and value != vdef) ? " --incE $value" : "" ( "" , " --incE " + str(value) )[ value is not None and value != vdef] Use an E-value of <= value as the per-target inclusion threshold. The default is 0.01, meaning that on average, about 1 false positive would be expected in every 100 searches with different query sequences. incdomE Include domains lower than this E-value threshold (--incdomE) Float defined $incdomT and not defined model_specific incdomT is not None and model_specific is None 0.01 (defined $value and value != vdef) ? 
" --incdomE $value" : "" ( "" , " --incdomE " + str(value) )[ value is not None and value != vdef] Use a conditional E-value of <= value as the per-domain inclusion threshold, in targets that have already satisfied the overall per-target inclusion threshold. The default is 0.01. incT Include sequences upper than this score threshold (--incT) Float $incE == 0.01 and $model_specific ne '--cut_ga' incE == 0.01 and model_specific != '--cut_ga' (defined $value) ? " --incT $value" : "" ( "" , " --incT " + str(value) )[ value is not None ] Instead of using E-values for setting the inclusion threshold, instead use a bit score of >= the value as the per-target inclusion threshold. It would be unusual to use bit score thresholds with hmmscan, because you don't expect a single score threshold to work for different profiles; different profiles have slightly different expected score distributions. incdomT Include domans upper than this score threshold (--incdomT) Float $incdomE == 0.01 and not defined $model_specific incdomE == 0.01 and model_specific is None (defined $value) ? " --incdomT $value" : "" ( "" , " --incdomT " + str(value) )[ value is not None ] Instead of using E-values, instead use a bit score of >= value as the per-domain inclusion threshold. As with --incT above, it would be unusual to use a single bit score threshold in hmmscan. model_specific Options for model-specific thresholding Choice not defined $Bit_cutoff and not $E_value_cutoff == 10.0 and not defined $incdomT and $incdomE == 0.01 not Bit_cutoff and E_value_cutoff == 10.0 and incdomT is None and incdomE == 0.01 null null --cut_ga --cut_nc --cut_tc (defined $value and $value ne $vdef) ? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef] Curated profile databases may define specific bit score thresholds for each profile, superseding any thresholding based on statistical significance alone. 
To use these options, the profile must contain the appropriate (GA, TC, and/or NC) optional score threshold annotation; this is picked up by hmmbuild from Stockholm format alignment files. Each thresholding option has two scores: the per-sequence threshold x1 value and the per-domain threshold x2 value. These act as if -T x1 --incT x1 --domT x2 --incdomT x2 had been applied specifically using each model's curated thresholds. cut_ga: Use the GA (gathering) bit scores in the model to set per-sequence (GA1) and per-domain (GA2) reporting and inclusion thresholds. GA thresholds are generally considered to be the reliable curated thresholds defining family membership; for example, in Pfam, these thresholds define what gets included in Pfam Full alignments based on searches with Pfam Seed models. cut_nc: Use the NC (noise cutoff) bit score thresholds in the model to set per-sequence (NC1) and per-domain (NC2) reporting and inclusion thresholds. NC thresholds are generally considered to be the score of the highest-scoring known false positive. cut_tc: Use the TC (trusted cutoff) bit score thresholds in the model to set per-sequence (TC1) and per-domain (TC2) reporting and inclusion thresholds. TC thresholds are generally considered to be the score of the lowest-scoring known true positive that is above all known false positives. acceleration Options controlling acceleration heuristics 1 HMMER3 searches are accelerated in a three-step filter pipeline: the MSV filter, the Viterbi filter, and the Forward filter. The first filter is the fastest and most approximate; the last is the full Forward scoring algorithm. There is also a 'bias filter' step between MSV and Viterbi. Targets that pass all the steps in the acceleration pipeline are then subjected to 'postprocessing' -- domain identification and scoring using the Forward/Backward algorithm.
Changing filter thresholds only removes or includes targets from consideration; changing filter thresholds does not alter bit scores, E-values, or alignments, all of which are determined solely in 'postprocessing'. max Turn all heuristic filters off (less speed, more power) (--max) Boolean 0 ($value) ? " --max" : "" ( "" , " --max " )[ value ] Turn off all filters, including the bias filter, and run full Forward/Backward postprocessing on every target. This increases sensitivity somewhat, at a large cost in speed. F1 Stage 1 (MSV) threshold Float not max not max 0.02 (defined $value and $value != $vdef ) ? " --F1 $value" : "" ( "" , " --F1 " + str(value) )[ value is not None and value != vdef] Set the P-value threshold for the MSV filter step. The default is 0.02, meaning that roughly 2% of the highest scoring nonhomologous targets are expected to pass the filter. F2 Stage 2 (Vit) threshold Float not max not max 0.001 (defined $value and $value != $vdef ) ? " --F2 $value" : "" ( "" , " --F2 " + str(value) )[ value is not None and value != vdef] Set the P-value threshold for the Viterbi filter step. The default is 0.001. F3 Stage 3 (Fwd) threshold Float not max not max 0.00001 (defined $value and $value != $vdef ) ? " --F3 $value" : "" ( "" , " --F3 " + str(value) )[ value is not None and value != vdef] Set the P-value threshold for the Forward filter step. The default is 1e-5. nobias Turn off composition bias filter (--nobias) Boolean not max not max 0 ($value) ? " --nobias" : "" ( "" , " --nobias " )[ value ] Turn off the bias filter. This increases sensitivity somewhat, but can come at a high cost in speed, especially if the query has biased residue composition (such as a repetitive sequence region, or if it is a membrane protein with large regions of hydrophobicity).
Without the bias filter, too many sequences may pass the filter with biased queries, leading to slower than expected performance as the computationally intensive Forward/Backward algorithms shoulder an abnormally heavy load. expert Other expert options 1 nonull2 Turn off biased composition score corrections (--nonull2) Boolean 0 ($value) ? " --nonull2" : "" ( "" , " --nonull2 " )[ value ] Turn off the 'null2' score corrections for biased composition. E_value_calculation Control of E_value calculation (-Z) Integer (defined $value) ? " -Z $value" : "" ( "" , " -Z " + str(value) )[ value is not None ] 1 Assert that the total number of targets in your searches is the value, for the purposes of per-sequence E-value calculations, rather than the actual number of targets seen. domZ Set Z score of significant sequences, for domain E-value calculation (--domZ) Float (defined $value) ? " --domZ $value" : "" ( "" , " --domZ " + str(value) )[ value is not None ] Assert that the total number of targets in your searches is the value, for the purposes of per-domain conditional E-value calculations, rather than the number of targets that passed the reporting thresholds. seed Set RNG seed number (--seed) Integer 42 (defined $value and $value != $vdef) ? " --seed $value " : "" ( "" , " --seed " + str(value) )[ value is not None and value !=vdef ] Set the random number seed to value. Some steps in postprocessing require Monte Carlo simulation. The default is to use a fixed seed (42), so that results are exactly reproducible. Any other positive integer will give different (but also reproducible) results. A choice of 0 uses a 'randomly chosen' seed. Enter a value >= 0 0 <= $value 0 <= value controlOutput Options controlling output outfile_name Name of the sequence(s) file (-o) Filename (defined $value ) ? 
" -o $value" : "" ( " " , " -o " + str(value) )[ value is not None ] 1 output_file_name Output file Text defined $outfile_name outfile_name is not None $outfile_name str(outfile_name) perseqfile_name File name of parseable table of per-sequence hits (--tblout) Filename (defined $value) ? " --tblout $value" : "" ( "" , " --tblout " + str(value) )[ value is not None ] Save a simple tabular (space-delimited) file summarizing the 'per-target' output, with one data line per homologous target model found 1 output_perseqfile_name Output parseable table of per-sequence hits Text $perseqfile_name perseqfile_name $perseqfile_name str(perseqfile_name) perdomfile_name File name of parseable table of per-domain hits (--domtblout) Filename (defined $value) ? " --domtblout $value" : "" ( "" , " --domtblout " + str(value) )[ value is not None ] 1 Save a simple tabular (space-delimited) file summarizing the 'per-domain' output, with one data line per homologous domain detected in a query sequence for each homologous model. acc Prefer accessions over names in output Boolean 0 ($value) ? " --acc " : "" ( "" , " --acc " )[ value ] Use accessions instead of names in the main output, where available for profiles and/or sequences noali Don't output alignments, so output is smaller Boolean 0 ($value) ? " --noali " : "" ( "" , " --noali " )[ value ] Omit the alignment section from the main output. This can greatly reduce the output volume. notextw Unlimit ASCII text output line width (--notextw) Boolean textw == 120 textw == 120 0 ($value) ? " --notextw " : "" ( "" , " --notextw " )[ value ] Unlimit the length of each line in the main output. The default is a limit of 120 characters per line, which helps in displaying the output cleanly on terminals and in editors, but can truncate target profile description lines. textw Set max width of ASCII text output lines (--textw) Integer 120 (defined $value and $value != $vdef) ? 
" --textw $value " : "" ( "" , " --textw " + str(value) )[ value is not None and value !=vdef ] Set the main output's line length limit to value> characters per line. The default is 120. output_perdomfile_name Output parseable table of per-domain hits Text $perdomfile_name perdomfile_name $perdomfile_name str(perdomfile_name) Programs-5.1.1/hmmemit.xml0000644000175000001560000002517011767572177014365 0ustar bneronsis hmmemit HMMEMIT Generate sequences from a profile HMM hmmemit reads an HMM file from hmmfile and generates a number of sequences from it; or, if the -c option is selected, generate a single majority-rule consensus. This can be useful for various applications in which one needs a simulation of sequences consistent with a sequence family consensus.By default, hmmemit generates 10 sequences and outputs them in FASTA (unaligned) format. hmm:building hmmcmd HMM emit command String "hmmemit" "hmmemit" 0 hmmfile HMM file HmmProfile AbstractText " $hmmfile" " " + str(hmmfile) 2 outfile_name Name of the synthetic sequences file (-o) Filename (defined $value) ? " -o $value" : "" ( "" , " -o " + str(value) )[ value is not None ] 1 Save the synthetic sequences to file rather than writing them to stdout. output_file Output file Sequence FASTA defined $outfile_name outfile_name is not None $outfile_name str(outfile_name) out_file Output file Sequence FASTA not defined $outfile_name outfile_name is None "hmmemit.out" "hmmemit.out" consensus Consensus sequence (-c) Boolean $number == 1 and not sample number == 1 and not sample 0 ($value) ? " -c" : "" ( "" , " -c" )[ value ] 1 Emit a consensus sequence, instead of sampling a sequence from the profile HMM's probability distribution. The consensus sequence is formed by selecting the maximum probability residue at each match state. number Number of sequences to sample (-N) Integer 1 (defined $value and $value != $vdef) ? 
" -N $value " : "" ( "" , " -N " + str(value) )[ value is not None and value != vdef] 1 Enter a value > 0 0 < $value 0 < value Sample x sequences, rather than just one. sample Sample from profile, not core model (-p) Boolean 0 ( $value ) ? " -p " : "" ( "" , " -p " )[ value ] 1 Sample sequences from the implicit profile, not from the core model. The core model consists only of the homologous states (between the begin and end states of a HMMER Plan7 model). The profile includes the nonhomologous N, C, and J states, local/glocal and uni/multihit algorithm configuration, and the target length model. Therefore sequences sampled from a profile may include nonhomologous as well as homologous sequences, and may contain more than one homologous sequence segment. By default, the profile is in multihit local mode, and the target sequence length is configured for L=400. To change these defaults, see Options Controlling Emission from Profiles, below. controlP Options controlling emission from profiles (with -p) define $sample sample len Set expected length from profile (-L) Integer 400 (defined $value and $value != $vdef) ? " -L $value " : "" ( "" , " -L " + str(value) )[ value is not None and value != vdef] 1 Configure the profile's target sequence length model to generate a mean length of approximately the value rather than the default of 400. local Configure profile mode Choice null null local unilocal glocal uniglocal (defined $value) ? " --$value " : "" ( "" , " --" + str(value) )[ value is not None ] seed Seed number (--seed) Integer 0 (defined $value and $value != $vdef) ? " --seed $value " : "" ( "" , " --seed " + str(value) )[ value is not None and value !=vdef ] 1 Seed the random number generator with the value, an integer >= 0. If the value is nonzero, any stochastic simulations will be reproducible; the same command will give the same results. 
If the value is 0, the random number generator is seeded arbitrarily, and stochastic simulations will vary from run to run of the same command. The default is 0: use an arbitrary seed, so different hmmemit runs will generate different samples. Enter a value >= 0 0 <= $value 0 <= value Programs-5.1.1/cons.xml0000644000175000001560000004301412072525233013641 0ustar bneronsis cons EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net cons Create a consensus sequence from a multiple alignment http://bioweb2.pasteur.fr/docs/EMBOSS/cons.html http://emboss.sourceforge.net/docs/themes alignment:consensus cons e_input Input section e_sequence sequence option Alignment FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH 1,n ("", " -sequence=" + str(value))[value is not None] 1 File containing a sequence alignment. e_datafile Scoring matrix Choice mobyle_null mobyle_null EBLOSUM30 EBLOSUM35 EBLOSUM40 EBLOSUM45 EBLOSUM50 EBLOSUM55 EBLOSUM60 EBLOSUM62 EBLOSUM62-12 EBLOSUM65 EBLOSUM70 EBLOSUM75 EBLOSUM80 EBLOSUM85 EBLOSUM90 EBLOSUMN EDNAFULL EDNAMAT EDNASIMPLE EPAM10 EPAM100 EPAM110 EPAM120 EPAM130 EPAM140 EPAM150 EPAM160 EPAM170 EPAM180 EPAM190 EPAM20 EPAM200 EPAM210 EPAM220 EPAM230 EPAM240 EPAM250 EPAM260 EPAM270 EPAM280 EPAM290 EPAM30 EPAM300 EPAM310 EPAM320 EPAM330 EPAM340 EPAM350 EPAM360 EPAM370 EPAM380 EPAM390 EPAM40 EPAM400 EPAM410 EPAM420 EPAM430 EPAM440 EPAM450 EPAM460 EPAM470 EPAM480 EPAM490 EPAM50 EPAM500 EPAM60 EPAM70 EPAM80 EPAM90 SSSUB ("", " -datafile=" + str(value))[value is not None and value!=vdef] 2 This is the scoring matrix file used when comparing sequences. By default it is the file 'EBLOSUM62' (for proteins) or the file 'EDNAFULL' (for nucleic sequences). 
These files are found in the 'data' directory of the EMBOSS installation. e_additional Additional section e_plurality Plurality check value Float ("", " -plurality=" + str(value))[value is not None] 3 Set a cut-off for the number of positive matches below which there is no consensus. The default plurality is taken as half the total weight of all the sequences in the alignment. e_identity Required number of identities at a position (value greater than or equal to 0) Integer 0 ("", " -identity=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 4 Provides the facility of setting the required number of identities at a site for it to give a consensus at that position. Therefore, if this is set to the number of sequences in the alignment, only columns of identities contribute to the consensus. e_setcase Define a threshold above which the consensus is given in uppercase Float ("", " -setcase=" + str(value))[value is not None] 5 Sets the threshold for the positive matches above which the consensus is in upper-case and below which the consensus is in lower-case. e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename cons.e_outseq ("" , " -outseq=" + str(value))[value is not None] 6 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 7 e_outseq_out outseq_out option Sequence e_outseq e_name Name of the consensus sequence String ("", " -name=" + str(value))[value is not None] 8 auto Turn off any prompting String " -auto -stdout" 9 Programs-5.1.1/blast2.xml0000644000175000001560000012715211772052071014075 0ustar bneronsis blast2 BLAST2 NCBI BLAST, with gaps Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaeffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 25:3389-3402. http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=handbook.chapter.ch16 http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/tut1.html database:search:homology blast_init Blast initiation String "blastall" "blastall" 1 blast2 Blast program (-p) Choice null null blastn blastx tblastx blastp tblastn " -p $value" " -p "+ str(value) 2 - Blastp compares an amino acid query sequence against a protein sequence database; - Blastn compares a nucleotide query sequence against a nucleotide sequence database; - Blastx compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database; - tBlastn compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands). - tBlastx compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. - psitBlastn compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands) using a position specific matrix created by PSI-BLAST. db Database 3 protein_db Protein db (-d) Choice $blast2 =~ /^blast[px]$/ blast2 in [ "blastx" , "blastp" ] null " -d $value" " -d "+ str(value) Choose a protein db for blastp or blastx. Please note that Swissprot usage by and for commercial entities requires a license agreement. 
nucleotid_db Nucleotid db (-d) Choice $blast2 =~ /^(blastn|tblast[nx]|psitblastn)$/ blast2 in [ "blastn" , "tblastx", "tblastn" , "psitblastn" ] null " -d $value" " -d "+ str(value) Choose a nucleotide db for blastn, tblastn or tblastx query Query Sequence 4 query_seq Query (-i) Sequence FASTA 1,n " -i $query" " -i "+ str(query_seq) Read (first, query) sequence or set from file start_region Start of required region in query sequence (-L) Integer Location on query sequence end_region End of required region in query sequence (-L) Integer defined $start_region start_region is not None (defined $value) ? " -L \"$start_region $value\"" : " -L \"$start_region\"" (' -L "%s"' % (str(start_region)), ' -L "%s %s"' % (str(start_region), str(value)))[value is not None] Location on query sequence concat Number of concatenated queries (blastn or tblastn) (-B) Integer $blast2 =~ /^t?blastn$/ blast2 in [ "blastn" , "tblastn" ] (defined $value) ? " -B $value" : "" ("" , " -B "+str(value))[value is not None] scoring_opt Scoring options 5 open_a_gap Cost to open a gap (-G) Float (defined $value) ? " -G $value" : "" ("" , " -G "+str(value))[value is not None] -1 invokes default behavior: non-affine if greedy, 5 if using dynamic programming extend_a_gap Cost to extend a gap (-E) Float (defined $value) ? " -E $value" : "" ("" , " -E "+str(value))[value is not None] Default: 2 for blastn; 1 for blastp, blastx and tblastn Limited values for gap existence and extension are supported for these programs. Some supported and suggested values are: Existence Extension 10 -- 1 10 -- 2 11 -- 1 8 -- 2 9 -- 2 scoring_blast Protein penalty (not for blastn) $blast2 ne "blastn" blast2 != "blastn" matrix Similarity matrix (-M) Choice BLOSUM62 BLOSUM62 BLOSUM45 BLOSUM80 PAM30 PAM70 (defined $value and $value ne $vdef) ? 
" -M $value" : "" ("" , " -M "+str(value))[value is not None and value != vdef] scoring_blastn Blastn penalty $blast2 eq "blastn" blast2 == "blastn" mismatch Penalty for a nucleotide mismatch (-q) Float -3 (defined $value and $value != $vdef) ? " -q $value" : "" ("" , " -q "+str(value))[value is not None and value != vdef] match Reward for a nucleotide match (-r) Float 1 (defined $value and $value != $vdef) ? " -r $value" : "" ("" , " -r "+str(value))[value is not None and value != vdef] frameshift Frame shift penalty (-w) Float (defined $value) ? " -w $value" : "" ("", " -w "+str(value))[value is not None] filter_opt Filtering and masking options 6 BLAST 2.0 uses the dust low-complexity filter for blastn and seg for the other programs. If one uses '-F T' then normal filtering by seg or dust (for blastn) occurs (likewise '-F F' means no filtering whatsoever). filter Filter or Masking query sequence (DUST with blastn, SEG with others) (-F) Boolean 1 ($value) ? "" : " -F F" (" -F F" , "")[value] other_filters Filtering options (Filter must be true) Choice $filter and not $other_masking filter and not other_masking null null "" "" v1 " -F C" " -F C" v2 " -F \"C;S\"" " -F \"C;S\"" v3 " -F D" " -F D" A coiled-coiled filter, based on the work of Lupas et al. (Science, vol 252, pp. 1162-4 (1991)) written by John Kuzio (Wilson et al., J Gen Virol, vol. 76, pp. 2923-32 (1995)) other_masking Masking options (Filter must be true) Choice $filter and not $other_filters filter and not other_filters null null v1 " -F \"m S\"" " -F \"m S\"" v2 " -F \"m D\"" " -F \"m D\"" v3 " -F \"m C\"" " -F \"m C\"" v4 " -F m" " -F m" For Lower-case masking the lower case filtering must be select. ($value eq 'null' or $value eq 'v1' or $value eq 'v2' or $value eq 'v3']) or ($value eq 'v4' and $lower_case) value in ['null', 'v1', 'v2', 'v3'] or (value == 'v4' and lower_case) A coiled-coiled filter, based on the work of Lupas et al. (Science, vol 252, pp. 
1162-4 (1991)) written by John Kuzio (Wilson et al., J Gen Virol, vol. 76, pp. 2923-32 (1995)). It is possible to specify that the masking should only be done during the process of building the initial words . If the -U option (to mask any lower-case sequence in the input FASTA file) is used and one does not wish any other filtering, but does wish to mask when building the lookup tables then one should specify: -F 'm' lower_case Use lower case filtering (-U) Boolean 0 ($value) ? " -U T" : "" ("", " -U T")[value] This option specifies that any lower-case letters in the input FASTA file should be masked. selectivity_opt Selectivity options 7 The programs blastn and blastp offer fully gapped alignments. blastx and tblastn have 'in-frame' gapped alignments and use sum statistics to link alignments from different frames. tblastx provides only ungapped alignments. Expect Expected value (-e) Float 10 (defined $value and $value != $vdef) ? " -e $value" : "" ("" , " -e "+str(value))[value is not None and value != vdef] The statistical significance threshold for reporting matches against database sequences; the default value is 10, such that 10 matches are expected to be found merely by chance, according to the stochastic model of Karlin and Altschul (1990). If the statistical significance ascribed to a match is greater than the EXPECT threshold, the match will not be reported. Lower EXPECT thresholds are more stringent, leading to fewer chance matches being reported. Fractional values are acceptable. word_size Word Size (-W) Integer (defined $value) ? " -W $value" : "" ("" , " -W "+str(value))[value is not None] Use words of size N. Zero invokes default behavior Default values: - 11 for blastn - 3 for others dist_hits Multiple Hits window size (-A) Integer (defined $value) ? " -A $value" : "" ("" , " -A "+str(value))[value is not None] Generally defaults to 0 (for single-hit extensions), but defaults to 40 when using discontiguous templates. 
extend_hit Threshold for extending hits (-f) Float (defined $value)? " -f $value" : "" ("" , " -f " + str(value))[ value is not None ] Blast seeks first short word pairs whose aligned score reaches at least this value. Default values: - 0 for blastn - 11 for blastp - 12 for blastx - 13 for tblastn and tblastx dropoff_extent X dropoff value for gapped alignment (-X) Float (defined $value) ? " -X $value" : "" ("" , " -X "+str(value))[value is not None] This is the value that control the path graph region explored by Blast during a gapped extension (Xg in the NAR paper) (default for blastp is 15). Default values: - 30 for blastn - 0 for tblastx - 15 for others dropoff_extent_ungapped X dropoff value for ungapped extention (-y) Float (defined $value and $value != $vdef ) ? " -y $value" : "" ("" , " -y "+str(value))[value is not None and value != vdef] - 0.0: default behavior: - 20 for blastn - 7 for others dropoff_final X dropoff value for final gapped alignment (-Z) Float (defined $value) ? " -Z $value" : "" ("" , " -Z "+str(value))[value is not None] Default values: - 100 for blastn - 0 for tblastx - 25 for others eff_len Effective length of the search space (-Y) Integer (defined $value) ? " -Y $value" : "" ("" , " -Y "+str(value))[value is not None] Use zero for the real size keep_hits Number of best hits from a region to keep (-K) Integer (defined $value) ? " -K $value" : "" ("" , " -K "+str(value))[value is not None] If this option is used, a value of 100 is recommended. gapped_alig Perform or not gapped alignment (not available with tblastx) (-g) Boolean $blast2 ne "tblastx" blast2 != "tblastx" 1 ($value) ? "" : " -g F " (" -g F " , "")[value] mode Single-hit or multiple-hit mode (-P) Choice $blast2 ne "blastn" blast2 != "blastn" 0 0 1 ($value ne "0") ? 
" -P $value" : "" ("" , " -P "+str(value))[value != "0"] translation_opt Translation options 8 gc_query Genetic code used for query translation (-Q) Choice $blast2 =~ /^t?blastx$/ blast2 in [ "blastx" , "tblastx" ] 1 1 2 3 4 5 6 9 10 11 12 13 14 15 (defined $value and $value ne $vdef) ? " -Q $value" : "" ("" , " -Q "+str(value))[value is not None and value != vdef] gc_db Genetic code used for database translation (-D) Choice $blast2 =~ /^tblast[nx]$/ blast2 in [ "tblastn", "tblastx" ] 1 1 2 3 4 5 6 9 10 11 12 13 14 15 ($value ne $vdef) ? " -D $value" : "" ("" , " -D "+str(value))[value != vdef] strand Query strands to search against database (-S) Choice $blast2 =~ /^(blastn|t?blastx)$/ blast2 in [ "blastn" ,"blastx" , "tblastx" ] 3 1 2 3 (defined $value and $value ne $vdef) ? " -S $value" : "" ("" , " -S "+str(value))[value is not None and value != vdef] affichage Report options 9 Descriptions Number of one-line descriptions to show (-v) Integer 500 (defined $value and $value != $vdef) ? " -v $value" : "" ("" , " -v "+str(value))[value is not None and value != vdef] Maximum number of database sequences for which one-line descriptions will be reported. Alignments Number of database sequences to show alignments (-b) Integer 250 (defined $value and $value != $vdef) ? " -b $value" : "" ("" , " -b "+str(value))[value is not None and value != vdef] Maximum number of database sequences for which high-scoring segment pairs will be reported (-b). view_alignments Alignment view options (-m) Choice 0 0 1 2 3 4 5 6 7 8 (defined $value and $value ne $vdef) ? " -m $value" : "" ("" , " -m "+str(value))[value is not None and value != vdef] txtoutput Text output file String $view_alignments ne "7" view_alignments != "7" " -o blast2.txt" " -o blast2.txt" 10 xmloutput Xml output file String $view_alignments eq "7" view_alignments == "7" " -o blast2.xml" " -o blast2.xml" 10 htmloutput Html output Boolean $view_alignments !~ /^[78]$/ view_alignments not in [ "7" , "8" ] 1 ($value) ? 
" && html4blast -g -o blast2.html blast2.txt" : "" ("" , " && html4blast -g -o blast2.html blast2.txt")[value] 11 txtfile Blast text report BlastTextReport Report $view_alignments ne "7" view_alignments != "7" "blast2.txt" "blast2.txt" xmlfile Blast xml report BlastXmlReport Report $view_alignments eq "7" view_alignments == "7" "blast2.xml" "blast2.xml" htmlfile Blast html report BlastHtmlReport Report $view_alignments !~ /^[78]$/ view_alignments not in [ "7" , "8" ] "blast2.html" "blast2.html" imgfile Picture Binary $view_alignments !~ /^[78]$/ view_alignments not in ["7", "8"] "*.png" "*.gif" "*.png" "*.gif" Programs-5.1.1/redata.xml0000644000175000001560000001235011672346320014141 0ustar bneronsis redata EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net redata Retrieve information from REBASE restriction enzyme database http://bioweb2.pasteur.fr/docs/EMBOSS/redata.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:restriction redata e_input Input section e_enzyme Restriction enzyme name String BamHI ("", " -enzyme=" + str(value))[value is not None and value!=vdef] 1 Enter the name of the restriction enzyme that you wish to get details of. The names often have a 'I' in them - this is a capital 'i', not a '1' or an 'l'. The names are case-independent ('AaeI' is the same as 'aaei') e_output Output section e_isoschizomers Show isoschizomers Boolean 1 (" -noisoschizomers", "")[ bool(value) ] 2 Show other enzymes with this specificity. 
(Isoschizomers) e_references Show references Boolean 1 (" -noreferences", "")[ bool(value) ] 3 e_suppliers Show suppliers Boolean 1 (" -nosuppliers", "")[ bool(value) ] 4 e_outfile Name of the output file (e_outfile) Filename redata.e_outfile ("" , " -outfile=" + str(value))[value is not None] 5 e_outfile_out outfile_out option RedataReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 6 Programs-5.1.1/profit.xml0000644000175000001560000001047412072525233014206 0ustar bneronsis profit EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net profit Scan one or more sequences with a simple frequency matrix http://bioweb2.pasteur.fr/docs/EMBOSS/profit.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:profiles sequence:protein:profiles profit e_input Input section e_infile Profile or weight matrix file ProfileOrMatrix AbstractText ("", " -infile=" + str(value))[value is not None] 1 e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 2 e_output Output section e_outfile Name of the output file (e_outfile) Filename profit.e_outfile ("" , " -outfile=" + str(value))[value is not None] 3 e_outfile_out outfile_out option ProfitReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 4 Programs-5.1.1/wordmatch.xml0000644000175000001560000003541112072525233014671 0ustar bneronsis wordmatch EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net wordmatch Finds regions of identity (exact matches) of two sequences http://bioweb2.pasteur.fr/docs/EMBOSS/wordmatch.html http://emboss.sourceforge.net/docs/themes alignment:pairwise:local wordmatch e_input Input section e_asequence asequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 2,n ("", " -asequence=" + str(value))[value is not None] 1 e_bsequence bsequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -bsequence=" + str(value))[value is not None] 2 e_required Required section e_wordsize Word size (value greater than or equal to 2) Integer 4 ("", " -wordsize=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 2 is required value >= 2 3 e_output Output section e_dumpalign Dump matches as alignments Boolean 1 (" -nodumpalign", "")[ bool(value) ] 4 e_outfile Name of the output alignment file Filename wordmatch.align ("" , " -outfile=" + str(value))[value is not None] 5 e_aformat_outfile Choose the alignment output format Choice MATCH FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH ("", " -aformat=" + str(value))[value is not None and value!=vdef] 6 e_outfile_out outfile_out option Alignment e_aformat_outfile in ['FASTA', 'MSF'] e_outfile e_outfile_out2 outfile_out2 option Text e_aformat_outfile in ['PAIR', 'MARKX0', 'MARKX1', 'MARKX2', 'MARKX3', 'MARKX10', 'SRS', 'SRSPAIR', 'SCORE', 'UNKNOWN', 'MULTIPLE', 'SIMPLE', 'MATCH'] e_outfile e_logfile logfile option Filename wordmatch.e_logfile ("" , " -logfile=" + str(value))[value is not None] 7 Statistics on distribution of kmers and matches e_logfile_out logfile_out option WordmatchLog Report e_logfile e_dumpfeat Dump matches as feature files Boolean 1 (" -nodumpfeat", "")[ bool(value) ] 8 e_aoutfeat Name of the output feature file (e_aoutfeat) Filename 
wordmatch.e_aoutfeat ("" , " -aoutfeat=" + str(value))[value is not None] 9 e_offormat_aoutfeat Choose the feature output format Choice GFF GFF EMBL SWISSPROT NBRF CODATA ("", " -offormat=" + str(value))[value is not None and value!=vdef] 10 e_aoutfeat_out aoutfeat_out option Feature AbstractText e_aoutfeat e_boutfeat Name of the output feature file (e_boutfeat) Filename wordmatch.e_boutfeat ("" , " -boutfeat=" + str(value))[value is not None] 11 e_offormat_boutfeat Choose the feature output format Choice GFF GFF EMBL SWISSPROT NBRF CODATA ("", " -offormat=" + str(value))[value is not None and value!=vdef] 12 e_boutfeat_out boutfeat_out option Feature AbstractText e_boutfeat auto Turn off any prompting String " -auto -stdout" 13 Programs-5.1.1/combat.xml0000644000175000001560000002467711767572177014205 0ustar bneronsis combat 1.0 COMBAT Comparison of coding DNA Pedersen, Lyngso,Hein Christian N. S. Pedersen, Rune B. Lyngso and Jotun Hein. Comparison of coding DNA in Proceedings of the 9th Annual Symposium of Combinatorial Pattern Matching (CPM), 1998. http://www.daimi.au.dk/~cstorm/combat/ http://www.daimi.au.dk/~cstorm/combat/ alignment:pairwise combat String defined $sequence1 and defined $sequence2 sequence1 is not None and sequence2 is not None "cat $sequence1 $sequence2 >> sequence && combat combat.params >distance.out && combine sequence combat.aln" "cat " + str(sequence2) + " " + str(sequence1) + " >> sequence.data && combat combat.params > distance.out && combine sequence.data combat.aln" 0 sequence1 First Sequence Sequence FASTA ">inputfile\n\"sequence.data\"\n" '>inputfile\n"sequence.data"\n' 1 Sequences must describe an integer number of codon, i.e. the length of sequence must be a multiple of three. combat.params sequence2 Second Sequence Sequence FASTA Sequences must describe an integer number of codon, i.e. the length of sequence must be a multiple of three. 
output_aln Output file Text ">outputfile\n\"combat.aln\"\n" '>outputfile\n"combat.aln"\n' 2 combat.params "combat.aln" "combat.aln" protein_distance_matrix Amino-acid distance matrix Choice Blosum62_distance.m PAM60_distance.m PAM120_distance.m PAM250_distance.m.m PAM350_distance.m Blosum30_distance.m Blosum62_distance.m Blosum90_distance.m ">distance matrix\n\"$value\"\n" ">distance matrix\n\"" + str(value) + '"\n' 3 combat.params nucleotide_distance_matrix Nucleotid distance matrix Choice nucleotide_distance1.m nucleotide_distance1.m nucleotide_distance2.m nucleotide_distance3.m ">nucleotide matrix\n\"$value\"\n" ">nucleotide matrix\n\"" + str(value) + '"\n' 4 combat.params protein_gap_open Gap open cost for protein Integer 20 ">gap functions\\nprotein: $value" ">gap functions\nprotein: " + str(value) 5 combat.params protein_gap_ext Gap extension cost for protein Integer 8 " + $value*k\\n" " + " + str(value) + "*k\n" 6 combat.params dna_gap_open Gap open cost for dna Integer 8 "dna: $value" "dna: " + str(value) 7 combat.params dna_gap_ext Gap extension cost for dna Integer 2 " + $value*k" " + " + str(value) + "*k" 8 combat.params combat_out Alignment file Alignment FASTA "combat.out" "combat.out" distance_files Distance file Text "distance.out" "distance.out" ps_files Postscript file PostScript Binary "combat.ps" "combat.ps" gnuplot_call String " && gnuplot <gnuplot.params" " && gnuplot <gnuplot.params" 100 gnuplot_commands String "set xtics 12,5.,1000\nset ytics 12,5.,1000\nset grid\nset terminal postscript\nset output \"combat.ps\"\nplot \"combat.aln\" with lines\n" 'set xtics 12,5.,1000\nset ytics 12,5.,1000\nset grid\nset terminal postscript\nset output "combat.ps"\nplot "combat.aln" with lines\n' 1 gnuplot.params Programs-5.1.1/newseq.xml0000644000175000001560000001645012072525233014205 0ustar bneronsis newseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. 
EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net newseq Create a sequence file from a typed-in sequence http://bioweb2.pasteur.fr/docs/EMBOSS/newseq.html http://emboss.sourceforge.net/docs/themes sequence:edit newseq e_input Input section e_name Name of the sequence String ("", " -name=" + str(value))[value is not None] 1 The name of the sequence should be a single word that you will use to identify the sequence. It should have no (or few) punctuation characters in it. e_description Description of the sequence String ("", " -description=" + str(value))[value is not None] 2 Enter any description of the sequence that you require. e_type Type of sequence Choice N N P ("", " -type=" + str(value))[value is not None and value!=vdef] 3 e_sequence Enter the sequence String ("", " -sequence=" + str(value))[value is not None] 4 The sequence itself. Because of the limitation of the operating system, you will only be able to type in a short sequence of (typically) 250 characters or so. The keyboard will beep at you when you have reached this limit and you will not be able to press the RETURN/ENTER key until you have deleted a few characters. e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename newseq.e_outseq ("" , " -outseq=" + str(value))[value is not None] 5 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 6 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 7 Programs-5.1.1/water.xml0000644000175000001560000004773012072525233014032 0ustar bneronsis water EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A.
EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net water Smith-Waterman local alignment of sequences http://bioweb2.pasteur.fr/docs/EMBOSS/water.html http://emboss.sourceforge.net/docs/themes alignment:pairwise:local water e_input Input section e_asequence asequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -asequence=" + str(value))[value is not None] 1 e_bsequence bsequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -bsequence=" + str(value))[value is not None] 2 e_datafile Matrix file Choice mobyle_null mobyle_null EBLOSUM30 EBLOSUM35 EBLOSUM40 EBLOSUM45 EBLOSUM50 EBLOSUM55 EBLOSUM60 EBLOSUM62 EBLOSUM62-12 EBLOSUM65 EBLOSUM70 EBLOSUM75 EBLOSUM80 EBLOSUM85 EBLOSUM90 EBLOSUMN EDNAFULL EDNAMAT EDNASIMPLE EPAM10 EPAM100 EPAM110 EPAM120 EPAM130 EPAM140 EPAM150 EPAM160 EPAM170 EPAM180 EPAM190 EPAM20 EPAM200 EPAM210 EPAM220 EPAM230 EPAM240 EPAM250 EPAM260 EPAM270 EPAM280 EPAM290 EPAM30 EPAM300 EPAM310 EPAM320 EPAM330 EPAM340 EPAM350 EPAM360 EPAM370 EPAM380 EPAM390 EPAM40 EPAM400 EPAM410 EPAM420 EPAM430 EPAM440 EPAM450 EPAM460 EPAM470 EPAM480 EPAM490 EPAM50 EPAM500 EPAM60 EPAM70 EPAM80 EPAM90 SSSUB ("", " -datafile=" + str(value))[value is not None and value!=vdef] 3 This is the scoring matrix file used when comparing sequences. By default it is the file 'EBLOSUM62' (for proteins) or the file 'EDNAFULL' (for nucleic sequences). These files are found in the 'data' directory of the EMBOSS installation. e_required Required section e_gapopen Gap opening penalty (value from 0.0 to 100.0) Float ("", " -gapopen=" + str(value))[value is not None] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 100.0 is required value <= 100.0 4 The gap open penalty is the score taken away when a gap is created. 
The best value depends on the choice of comparison matrix. The default value assumes you are using the EBLOSUM62 matrix for protein sequences, and the EDNAFULL matrix for nucleotide sequences. e_gapextend Gap extension penalty (value from 0.0 to 10.0) Float ("", " -gapextend=" + str(value))[value is not None] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 10.0 is required value <= 10.0 5 The gap extension penalty is added to the standard gap penalty for each base or residue in the gap. This is how long gaps are penalized. Usually you will expect a few long gaps rather than many short gaps, so the gap extension penalty should be lower than the gap penalty. An exception is where one or both sequences are single reads with possible sequencing errors in which case you would expect many single base gaps. You can get this result by setting the gap open penalty to zero (or very low) and using the gap extension penalty to control gap scoring. e_output Output section e_brief Brief identity and similarity Boolean 1 (" -nobrief", "")[ bool(value) ] 6 Brief identity and similarity e_outfile Name of the output alignment file Filename water.align ("" , " -outfile=" + str(value))[value is not None] 7 e_aformat_outfile Choose the alignment output format Choice SRS FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH ("", " -aformat=" + str(value))[value is not None and value!=vdef] 8 e_outfile_out outfile_out option Alignment e_aformat_outfile in ['FASTA', 'MSF'] e_outfile e_outfile_out2 outfile_out2 option Text e_aformat_outfile in ['PAIR', 'MARKX0', 'MARKX1', 'MARKX2', 'MARKX3', 'MARKX10', 'SRS', 'SRSPAIR', 'SCORE', 'UNKNOWN', 'MULTIPLE', 'SIMPLE', 'MATCH'] e_outfile auto Turn off any prompting String " -auto -stdout" 9 Programs-5.1.1/tranalign.xml0000644000175000001560000001763312072525233014666 0ustar bneronsis tranalign EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite 
Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net tranalign Generate an alignment of nucleic coding regions from aligned proteins http://bioweb2.pasteur.fr/docs/EMBOSS/tranalign.html http://emboss.sourceforge.net/docs/themes alignment:multiple tranalign e_input Input section e_asequence asequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -asequence=" + str(value))[value is not None] 1 e_bsequence bsequence option Protein Alignment FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH 1,n ("", " -bsequence=" + str(value))[value is not None] 2 e_additional Additional section e_table Genetic codes Choice 0 0 1 2 3 4 5 6 9 10 11 12 13 14 15 16 21 22 23 ("", " -table=" + str(value))[value is not None and value!=vdef] 3 e_output Output section e_outseq Name of the output sequence file (e_outseq) DNA Filename tranalign.e_outseq ("" , " -outseq=" + str(value))[value is not None] 4 e_outseq_out outseq_out option DNA Text e_outseq auto Turn off any prompting String " -auto -stdout" 5 Programs-5.1.1/hmoment.xml0000644000175000001560000002660612072525233014356 0ustar bneronsis hmoment EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net hmoment Calculate and plot hydrophobic moment for protein sequence(s) http://bioweb2.pasteur.fr/docs/EMBOSS/hmoment.html http://emboss.sourceforge.net/docs/themes sequence:protein:2D_structure structure:2D_structure hmoment e_input Input section e_seqall seqall option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -seqall=" + str(value))[value is not None] 1 e_additional Additional section e_window Window Integer 10 ("", " -window=" + str(value))[value is not None and value!=vdef] 2 e_aangle Alpha helix angle (degrees) Integer 100 ("", " -aangle=" + str(value))[value is not None and value!=vdef] 3 e_bangle Beta sheet angle (degrees) Integer 160 ("", " -bangle=" + str(value))[value is not None and value!=vdef] 4 e_advanced Advanced section e_baseline Graph marker line Float 0.35 ("", " -baseline=" + str(value))[value is not None and value!=vdef] 5 e_output Output section e_plot Produce graphic Boolean 0 ("", " -plot")[ bool(value) ] 6 e_double Plot two graphs Boolean 0 ("", " -double")[ bool(value) ] 7 e_graph Choose the e_graph output format Choice e_plot png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 8 xy_goutfile Name of the output graph Filename e_plot hmoment_xygraph ("" , " -goutfile=" + str(value))[value is not None] 9 xy_outgraph_png Graph file Picture Binary e_plot and e_graph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_plot and e_graph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_plot and e_graph == "ps" or e_graph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_plot and e_graph == "meta" "*.meta" xy_outgraph_data Graph file Text e_plot and e_graph == "data" "*.dat" e_outfile Name of the output file (e_outfile) Filename not e_plot hmoment.e_outfile ("" , " -outfile=" + str(value))[value is not None] 10 
e_outfile_out outfile_out option HmomentReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 11 Programs-5.1.1/repeats.xml0000644000175000001560000001522511767572177014370 0ustar bneronsis repeats 1.1 repeats Search repeats in DNA sequence

The program scans a DNA sequence file, looking for tandemly repeated patterns where the period of the repeat has a user-specified *size* from 1 to 32 nucleotides. A possible repeat is found if *lookcount* characters are repeated at a separation of *size*.

Example: Suppose size is 6 and lookcount is 3. Then the sequence

                ACGTGTCCGTA
                 ^^^   ^^^

could be part of a possible repeat of the pattern CGTGTC because the first 3 characters CGT are repeated at a separation of 6.
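
The trigger condition described above can be sketched in a few lines of Python. This is only an illustration of the rule "lookcount characters repeated at a separation of size"; the function name `candidate_repeats` is hypothetical and is not part of the repeats program.

```python
def candidate_repeats(seq, size, lookcount):
    """Return every start offset i (0-based) such that the lookcount
    characters beginning at i occur again exactly `size` positions later.

    Hypothetical helper for illustration; not taken from the repeats program.
    """
    hits = []
    # The last usable start leaves room for the second copy of the probe.
    for i in range(len(seq) - size - lookcount + 1):
        probe = seq[i:i + lookcount]
        if probe == seq[i + size:i + size + lookcount]:
            hits.append(i)
    return hits


# In the sequence shown above, the two CGT occurrences start at
# offsets 1 and 7, i.e. they are 6 positions apart.
print(candidate_repeats("ACGTGTCCGTA", 6, 3))
```

A hit only marks a candidate; the program then scores the surrounding region before reporting anything.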

Once a possible pattern is found, the program uses dynamic programming to compute a similarity score of the pattern versus the sequence in the area where the pattern was found. The dynamic programming uses weights for single indels rather than gap functions. This is so that the program quickly identifies the repeats rather than producing an optimal alignment score.
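
A minimal sketch of scoring with per-character indel weights instead of gap functions follows. The description does not give the program's actual recurrence, so this uses a standard local-alignment recurrence as an assumption; the name `similarity_score` is hypothetical. The parameters alpha (match bonus), beta (mismatch penalty) and delta (indel penalty) mirror the program's options, with their defaults 2, 6 and 9.

```python
def similarity_score(pattern, region, alpha=2, beta=6, delta=9):
    """Best local similarity score of `pattern` against `region`.

    Assumed recurrence (not taken from the repeats source code):
    each match adds alpha, each mismatch subtracts beta, and each
    single inserted or deleted character subtracts delta.
    """
    prev = [0] * (len(region) + 1)
    best = 0
    for p in pattern:
        cur = [0]
        for j, r in enumerate(region, start=1):
            diag = prev[j - 1] + (alpha if p == r else -beta)
            up = prev[j] - delta        # character of pattern left unmatched
            left = cur[j - 1] - delta   # character of region left unmatched
            cur.append(max(0, diag, up, left))
        best = max(best, max(cur))
        prev = cur
    return best


# Three matching characters at the default weights score 3 * alpha.
print(similarity_score("CGT", "CGT"))
```

Because every indel costs a flat delta, the table needs only one value per cell, which keeps the scan fast. This sketch does not attempt to handle several tandem copies of the pattern in one pass.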

If the similarity score exceeds a threshold, then a consensus pattern is computed. This consensus is aligned with the sequence and the alignment is displayed.

G. Benson A method for fast database search for all k-nucleotide repeats, by Gary Benson and Michael S. Waterman, Nucleic Acids Research (1994) Vol. 22, No. 22, pp 4828-4836.
sequence:nucleic:pattern repeats seq Sequence File DNA Sequence GENBANK " $value" " "+str(value) 1 The data file must conform to the GenBank format. alpha Match bonus (input as positive) (Alpha) Integer 2 " $value" " "+str(value) Value must be positive $value >= 0 value >= 0 2 beta Mismatch penalty (input as positive) (Beta) Integer 6 " $value" " "+str(value) Value must be positive $value > 0 value > 0 3 delta Indel penalty (input as positive) (Delta) Integer 9 " $value" " " + str(value) Value must be positive $value >= 0 value >= 0 4 reportmax Threshold score to report an alignment (Reportmax) Integer 30 " $value" " " + str(value) 5 Size Pattern size (Size) Integer " $value" " " + str(value) 6 lookcount Number of characters to match to trigger dynamic programming (Lookcount) Integer " $value" " " + str(value) 7 A possible repeat is found if *lookcount* characters are repeated at a separation of *size*. Recommended to use values between 3 and 8 noshortperiods Patterns with shorter periods are excluded ? (Noshortperiods) Boolean 0 ($value)? " 1 ":" 0" (" 0" , " 1 ")[ value ] 8
Programs-5.1.1/maskseq.xml0000644000175000001560000002022612072525233014343 0ustar bneronsis maskseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net maskseq Write a sequence with masked regions http://bioweb2.pasteur.fr/docs/EMBOSS/maskseq.html http://emboss.sourceforge.net/docs/themes sequence:edit maskseq e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_regions Regions to mask (eg: 4-57,78-94) String ("", " -regions=" + str(value))[value is not None] 2 Regions to mask. A set of regions is specified by a set of pairs of positions. The positions are integers. They are separated by any non-digit, non-alpha character. Examples of region specifications are: 24-45, 56-78 1:45, 67=99;765..888 1,5,8,10,23,45,57,99 e_additional Additional section e_tolower Change masked region to lower-case Boolean 0 ("", " -tolower")[ bool(value) ] 3 The region can be 'masked' by converting the sequence characters to lower-case, some non-EMBOSS programs e.g. fasta can interpret this as a masked region. The sequence is unchanged apart from the case change. You might like to ensure that the whole sequence is in upper-case before masking the specified regions to lower-case by using the '-supper' flag. e_maskchar Character to mask with String not e_tolower ("", " -maskchar=" + str(value))[value is not None] 4 Character to use when masking. Default is 'X' for protein sequences, 'N' for nucleic sequences. If the mask character is set to be the SPACE character or a null character, then the sequence is 'masked' by changing it to lower-case, just as with the '-lowercase' flag. 
e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename maskseq.e_outseq ("" , " -outseq=" + str(value))[value is not None] 5 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 6 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 7 Programs-5.1.1/vectorstrip.xml0000644000175000001560000002442012072525233015263 0ustar bneronsis vectorstrip EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net vectorstrip Removes vectors from the ends of nucleotide sequence(s) http://bioweb2.pasteur.fr/docs/EMBOSS/vectorstrip.html http://emboss.sourceforge.net/docs/themes sequence:edit vectorstrip e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_vectorfilesection Vector input options e_readfile Are your vector sequences in a file? Boolean 1 (" -noreadfile", "")[ bool(value) ] 2 e_vectorsfile Cloning vector definition file (optional) Vector AbstractText e_readfile ("", " -vectorsfile=" + str(value))[value is not None] 3 e_required Required section e_mismatch Max allowed % mismatch Integer 10 ("", " -mismatch=" + str(value))[value is not None and value!=vdef] 4 e_besthits Show only the best hits (minimise mismatches)? 
Boolean 1 (" -nobesthits", "")[ bool(value) ] 5 e_alinker The 5' sequence String not e_readfile ("", " -alinker=" + str(value))[value is not None] 6 e_blinker The 3' sequence String not e_readfile ("", " -blinker=" + str(value))[value is not None] 7 e_output Output section e_allsequences Show all sequences in output Boolean 0 ("", " -allsequences")[ bool(value) ] 8 e_outfile Name of the output file (e_outfile) Filename vectorstrip.e_outfile ("" , " -outfile=" + str(value))[value is not None] 9 e_outfile_out outfile_out option VectorstripReport Report e_outfile e_outseq Name of the output sequence file (e_outseq) Filename vectorstrip.e_outseq ("" , " -outseq=" + str(value))[value is not None] 10 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 11 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 12 Programs-5.1.1/mview_blast.xml0000644000175000001560000011424112073003734015212 0ustar bneronsis mview_blast 1.49 MVIEW Blast report pretty viewer N. P. Brown Brown, N.P., Leroy C., Sander C. (1998). MView: A Web compatible database search or multiple alignment viewer. Bioinformatics. 14(4):380-381. http://bioweb2.pasteur.fr/docs/mview/index.html http://bio-mview.sourceforge.net/ database:search:display display:database:search mview blast Blast text Report BlastTextReport Report " -in blast $value" " -in blast "+str(value) outformat Output format (-out) Choice html html msf pearson pir rdb (defined $value and $value ne $vdef) ? "-out $value" : "" ( "", " -out " + str(value) )[ value is not None and value != vdef] 1 main_formatting_options Main formatting options 2 ruler Attach a ruler (-ruler) Boolean 0 ($value) ? " -ruler on" : "" ( "" , " -ruler on" )[ value ] alignment Show alignment (-alignment) Boolean 1 ($value) ? 
"" : " -alignment off" ( " -alignment off" , "" )[ value ] consensus Show consensus (-consensus) Boolean 0 ($value) ? " -consensus on" : "" ( "" , " -consensus on" )[ value ] dna Use DNA/RNA colormaps and/or consensus groups (-dna) Boolean 0 ($value) ? " -dna" : "" ( "" , " -dna" )[ value ] alignment_options Alignment options 3 coloring Colour scheme (-coloring) Choice none none any identity consensus group ($value and $value ne $vdef) ? " -coloring $value" : "" ( "" , " -coloring " + str(value) )[ value !=vdef ] 3
  • Colour all the residues: colours every residue according to the currently selected palette.
  • Colouring by identity to the first sequence: colours only those residues that are identical to a reference sequence (usually the query or first row).
  • Colour only when above a given percent similarity: colours only those residues that belong to a specified physicochemical class that is conserved in at least a specified percentage of all rows for a given column. The threshold defaults to 70% and may be set to another value; e.g., -coloring consensus -threshold 80 specifies 80%. Note that the physicochemical classes in question can be confined to individual residues.
  • Colour residues by the colour of the class to which they belong (-coloring group): like -coloring consensus, but each residue takes the colour of its class.

By default, the consensus computation counts gap characters, so that sections of the alignment may be uncolored where the presence of gaps prevents the non-gap count from reaching the threshold. Setting -con_gaps off prevents this, allowing sequence-only based consensus thresholding.
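The effect of counting gaps can be illustrated with a minimal Python sketch. This is not MView's implementation; the function name, the gap characters, and the residue class below are assumptions chosen only to show how the same column can fall below or reach the threshold depending on whether gap rows are counted.

```python
def class_reaches_threshold(column, residue_class, threshold=70.0,
                            count_gaps=True, gap_chars="-."):
    """Return True if members of residue_class make up at least
    `threshold` percent of the rows considered in this column."""
    rows = list(column)
    if not count_gaps:
        # Equivalent in spirit to -con_gaps off: ignore gap rows entirely.
        rows = [r for r in rows if r not in gap_chars]
    if not rows:
        return False
    members = sum(1 for r in rows if r in residue_class)
    return 100.0 * members / len(rows) >= threshold


aliphatic = set("ILV")   # hypothetical physicochemical class
column = "LLI--"         # three residues and two gaps in one alignment column

# Gaps counted: 3 of 5 rows = 60%, below the 70% threshold.
print(class_reaches_threshold(column, aliphatic, count_gaps=True))
# Gaps ignored: 3 of 3 rows = 100%, the column qualifies.
print(class_reaches_threshold(column, aliphatic, count_gaps=False))
```

With gaps counted the column stays uncoloured; with gaps ignored it is coloured, which is the behaviour the paragraph above describes.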

threshold Threshold percentage for consensus coloring (-threshold) Float 70.0 (defined $value and $value != $vdef) ? " -threshold $value" : "" ( "" , " -threshold " + str(value) )[ value is not None and value != vdef] 3 ignore Ignore singleton or class group (-ignore) Choice none class none singleton (defined $value and $value ne $vdef) ? " -ignore $value" : "" ( "" , " -ignore " + str(value) )[ value is not None and value != vdef] 3 Tip: If you want to see only the conserved residues above the threshold (i.e., only one type of conserved residue per column), add the option -ignore class.
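The Python expressions in these definitions use a tuple indexed by a boolean to decide whether an option is emitted. The following sketch shows how that idiom evaluates for the -threshold and -ignore parameters; the wrapper functions and the file name report.txt are hypothetical and only serve the illustration.

```python
def threshold_fragment(value, vdef=70.0):
    # False indexes element 0 (empty string), True indexes element 1.
    return ("", " -threshold " + str(value))[value is not None and value != vdef]


def ignore_fragment(value, vdef="none"):
    return ("", " -ignore " + str(value))[value is not None and value != vdef]


# Hypothetical assembly of a command line from the fragments.
command = ("mview -in blast report.txt"
           + threshold_fragment(80.0)
           + ignore_fragment("class"))
print(command)

# A value equal to the default, or an unset value, adds nothing.
print(repr(threshold_fragment(70.0)))
print(repr(threshold_fragment(None)))
```

Both tuple elements are built before the index is applied, so the second element must be computable even when it is not selected; str(None) is harmless here, which is why the idiom works for unset values.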
consensus_options Consensus options $consensus consensus 3 con_coloring Basic style of coloring (-con_coloring) Choice none any identity none (defined $value) ? " -con_coloring $value" : "" ( "" , " -con_coloring " + str(value) )[ value is not None ] 3

Colouring of an alignment by consensus determines which residues to colour and the colours to use based on

  1. the consensus threshold chosen for the colouring operation,
  2. a consideration of the common physicochemical properties of the residues in that column,
  3. the chosen colour scheme.
con_threshold Consensus line thresholds (in range 50..100) (separated by commas) (-con_threshold) String 100,90,80,70 (defined $value and $value ne $vdef) ? " -con_threshold $value" : "" ( "" , " -con_threshold " + str(value) )[ value is not None and value != vdef] 3 con_ignore Ignore singleton or class group (-con_ignore) Choice none class none singleton (defined $value and $value ne $vdef) ? " -con_ignore $value" : "" ( "" , " -con_ignore " + str(value) )[ value is not None and value != vdef] 3
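The con_threshold prompt asks for comma-separated values in the range 50..100, but the definition above passes the string through unchanged. A minimal sketch of how such a string could be checked before use follows; the helper name is hypothetical and is not part of the program definition.

```python
def parse_con_threshold(text):
    """Split a comma-separated threshold string and check the 50..100 range."""
    values = []
    for part in text.split(","):
        number = float(part.strip())
        if not 50 <= number <= 100:
            raise ValueError("threshold %s is outside 50..100" % part.strip())
        values.append(number)
    return values


# The default value from the definition above.
print(parse_con_threshold("100,90,80,70"))
```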
hybrid_alignment_consensus_options Hybrid alignment and consensus options 3 con_gaps Count gaps during consensus computations (-con_gaps) Boolean 1 ($value) ? "" : " -con_gaps off" ( " -con_gaps off" , "" )[ value ] 3 general_row_column_filters General row/column filters 3 top Report top N hits (-top) Integer (defined $value) ? " -top $value" : "" ( "" , " -top " + str(value) )[ value is not None ] range Display column range M:N (-range) String (defined $value and $value =~ s/,/:/g) ? " -range $value" : "" ( "" , " -range " + str(value).replace(',',':') )[ value is not None ] You must enter a string composed of two numbers separated by comma maxident Only report sequences with percent identity <= N (-maxident) Integer 100 (defined $value and $value != $vdef) ? " -maxident $value" : "" ( "" , " -maxident " + str(value) )[ value is not None and value != vdef] ref Use row N or row identifier as percent identity reference (-ref) Integer (defined $value) ? " -ref $value" : "" ( "" , " -ref " + str(value) )[ value is not None ] keep_only Keep only the rows from start to end (separated by comma) (-keep) String (defined $value ) ? " -keep $value" : "" ( "" , " -keep " + str(value) )[ value is not None ] You must enter a string composed of two numbers separated by comma disc Discard rows from start to end (separated by comma) (-disc) String (defined $value and $value =~ s/,/../ ) ? " -disc $value" : "" ( "" , " -disc " + str(value).replace(',','..') )[ value is not None ] You must enter a string composed of two numbers separated by comma nops Display rows unprocessed (separated by comma ) (-nops) String (defined $value and $value =~ s/,/../) ? " -nops $value" : "" ( "" , " -nops " + str(value).replace(',','..' ))[ value is not None ] You must enter a string composed of two numbers separated by comma general_formatting_options General formatting options 3 width Paginate in N columns of alignment (-width) Integer (defined $value) ? 
" -width $value" : "" ( "" , " -width " + str(value) )[ value is not None ] gap Use this character as the gap (-gap) String (defined $value) ? " -gap $value" : " " ( " " , " -gap " + str(value) )[ value is not None] label0 Switch off label: row number (-label0) Boolean 0 ($value) ? " -label0" : "" ( "" , " -label0" )[ value ] label1 Switch off label: identifier (-label1) Boolean 0 ($value) ? " -label1" : "" ( "" , " -label1" )[ value ] label2 Switch off label: description (-label2) Boolean 0 ($value) ? " -label2" : "" ( "" , " -label2" )[ value ] label3 Switch off label: scores (-label3) Boolean 0 ($value) ? " -label3" : "" ( "" , " -label3" )[ value ] label4 Switch off label: percent identity (-label4) Boolean 0 ($value) ? " -label4" : "" ( "" , " -label4" )[ value ] register Output multi-pass alignments with columns in register (-register) Boolean 1 ($value) ? "" : " -register off" ( " -register off" , "" )[ value ] blast_options BLAST options 3 hsp HSP tiling method (-hsp) Choice ranked all discrete ranked ($value ne $vdef) ? " -hsp $value":"" (""," -hsp " + str(value))[value !=vdef] strand Report only these query strand orientations (-strand) Choice all p m all ($value ne $vdef) ? " -strand $value":"" ("", " -strand " +str(value))[value !=vdef] blast1_options BLAST series 1 options maxpval Ignore hits with p-value more than N (-maxpval) Float (defined $value) ? " -maxpval $value":"" ("", " -maxpval "+str(value))[value is not None] minscore Ignore hits with score less than N (-minscore) Float (defined $value) ? " -minscore $value":"" ("", " -minscore "+str(value))[value is not None] blast2_options BLAST series 2 options maxeval Ignore hits with p-value more than N -- Blast2 only (-maxeval) Float (defined $value) ? " -maxeval $value":"" ("", " -maxeval "+str(value))[value is not None] minbits Ignore hits with bits less than N (-minbits) Integer (defined $value) ? 
" -minbits $value":"" ("", " -minbits "+str(value))[value is not None] psi_options PSI-BLAST options cycle Process the Nth cycle of a multipass search (-cycle) String (defined $value) ? " -cycle $value" : "" ("", " -cycle "+str(value))[value is not None] html_markup_options HTML markup options $outformat eq "html" outformat == "html" 3 html_output Amount of HTML markup (-html) Choice full full head body data css off (defined $value]) ? "-html $value":"" ("", " -html " + str(value))[value is not None] Title Page title string String (defined $value) ? " -title $value" : "" ( "" , " -title " + str(value) )[ value is not None ] pagecolor Page background color (-pagecolor) String white (defined $value and $value ne $vdef) ? " -pagecolor $value" : "" ( "" , " -pagecolor " + str(value) )[ value is not None and value != vdef] textcolor Page text color (-textcolor) String black (defined $value and $value ne $vdef) ? " -textcolor $value" : "" ( "" , " -textcolor " + str(value) )[ value is not None and value != vdef] linkcolor Link color (-linkcolor) String blue (defined $value and $value ne $vdef) ? " -linkcolor $value" : "" ( "" , " -linkcolor " + str(value) )[ value is not None and value != vdef] 3 alinkcolor Active link color (-alinkcolor) String red (defined $value and $value ne $vdef) ? " -alinkcolor $value" : "" ( "" , " -alinkcolor " + str(value) )[ value is not None and value != vdef] vlinkcolor Visited link color (-vlinkcolor) String purple (defined $value and $value ne $vdef) ? " -vlinkcolor $value" : "" ( "" , " -vlinkcolor " + str(value) )[ value is not None and value != vdef] alncolor Alignment background color (-alncolor) String white (defined $value and $value ne $vdef) ? " -alncolor $value" : "" ( "" , " -alncolor " + str(value) )[ value is not None and value != vdef] labcolor Alignment label color (-labcolor) String black (defined $value and $value ne $vdef) ? 
" -labcolor $value" : "" ( "" , " -labcolor " + str(value) )[ value is not None and value != vdef] symcolor Alignment default text color (-symcolor) String (defined $value) ? " -symcolor $value" : " " ( " " , " -symcolor " + str(value) )[ value is not None ] gapcolor Alignment gap color (-gapcolor) String (defined $value) ? " -gapcolor $value" : "" ( "" , " -gapcolor " + str(value) )[ value is not None ] bold Use bold emphasis for coloured residues (-bold) Boolean 0 ($value) ? " -bold" : "" ( "" , " -bold" )[ value ] css Use Cascading Style Sheets (-css) Choice off off on (defined $value and $value eq $vdef) ? " -css on" : "" ( "" , " -css on" )[ value is not None and value != vdef] alig_file Alignment output file Alignment FASTA MSF CODATA $outformat eq "msf" or $outformat eq "pearson" or $outformat eq "pir" outformat in [ "msf", "pearson" , "pir"] "mview_blast.out" "mview_blast.out" html_file Alignment output file Report HTML $outformat eq "html" outformat == "html" " && ln -s mview_blast.out mview_blast.html" " && ln -s mview_blast.out mview_blast.html" "mview_blast.html" "mview_blast.html" 10000 rdb_file Alignment output in RDB format Report RDB $outformat eq "rdb" outformat == "rdb" "mview_blast.out" "mview_blast.out"
Programs-5.1.1/phmmer.xml0000644000175000001560000014060511767572177014216 0ustar bneronsis phmmer PHMMER Search a protein sequence(s) against a protein database hmm:database:search database:search:hmm phmmer qsequence Query sequence(s) Protein Sequence FASTA " $value" " "+str(value) 10 db Choose a protein sequence database Choice null null " $value" " "+str(value) 11 output Directing output 1 The output format is designed to be human-readable, but is often so voluminous that reading it is impractical, and parsing it is a pain. The --tblout and --domtblout options save output in simple tabular formats that are concise and easier to parse. outfile Direct output to file (-o) Boolean 0 ($value != $vdef) ? " -o phmmer.output" : "" ("", " -o phmmer.output") [ value != vdef] Direct the main "human-readable" output to a file instead of the default stdout. aligfile Save multiple alignment of hits to file (-A) Boolean 0 ($value != $vdef) ? " -A phmmer.alig" : "" ("", " -A phmmer.alig") [ value != vdef] Save a multiple alignment of all significant hits (those satisfying inclusion thresholds) to a file (Stockholm format). seqtab Save parseable table of per-sequence hits to file (--tblout) Boolean 0 ($value != $vdef) ? " --tblout phmmer.tblout" : "" ("", " --tblout phmmer.tblout") [ value != vdef] Save a simple tabular (space-delimited) file summarizing the "per-target" output, with one data line per homologous target sequence found. domaintab Save parseable table of per-domain hits to file (--domtblout) Boolean 0 ($value != $vdef) ? " --domtblout phmmer.domtblout" : "" ("", " --domtblout phmmer.domtblout") [ value != vdef] Save a simple tabular (space-delimited) file summarizing the "per-domain" output, with one data line per homologous domain detected in a query sequence for each homologous model. acc Prefer accessions over names in output (--acc) Boolean 0 ($value != $vdef) ? 
" --acc" : "" ("", " --acc") [ value != vdef] Use accessions instead of names in the main output, where available for profiles and/or sequences. noali Don't output alignments, so output is smaller (--noali) Boolean 0 ($value != $vdef) ? " --noali" : "" ("", " --noali") [ value != vdef] Omit the alignment section from the main output. This can greatly reduce the output volume. notextw Unlimit ASCII text output line width (--notextw) Boolean 0 ($value != $vdef) ? " --notextw" : "" ("", " --notextw" ) [ value != vdef] Unlimit the length of each line in the main output. The default is a limit of 120 characters per line, which helps in displaying the output cleanly on terminals and in editors, but can truncate target profile description lines. textw Max width of ASCII text output lines (--textw) Integer 120 $notextw == 0 notextw == 0 ($value != $vdef) ? " --textw $value" : "" ("", " --textw " + str(value) ) [ value != vdef] Enter a value >=120. 120 <=$value 120 <=value scoringsys Controlling scoring system 2 The probability model in phmmer is constructed by inferring residue probabilities from a standard 20x20 substitution score matrix, plus two additional parameters for position-independent gap open and gap extend probabilities. popen Gap open probability (--popen) Float 0.02 ($value != $vdef) ? " --popen $value" : "" ("", " --popen " + str(value)) [ value != vdef] Enter a value >= 0 and <0.5 0 <= $value <0.5 0 <= value <0.5 The probability has to be >= 0 and <0.5. Default value: 0.02. pextend Gap extend probability (--pextend) Float 0.4 ($value != $vdef) ? " --pextend $value" : "" ("", " --pextend " + str(value)) [ value != vdef] Enter a value >= 0 and <1 0 <= $value <1 0 <= value <1 The probability has to be >= 0 and <1. Default value: 0.4. 
matrix Substitution score matrix (--mxfile) Choice BLOSUM62 BLOSUM30 BLOSUM35 BLOSUM40 BLOSUM45 BLOSUM50 BLOSUM55 BLOSUM60 BLOSUM62 BLOSUM65 BLOSUM70 BLOSUM75 BLOSUM80 BLOSUM85 BLOSUM90 PAM10 PAM20 PAM30 PAM40 PAM50 PAM60 PAM70 PAM80 PAM90 PAM100 PAM110 PAM120 PAM130 PAM140 PAM150 PAM160 PAM170 PAM180 PAM190 PAM200 PAM210 PAM220 PAM230 PAM240 PAM250 PAM260 PAM270 PAM280 PAM290 PAM300 PAM310 PAM320 PAM330 PAM340 PAM350 PAM360 PAM370 PAM380 PAM390 PAM400 ($value != $vdef) ? " --mxfile $value" : "" ("", " --mxfile " + str(value)) [ value != vdef] To obtain residue alignment probabilities from a substitution matrix. The default score matrix is BLOSUM62. report Controlling significance thresholds for reporting 3 "Reporting" thresholds control which hits are reported in output files (the main output, --tblout, and --domtblout). Sequence hits and domain hits are ranked by statistical significance (E-value) and output is generated in two sections called "per-target" and "per-domain" output. The following options allow you to change the default E-value reporting thresholds, or to use bit score thresholds instead. e_threshold Thresholds for Sequences: E-value (-E) Float 10.0 $s_threshold is None s_threshold is None ($value != $vdef) ? " -E $value" : "" ("", " -E " + str(value)) [ value != vdef] Enter a value > 0. 0 <$value 0 < value In the per-target output, report target sequences <= this E-value threshold. The default is 10.0, meaning that on average, about 10 false positives will be reported per query, so you can see the top of the "noise" and decide for yourself if it's really noise. s_threshold Score (-T) Float ($value) ? " -T $value" : "" ("", " -T " + str(value)) [ value is not None] Enter a value > 0. 0 <$value 0 < value Instead of thresholding per-profile output on E-value, report target sequences with a bit score of >= this score threshold. 
d_e_threshold Thresholds for Domains: E-value (--domE) Float 10.0 $d_s_threshold is None d_s_threshold is None ($value != $vdef) ? " --domE $value" : "" ("", " --domE " + str(value)) [ value is not None and value != vdef] Enter a value > 0. 0 <$value 0 < value In the per-domain output, for target sequences that have already satisfied the perprofile reporting threshold, report individual domains with a conditional E-value < or = this threshold. The default is 10.0. A "conditional" E-value means the expected number of additional false positive domains in the smaller search space of those comparisons that already satisfied the per-target reporting threshold (and thus must have at least one homologous domain already). d_s_threshold Score (--domT) Float ($value) ? " --domT $value" : "" ("", " --domT " + str(value)) [ value is not None] Enter a value > 0. 0 <$value 0 < value Instead of thresholding per-domain output on E-value, report domains with a bit score of >= this score threshold in output. e_threshold s_threshold d_e_threshold d_s_threshold inclusion_A Controlling significance thresholds for inclusion in Output alignment 4 Inclusion thresholds are stricter than reporting thresholds. They control which hits are included in any output multiple alignment (the -A option) and which domains are marked as significant ("!") as opposed to questionable ("?") in domain output. Available if the option -A is selected. $aligfile==1 aligfile==1 a_e_threshold Thresholds for Sequences: E-value (--incE) Float 0.01 $a_s_threshold is None a_s_threshold is None ($value != $vdef) ? " --incE $value" : "" ("", " --incE " + str(value)) [ value is not None and value != vdef] Include sequences < or = this E-value threshold in output alignment. The default is 0.01, meaning that on average, about 1 false positive would be expected in every 100 searches with different query sequences. a_s_threshold Score (--incT) Float ($value) ? 
" --incT $value" : "" ("", " --incT " + str(value)) [ value is not None] Instead of using E-values for setting the inclusion threshold in output alignment, use a bit score of >= this number as the per-target inclusion threshold. By default this option is unset. a_d_e_threshold Thresholds for Domains: E-value (--incdomE) Float 0.01 $a_d_s_threshold is None a_d_s_threshold is None ($value != $vdef) ? " --incdomE $value" : "" ("", " --incdomE " + str(value)) [ value is not None and value != vdef ] Use a conditional E-value of <= this number as the per-domain inclusion threshold, in targets that have already satisfied the overall per-target inclusion threshold. The default is 0.01. a_d_s_threshold Score (--incdomT) Float ($value) ? " --incdomT $value" : "" ("", " --incdomT " + str(value)) [ value is not None] Instead of using E-values, use a bit score of >= this number as the per-domain inclusion threshold. By default this option is unset. a_e_threshold a_s_threshold a_d_e_threshold a_d_s_threshold heuristic Controlling the acceleration pipeline 5 HMMER3 searches are accelerated in a three-step filter pipeline: - the MSV filter (the fastest and most approximate), - the Viterbi filter, - and the Forward filter (full Forward scoring algorithm, slowest but most accurate), + There is also a "bias filter" step between MSV and Viterbi. Targets that pass all the steps in the acceleration pipeline are then subjected to "postprocessing" (domain identification and scoring using the Forward/Backward algorithm). Essentially the only free parameters that control HMMER's heuristic filters are the P-value thresholds controlling the expected fraction of non-homologous sequences that pass the filters. 
- Setting the default thresholds higher will pass a higher proportion of non-homologous sequences, increasing sensitivity at the expense of speed, - Setting lower P-value thresholds will pass a smaller proportion, decreasing sensitivity and increasing speed, - Setting a filter's P-value threshold to 1.0 means it will pass all sequences, which effectively disables the filter. Changing filter thresholds only removes or includes targets from consideration; it does not alter bit scores, E-values, or alignments, all of which are determined solely in "postprocessing". max Turn all heuristic filters off (less speed, more power) (--max) Boolean 0 ($value != $vdef) ? " --max" : "" ("", " --max") [ value != vdef ] Maximum sensitivity. Turn off all filters, including the bias filter, and run full Forward/Backward postprocessing on every target. This increases sensitivity slightly, at a large cost in speed. F1 Stage 1 (MSV) threshold (--F1) Float 0.02 $max==0 max==0 ($value != $vdef) ? " --F1 $value" : "" ("", " --F1 " + str(value) ) [ value != vdef ] First filter threshold; set the P-value threshold for the MSV filter step. The default is 0.02, meaning that roughly 2% of the highest scoring non-homologous targets are expected to pass the filter. F2 Stage 2 (Vit) threshold (--F2) Float 0.001 $max==0 max==0 ($value != $vdef) ? " --F2 $value" : "" ("", " --F2 " + str(value) ) [ value != vdef ] Second filter threshold; set the P-value threshold for the Viterbi filter step. The default is 0.001. F3 Stage 3 (Fwd) threshold (--F3) Float 0.00001 $max==0 max==0 ($value != $vdef) ? " --F3 $value" : "" ("", " --F3 " + str(value) ) [ value != vdef ] Third filter threshold; set the P-value threshold for the Forward filter step. The default is 1e-5. nobias Turn off composition bias filter (--nobias) Boolean 0 $max==0 max==0 ($value != $vdef) ? 
" --nobias" : "" ("", " --nobias" ) [ value != vdef ] Turn off the bias filter increases sensitivity somewhat, but can come at a high cost in speed, especially if the query has biased residue composition (such as a repetitive sequence region, or if it is a membrane protein with large regions of hydrophobicity). Without the bias filter, too many sequences may pass the filter with biased queries, leading to slower than expected performance as the computationally intensive Forward/Backward algorithms shoulder an abnormally heavy load. MSV Controlling E-value calibration for Stage 1 - MSV Gumbel mu fit 6 Estimating the location parameters for the expected score distributions for MSV filter scores, Viterbi filter scores, and Forward scores requires three short random sequence simulations. eml Length of sequences (--EmL) Integer 200 ($value != $vdef) ? " --EmL $value" : "" ("", " --EmL " + str(value) ) [ value != vdef ] Enter a value > 0. 0 <$value 0 < value Sets the sequence length in simulation that estimates the location parameter mu for MSV filter E-values. Default is 200. emn Number of sequences (--EmN) Integer 200 ($value != $vdef) ? " --EmN $value" : "" ("", " --EmN " + str(value) ) [ value != vdef ] Enter a value > 0. 0 <$value 0 < value Sets the number of sequences in simulation that estimates the location parameter mu for MSV filter E-values. Default is 200. eml emn Ecalibration2 Controlling E-value calibration for Stage 2 - Viterbi Gumbel mu fit 7 Estimating the location parameters for the expected score distributions for MSV filter scores, Viterbi filter scores, and Forward scores requires three short random sequence simulations. evl Length of sequences (--EvL) Integer 200 ($value != $vdef) ? " --EvL $value" : "" ("", " --EvL " + str(value) ) [ value != vdef ] Enter a value > 0. 0 <$value 0 < value Sets the sequence length in simulation that estimates the location parameter mu for Viterbi filter E-values. Default is 200. 
evn Number of sequences (--EvN) Integer 200 ($value != $vdef) ? " --EvN $value" : "" ("", " --EvN " + str(value) ) [ value != vdef ] Enter a value > 0. 0 <$value 0 < value Sets the number of sequences in simulation that estimates the location parameter mu for Viterbi filter E-values. Default is 200. evl evn Ecalibration3 Controlling E-value calibration for Stage 3 - Forward exponential tail tau fit 8 Estimating the location parameters for the expected score distributions for MSV filter scores, Viterbi filter scores, and Forward scores requires three short random sequence simulations. efl Length of sequences (--EfL) Integer 100 ($value != $vdef) ? " --EfL $value" : "" ("", " --EfL " + str(value) ) [ value != vdef ] Enter a value > 0. 0 <$value 0 < value Sets the sequence length in simulation that estimates the location parameter tau for Forward E-values. Default is 100. efn Number of sequences (--EfN) Integer 200 ($value != $vdef) ? " --EfN $value" : "" ("", " --EfN " + str(value) ) [ value != vdef ] Enter a value > 0. 0 <$value 0 < value Sets the number of sequences in simulation that estimates the location parameter tau for Forward E-values. Default is 200. eft Tail mass (--Eft) Float 0.04 ($value != $vdef) ? " --Eft $value" : "" ("", " --Eft " + str(value) ) [ value != vdef ] Enter a value > 0 and <1. 0 <$value 0 < value Sets the tail mass fraction to fit in the simulation that estimates the location parameter tau for Forward evalues. Default is 0.04. efl efn eft other Expert options 9 nonull Turn off biased composition score corrections (--nonull2) Boolean 0 $max==0 max==0 ($value != $vdef) ? " --nonull2" : "" ("", " --nonull2" ) [ value != vdef] Turn off the "null2" score corrections for biased composition. z Number of comparisons done, for E-value calculation (-Z) Integer ($value) ? " -Z $value" : "" ("", " -Z " + str(value)) [ value is not None] Enter a value > 0. 
0 <$value 0 < value Assert that the total number of targets in your searches is this number, for the purposes of per-sequence E-value calculations, rather than the actual number of targets seen. d_z Number of significant sequences, for domain E-value calculation (--domZ) Integer ($value) ? " --domZ $value" : "" ("", " --domZ " + str(value)) [ value is not None] Enter a value > 0. 0 <$value 0 < value Assert that the total number of targets in your searches is this number, for the purposes of per-domain conditional E-value calculations, rather than the number of targets that passed the reporting thresholds. seed Set Random Number Generator seed to (--seed) Integer 42 ($value != $vdef) ? " --seed $value" : "" ("", " --seed " + str(value) ) [ value != vdef] Seed the random number generator with this, an integer >= 0. The default seed is 42. If >0, any stochastic simulations will be reproducible; the same command will give the same results. If = 0, the random number generator is seeded arbitrarily, and stochastic simulations will vary from run to run of the same command. out_file Output file Text $outfile==1 outfile==1 *.output "*.output" ali_file Alignment file Protein Alignment STOCKHOLM $aligfile==1 aligfile==1 *.alig "*.alig" seq_file Parseable table of per-sequence hits Text $seqtab==1 seqtab==1 *.tblout "*.tblout" dom_file Parseable table of per-domain hits Text $domaintab==1 domaintab==1 *.domtblout "*.domtblout" Programs-5.1.1/dca.xml0000644000175000001560000002621711767572177013457 0ustar bneronsis dca 1.1 DCA Divide-and-Conquer Multiple Sequence Alignment J. Stoye A.W.M. Dress, G. Fullen, S.W. Perrey, A Divide and Conquer Approach to Multiple Alignment, Proceedings of the Third International Conference on Intelligent Systems for Molecular Biology (ISMB 95), AAAI Press, Menlo Park, CA, USA, 107-113, 1995. J. Stoye, Multiple Sequence Alignment with the Divide-and-Conquer Method, Gene 211(2), GC45-GC56, 1998. 
(Gene-COMBIS) Divide-and-Conquer Multiple Sequence Alignment (DCA) is a program for producing fast, high quality simultaneous multiple sequence alignments of amino acid, RNA, or DNA sequences. The program is based on the DCA algorithm, a heuristic approach to sum-of-pairs (SP) optimal alignment that has been developed at the FSPM over the years 1995-97. http://bibiserv.techfak.uni-bielefeld.de/dca/ http://bibiserv.techfak.uni-bielefeld.de/download/tools/dca.html alignment:multiple dca seq Sequences File Sequence FASTA " $value" " "+str(value) 100 control Control parameters cost Cost matrix (-c) Choice null null blosum30 blosum45 blosum62 pam160 pam250 unitcost dna rna dnarna (defined $value and $value ne $vdef)? " -c $value" : "" ( "" , " -c " + str(value) )[ value is not None and value != vdef] 1 gaps Penalize end gaps as internal gaps (-g) Boolean 0 ($value)? " -g ":"" ("" , " -g ")[ value ] 1 Default: free shift approximate Use approximate cut positions (-a) Boolean 0 ($value)? " -a":"" ("" , " -a")[ value ] 1 On: FastDCA (use approximate cut positions); Off: slower, more accurate algorithm (search for exact cut positions) intensity Weight intensity (-b) Float 0.0 (defined $value and $value != $vdef)? " -b $value" : "" ( "" , " -b " + str(value) )[ value is not None and value != vdef] Weight intensity must be >= 0.0 and <= 1.0 $intensity >= 0.0 and $intensity <= 1.0 intensity >= 0.0 and intensity <= 1.0 1 recursion Recursion stop size (-l) Integer 30 (defined $value and $value != $vdef)? " -l $value" : "" ( "" , " -l " + str(value) )[ value is not None and value != vdef] 1 5 ... 100 recommended; small: faster algorithm, maybe worse. window Window size (-w) Integer 0 (defined $value and $value != $vdef)? " -w $value" : "" ( "" , " -w " + str(value) )[ value is not None and value != vdef] 1 To correct the alignment in the proximity of division sites, the sequences can be re-aligned inside a window of size w >= 0 placed across each slicing site. 
output Output parameters quiet String " -q" " -q" 1 output_format Output format (-f) Choice 2 1 2 3 4 (defined $value and $value ne $vdef)? " -f $value" : "" ( "" , " -f " + str(value) )[ value is not None and value != vdef] 1 suppress_output Suppress output about progress of the program (-o) String " -o" " -o" 1 fasta_outfile Alignment file Alignment FASTA NEXUS $output_format eq "2" or $output_format eq "3" output_format == "2" or output_format == "3" "dca.out" "dca.out" aln_outfile DCA alignment file Dcalignment AbstractText CLUSTAL DCA $output_format eq "1" or $output_format eq "4" output_format == "1" or output_format == "4" "dca.out" "dca.out" Programs-5.1.1/scangen.xml0000644000175000001560000004015411767572177014342 0ustar bneronsis scangen 1.0 scangen Genomewide identification of cisregulatory motifs and modules H Rouault, K Mazouni, L Couturier, V Hakim and V Schweisguth Genomewide identification of cis regulatory motifs and modules underlying gene coregulation using statistics and phylogeny, National Academy of Sciences of the United States of America. August 17, 2010 vol. 107 no. 33 14615-14620 sequence:nucleic:regulation scangen model Execution mode Choice null null --motgen (defined $value and $value ne $vdef) ? " $value " : "" ( "" , " " + str(value) )[ value is not None and value !=vdef] general General options 1 width Width of the motif Integer 10 (defined $value) ? " -w $value" : "" ( "" , " -w " + str( value ) )[value is not None] threeshold Threshold used for motif scanning Float 13.0 (defined $value) ? " -t $value" : "" ( "" , " -t " + str( value ) )[value is not None] extent Extent of the motif search within an alignment Integer 20 (defined $value) ? " -x $value" : "" ( "" , " -x " + str( value ) )[value is not None] motgen Modgen options $model == '--motgen' model == '--motgen' 2 evolutionary Evolutionary model used for motif generation Choice 1 1 2 (defined $value and $value ne $vdef) ? 
" -e $value " : "" ( "" , " -e " + str(value) )[ value is not None and value !=vdef] coord_file File of enhancer coordinates Coordinates AbstractText $align_file is not defined align_file is None (defined $value) ? " --coord-file $value" : "" ( "" , " --coord-file " + str( value ) )[value is not None] list of sequence coordinates in the format: sequence_name chromosome_arm start_pos stop_pos seq_name1 X 10000 11000 seq_name2 2L 20000 21000 seq_name3 3L 30000 31000 filter Data filter Filename " && gawk '$3<1000 {print $2,$0}' motmeldb.txt | sort -g | sed 's/^.* //' > bestmotspval.dat" && distinfo -t 9.0 -w 10 bestmotspval.dat > finalMotifs.dat" " && gawk '$3<1000 {print $2,$0}' motmeldb.txt | sort -g | sed -e 's,^.* ,,' > bestmotspval.dat && distinfo -t 9.0 -w 10 bestmotspval.dat > finalMotifs.dat" filter2graph Filer to Graphic Filename " && gawk '{print $8,$9,$10,$11}' finalMotifs.dat > matrices.txt " && gawk '{print $8,$9,$10,$11}' finalMotifs.dat > matrices.txt" graph_format Graph output format Choice png png pdf graph_word Graph output width motif Integer 10 graphic Graphic plot Filename " && motpic-warg -w %s -f %s matrices.txt" % (graph_word,graph_format) " && motpic-warg -w %s -f %s matrices.txt" % (graph_word,graph_format) distinfo Distinfo output file ScangenMotifDefinition AbstractText $model == '--motgen' model == '--motgen' "finalMotifs.dat" "finalMotifs.dat" GraphOutput0 Graph output Picture Binary $model == '--motgen' model == '--motgen' "mat-auto-0*" "mat-auto-0*" GraphOutput1 Graph output Picture Binary $model == '--motgen' model == '--motgen' "mat-auto-1*" "mat-auto-1*" scangen Scangen options $model == ' ' model == ' ' 2 scanwidth Width of selected enhancers Integer 1000 (defined $value) ? " -s $value" : "" ( "" , " -s " + str( value ) )[value is not None and value !=vdef] scanstep Step of scanned genome Integer 50 (defined $value) ? 
" --scanstep=$value" : "" ( "" , " --scanstep=" + str( value ) )[value is not None and value !=vdef] nbmots Number of motifs to consider Integer 20 (defined $value) ? " -n $value" : "" ( "" , " -n " + str( value ) )[value is not None and value !=vdef] Value greater than or equal to 15 is required value >= 15 phenotype File containing a list of genes annotated with a relevant phenotype Phenotype AbstractText " -p pheno.txt" " -p pheno.txt" motifs File containing a list of motif definitons ScangenMotifDefinition AbstractText (defined $value) ? " -m $value" : "" ( "" , " -m " + str( value ) )[value is not None] scangenOutfile Scangen output file ScnagenReport AbstractText "result*" "result*" "seqs*" "hist*" Programs-5.1.1/cutseq.xml0000644000175000001560000001523212072525233014204 0ustar bneronsis cutseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net cutseq Removes a section from a sequence http://bioweb2.pasteur.fr/docs/EMBOSS/cutseq.html http://emboss.sourceforge.net/docs/themes sequence:edit cutseq e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_from Start of region to delete Integer ("", " -from=" + str(value))[value is not None] 2 This is the start position (inclusive) of the section of the sequence that you wish to remove. e_to End of region to delete Integer ("", " -to=" + str(value))[value is not None] 3 This is the end position (inclusive) of the section of the sequence that you wish to remove. 
e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename cutseq.e_outseq ("" , " -outseq=" + str(value))[value is not None] 4 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 5 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 6 Programs-5.1.1/consensus.xml0000644000175000001560000005541211767572177014747 0ustar bneronsis consensus 6d CONSENSUS Identification of consensus patterns in unaligned DNA and protein sequences Gerald Z.Hertz, G.D. Stormo G.Z. Hertz and G.D. Stormo. Identification of consensus patterns in unaligned DNA and protein sequences: a large-deviation statistical basis for penalizing gaps. In: Proceedings of the Third International Conference on Bioinformatics and Genome Research (H.A. Lim, and C.R. Cantor, editors). World Scientific Publishing Co., Ltd. Singapore, 1995. pages 201--216. http://gzhertz.home.comcast.net/~gzhertz/ http://gzhertz.home.comcast.net/~gzhertz/CONSENSUS_2004-04-14.TAR.gz sequence:protein:pattern sequence:nucleic:pattern prog Program to run Choice consensus consensus wconsensus "fasta-consensus <$sequence >$sequence.wcons ; $prog " "fasta-consensus <" + str(sequence) + " >" + str(sequence) + ".wcons ; " + str(prog) 0 sequence Sequences file (-f) Sequence FASTA " -f $sequence.wcons" " -f " + str(sequence) + ".wcons" 1 required Required parameter 2 width Width of pattern for consensus program (-L) Integer $prog eq "consensus" prog == "consensus" (defined $value) ? " -L$value" : "" ( "" , " -L" + str(value) )[ value is not None ] standard_deviation Number of standard deviations to lower the information content at each position before identifying information peaks for wconsensus program (-s) Float $prog eq "wconsensus" prog == "wconsensus" 1 " -s$value" " -s" + str(value) A range of values should be tried. 
For example, try values of 0.5, 1, 1.5, and 2. The overall best alignment is the one having the smallest e-value. basic_options Basic options 2 alphabet_options Alphabet options alphabet Choose an alphabet Choice null null dna-alphabet prot-alphabet users ($value eq "dna-alphabet" or $value eq "prot-alphabet" )? "" : " -a $value " ("", " -a "+ value)[ value in( "dna-alphabet" , "prot-alphabet" )] For 'User file' choice: A user alphabet file is mandatory ($value eq "users" and not defined $ascii_alphabet) or ($value eq "prot-alphabet" or $value eq "dna-alphabet" or $value eq "null") (value == "users" and ascii_alphabet is not None) or (value in ("prot-alphabet","dna-alphabet", "null") ) ascii_alphabet User Alphabet file (-a) ConsensusAlphabet AbstractText $alphabet eq "users" alphabet == "users" (defined $value) ? " -a $value" : "" ( "" , " -a " + str(value) )[ value is not None ] Each line contains a letter (a symbol in the alphabet) followed by an optional normalization number (default: 1.0). The normalization is based on the relative prior probabilities of the letters. For nucleic acids, this might be the genomic frequency of the bases; however, if the -d option is not used, the frequencies observed in your own sequence data are used. In nucleic acid alphabets, a letter and its complement appear on the same line, separated by a colon (a letter can be its own complement, e.g. when using a dimer alphabet). Complementary letters may use the same normalization number. Only the standard 26 letters are permissible; however, when the -CS option is used, the alphabet is case sensitive so that a total of 52 different characters are possible. 
POSSIBLE LINE FORMATS WITHOUT COMPLEMENTARY LETTERS: letter letter normalization POSSIBLE LINE FORMATS WITH COMPLEMENTARY LETTERS: letter:complement letter:complement normalization letter:complement normalization:complement's_normalization prior Use the designated prior probabilities of the letters to override the observed frequencies (-d) Boolean 0 ($value) ? " -d" : "" ( "" , " -d" )[ value ] By default, the program uses the frequencies observed in your own sequence data for the prior probabilities of the letters. However, if the -d option is set, the prior probabilities designated by the alphabet options. If the -d option is not set, they are still used to determine the sequence alphabet, but any prior probability information is ignored. complement Complement of nucleic acid sequences (-c) Choice 0 0 1 2 3 (defined $value and $value ne $vdef) ? " -c$value" : "" ( "" , " -c" + str(value) )[ value is not None and value != vdef] Symmetrical pattern (3) is for the consensus program only ($value eq "3" and $prog eq "consensus") or ($value eq "0" or $value eq "1" or $value eq "2" ) (value == "3" and prog == "consensus") or (value == "0" or value == "1" or value == "2" ) max_cycle How many words per matrix for each sequence to contribute Choice null null -n -N ($value ne $vdef) ? " $value$max_cycle_nb" : "" ( "" , " " + str(value) + str(max_cycle_nb) )[ value != vdef ] For the -n/-N option, you must define a Maximum repeat of the matrix building cycle. ($value eq "-n" or $value eq "-N") and defined($max_cycle_nb) (value == "-n" or value =="-N") and max_cycle_nb is not None -n integer: repeat the matrix building cycle a maximum of "integer" times and allow each sequence to contribute zero or more words per matrix. 
-N integer: repeat the matrix building cycle a maximum of "integer" times and allow each sequence to contribute one or more words per matrix max_cycle_nb Maximum repeat of the matrix building cycle for -n or -N Integer "" "" advenced_options Advanced options 2 queue Maximum number of matrices to save between cycles of the program (-q) Integer 200 (defined $value ) ? " -q $value" : "" ( "" , " -q " + str(value) )[ value is not None ] distance Minimum distance between the starting points of words within the same matrix pattern (-m) Integer $max_cycle ne "null" max_cycle != "null" (defined $value and $value != $vdef) ? " -m$value " : "" ( "" , " -m" + str(value) )[ value is not None and value != vdef] The value must be positive and this option can only be used when the "-n" or "-N" option is also used. $value <= 0 and ($max_cycle == "-n" or $max_cycle == "-N") value > 0 and (max_cycle == "-n" or max_cycle == "-N") The minimum distance between the starting points of words within the same matrix pattern; must be a positive integer; can only be used when the "-n" or "-N" option is also used. For wconsensus, the default value is 1. For consensus, this number is indicated by the width (-L). terminate Terminate the program this number of cycles after the current most significant alignment is identified (-t) Integer (defined $value) ? " -t$value " : "" ( "" , " -t" + str(value) )[ value is not None ] default: terminate only when the maximum number of matrix building cycles is completed. progeny Save the top progeny matrices Choice -pr2 -pr1 -pr2 (defined $value and $value ne $vdef) ? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef] -pr2 option prevents a strong pattern found in only a subset of the sequences from overwhelming the algorithm and eliminating other potential patterns. This undesirable situation can occur when a subset of the sequences share an evolutionary relationship not common to the majority of the sequences. 
linearly Seed with the first sequence and proceed linearly through the list (-l) Boolean 0 ($value) ? " -l" : "" ( "" , " -l" )[ value ] The -l and -n option are mutually exclusive ($value == 1 and $max_cycle != "-n") or $value == 0 (value == 1 and max_cycle != "-n") or value == 0 This option results in a significant speed up in the program, but the algorithm becomes dependent on the order of the sequence-file names. terminal_gap Permit terminal gaps for wconsensus program Choice $prog eq "wconsensus" prog == "wconsensus" -pg0 -pg0 -pg1 (defined $value and $value ne $vdef) ? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef] output_options Output options 2 top_matrices Number of top matrices to print (-pt) Integer 4 (defined $value and $value != $vdef) ? " -pt$value" : "" ( "" , " -pt" + str(value) )[ value is not None and value != vdef] 2 A negative value means print all the top matrices. final_matrices Number of final matrices to print (-pf) Integer 4 (defined $value and $value != $vdef) ? " -pf$value" : "" ( "" , " -pf" + str(value) )[ value is not None and value != vdef] Default when NOT using -n or -N option: print 4 matrices; default when using -n or -N option: print no matrices. outfile Results file Text " > $prog.results" " > " + str(prog) + ".results" 50 "$prog.results" str(prog) + ".results" consensus_format wcons file Text "*.wcons" "*.wcons" Programs-5.1.1/helixturnhelix.xml0000644000175000001560000002451412072525233015757 0ustar bneronsis helixturnhelix EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net helixturnhelix Identify nucleic acid-binding motifs in protein sequences http://bioweb2.pasteur.fr/docs/EMBOSS/helixturnhelix.html http://emboss.sourceforge.net/docs/themes sequence:protein:2D_structure sequence:protein:motifs structure:2D_structure helixturnhelix e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_additional Additional section e_mean Mean value (value from 1. to 10000.) Float 238.71 ("", " -mean=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1. is required value >= 1. Value less than or equal to 10000. is required value <= 10000. 2 e_sd Standard deviation value (value from 1. to 10000.) Float 293.61 ("", " -sd=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1. is required value >= 1. Value less than or equal to 10000. is required value <= 10000. 3 e_minsd Minimum sd (value from 0. to 100.) Float 2.5 ("", " -minsd=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0. is required value >= 0. Value less than or equal to 100. is required value <= 100. 
4 e_eightyseven Use the old (1987) weight data Boolean 0 ("", " -eightyseven")[ bool(value) ] 5 e_output Output section e_outfile Name of the report file Filename report.hth ("" , " -outfile=" + str(value))[value is not None] 6 e_rformat_outfile Choose the report output format Choice MOTIF DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 7 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 8 Programs-5.1.1/isochore.xml0000644000175000001560000002277412072525233014524 0ustar bneronsis isochore EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net isochore Plots isochores in DNA sequences http://bioweb2.pasteur.fr/docs/EMBOSS/isochore.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:composition isochore e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_additional Additional section e_window Window size (value greater than or equal to 1) Integer 1000 ("", " -window=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 2 e_shift Shift increment (value greater than or equal to 1) Integer 100 ("", " -shift=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 3 e_output Output section e_outfile Name of the output file (e_outfile) Filename outfile.iso ("" , " -outfile=" + str(value))[value is not None] 4 e_outfile_out outfile_out option IsochoreReport Report e_outfile e_graph Choose the e_graph output format Choice png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 5 xy_goutfile Name of the output graph Filename isochore_xygraph ("" , " -goutfile=" + str(value))[value is not None] 6 xy_outgraph_png Graph file Picture Binary e_graph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_graph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_graph == "ps" or e_graph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_graph == "meta" "*.meta" xy_outgraph_data Graph file Text e_graph == "data" "*.dat" auto Turn off any prompting String " -auto -stdout" 7 Programs-5.1.1/mview_alignment.xml0000644000175000001560000010165612073003734016071 0ustar bneronsis mview_alignment 1.49 MVIEW Multiple alignment pretty viewer N. P. Brown Brown, N.P., Leroy C., Sander C. (1998). 
MView: A Web compatible database search or multiple alignment viewer. Bioinformatics. 14(4):380-381. http://bioweb2.pasteur.fr/docs/mview/index.html http://bio-mview.sourceforge.net/ alignment:multiple:display display:alignment:multiple mview alig Alignment File Alignment CLUSTAL " -in clustal $value" " -in clustal "+str(value) 1000 outformat Output format (-out) Choice html html msf pearson pir rdb (defined $value and $value ne $vdef) ? "-out $value" : "" ( "", " -out " + str(value) )[ value is not None and value != vdef] 1 main_formatting_options Main formatting options 2 ruler Attach a ruler (-ruler) Boolean 0 ($value) ? " -ruler on" : "" ( "" , " -ruler on" )[ value ] alignment Show alignment (-alignment) Boolean 1 ($value) ? "" : " -alignment off" ( " -alignment off" , "" )[ value ] consensus Show consensus (-consensus) Boolean 0 ($value) ? " -consensus on" : "" ( "" , " -consensus on" )[ value ] dna Use DNA/RNA colormaps and/or consensus groups (-dna) Boolean 0 ($value) ? " -dna" : "" ( "" , " -dna" )[ value ] alignment_options Alignment options 3 coloring Colour scheme (-coloring) Choice none none any identity consensus group ($value and $value ne $vdef) ? " -coloring $value" : "" ( "" , " -coloring " + str(value) )[ value !=vdef ] 3
  • Colour all the residues: colours every residue according to the currently selected palette.
  • Colouring by identity to the first sequence: colours only those residues that are identical to a reference sequence (usually the query or first row).
  • Colour only when above a given percent similarity: colours only those residues that belong to a specified physicochemical class that is conserved in at least a specified percentage of all rows for a given column. The threshold defaults to 70% and may be set to another value, e.g. -coloring consensus -threshold 80 would specify 80%. Note that the physicochemical classes in question can be confined to individual residues.
  • Colours residues by the colour of the class to which they belong: works like -coloring consensus, but colours residues by the colour of the class to which they belong.

By default, the consensus computation counts gap characters, so that sections of the alignment may be uncolored where the presence of gaps prevents the non-gap count from reaching the threshold. Setting -con_gaps off prevents this, allowing sequence-only based consensus thresholding.
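
The effect of gap counting on the threshold can be illustrated with a minimal sketch. This is not MView's implementation: the function name `passes_threshold`, the gap characters `-` and `.`, and the simplification to a single residue (rather than a physicochemical class) are all assumptions made for illustration.

```python
# Hypothetical helper, not part of MView: decides whether one residue
# reaches the consensus threshold in a single alignment column.
GAP_CHARS = "-."  # assumed gap symbols


def passes_threshold(column, residue, threshold=70.0, count_gaps=True):
    """Return True if `residue` makes up at least `threshold` percent of `column`.

    With count_gaps=True every row counts towards the total (the default
    behaviour described above); with count_gaps=False only non-gap rows
    count, which mimics the effect of "-con_gaps off".
    """
    if count_gaps:
        total = len(column)
    else:
        total = sum(1 for c in column if c not in GAP_CHARS)
    if total == 0:
        return False
    matches = sum(1 for c in column if c == residue)
    return 100.0 * matches / total >= threshold


column = "AAA--"  # three residues and two gaps
print(passes_threshold(column, "A", 70.0, count_gaps=True))   # 3 of 5 rows = 60%
print(passes_threshold(column, "A", 70.0, count_gaps=False))  # 3 of 3 rows = 100%
```

In this sketch the same column stays uncoloured when gaps are counted and becomes coloured when they are ignored, which is the behaviour the paragraph above describes.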

threshold Threshold percentage for consensus coloring (-threshold) Float 70.0 (defined $value and $value != $vdef) ? " -threshold $value" : "" ( "" , " -threshold " + str(value) )[ value is not None and value != vdef] 3 ignore Ignore singleton or class group (-ignore) Choice none class none singleton (defined $value and $value ne $vdef) ? " -ignore $value" : "" ( "" , " -ignore " + str(value) )[ value is not None and value != vdef] 3 Tip: If you want to see only the conserved residues above the threshold (i.e., only one type of conserved residue per column), add the option -ignore class.
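
The Python expressions above select one of two strings by indexing a tuple with a boolean (False picks element 0, True picks element 1). A minimal sketch of how the -threshold and -ignore fragments would be assembled follows. The wrapper functions `threshold_arg` and `ignore_arg` and the final command string are hypothetical; only the tuple expressions and the defaults (70.0 and none) are taken from the descriptions above.

```python
# Hypothetical wrappers around the expressions used in this description.
def threshold_arg(value, vdef=70.0):
    # Emit the option only when a value is set and differs from the default.
    return ("", " -threshold " + str(value))[value is not None and value != vdef]


def ignore_arg(value, vdef="none"):
    return ("", " -ignore " + str(value))[value is not None and value != vdef]


# Assumed assembly of a command line; the real framework concatenates
# the fragments of all parameters in its own order.
command = "mview -coloring consensus" + threshold_arg(80.0) + ignore_arg("class")
print(command)  # mview -coloring consensus -threshold 80.0 -ignore class
```

Values equal to the default, or left unset, contribute an empty string, so the generated command line only carries options the user actually changed.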
consensus_options Consensus options $consensus consensus 3 con_coloring Basic style of coloring (-con_coloring) Choice none any identity none (defined $value) ? " -con_coloring $value" : "" ( "" , " -con_coloring " + str(value) )[ value is not None ] 3

Colouring of an alignment by consensus determines which residues to colour and the colours to use based on

  1. the consensus threshold chosen for the colouring operation,
  2. a consideration of the common physicochemical properties of the residues in that column,
  3. the chosen colour scheme.
con_threshold Consensus line thresholds (in range 50..100) (separated by commas) (-con_threshold) String 100,90,80,70 (defined $value and $value ne $vdef) ? " -con_threshold $value" : "" ( "" , " -con_threshold " + str(value) )[ value is not None and value != vdef] 3 con_ignore Ignore singleton or class group (-con_ignore) Choice none class none singleton (defined $value and $value ne $vdef) ? " -con_ignore $value" : "" ( "" , " -con_ignore " + str(value) )[ value is not None and value != vdef] 3
hybrid_alignment_consensus_options Hybrid alignment and consensus options 3 con_gaps Count gaps during consensus computations (-con_gaps) Boolean 1 ($value) ? "" : " -con_gaps off" ( " -con_gaps off" , "" )[ value ] 3 general_row_column_filters General row/column filters 3 top Report top N hits (-top) Integer (defined $value) ? " -top $value" : "" ( "" , " -top " + str(value) )[ value is not None ] range Display column range M:N (-range) String (defined $value and $value =~ s/,/:/g) ? " -range $value" : "" ( "" , " -range " + str(value).replace(',',':') )[ value is not None ] You must enter a string composed of two numbers separated by comma maxident Only report sequences with percent identity <= N (-maxident) Integer 100 (defined $value and $value != $vdef) ? " -maxident $value" : "" ( "" , " -maxident " + str(value) )[ value is not None and value != vdef] ref Use row N or row identifier as percent identity reference (-ref) Integer (defined $value) ? " -ref $value" : "" ( "" , " -ref " + str(value) )[ value is not None ] keep_only Keep only the rows from start to end (separated by comma) (-keep) String (defined $value ) ? " -keep $value" : "" ( "" , " -keep " + str(value) )[ value is not None ] You must enter a string composed of two numbers separated by comma disc Discard rows from start to end (separated by comma) (-disc) String (defined $value and $value =~ s/,/../ ) ? " -disc $value" : "" ( "" , " -disc " + str(value).replace(',','..') )[ value is not None ] You must enter a string composed of two numbers separated by comma nops Display rows unprocessed (separated by comma ) (-nops) String (defined $value and $value =~ s/,/../) ? " -nops $value" : "" ( "" , " -nops " + str(value).replace(',','..' ))[ value is not None ] You must enter a string composed of two numbers separated by comma general_formatting_options General formatting options 3 width Paginate in N columns of alignment (-width) Integer (defined $value) ? 
" -width $value" : "" ( "" , " -width " + str(value) )[ value is not None ] gap Use this character as the gap (-gap) String (defined $value) ? " -gap $value" : " " ( " " , " -gap " + str(value) )[ value is not None] label0 Switch off label: row number (-label0) Boolean 0 ($value) ? " -label0" : "" ( "" , " -label0" )[ value ] label1 Switch off label: identifier (-label1) Boolean 0 ($value) ? " -label1" : "" ( "" , " -label1" )[ value ] label2 Switch off label: description (-label2) Boolean 0 ($value) ? " -label2" : "" ( "" , " -label2" )[ value ] label3 Switch off label: scores (-label3) Boolean 0 ($value) ? " -label3" : "" ( "" , " -label3" )[ value ] label4 Switch off label: percent identity (-label4) Boolean 0 ($value) ? " -label4" : "" ( "" , " -label4" )[ value ] register Output multi-pass alignments with columns in register (-register) Boolean 1 ($value) ? "" : " -register off" ( " -register off" , "" )[ value ] html_markup_options HTML markup options $outformat eq "html" outformat == "html" 3 html_output Amount of HTML markup (-html) Choice full full head body data css off (defined $value]) ? "-html $value":"" ("", " -html " + str(value))[value is not None] Title Page title string String (defined $value) ? " -title $value" : "" ( "" , " -title " + str(value) )[ value is not None ] pagecolor Page background color (-pagecolor) String white (defined $value and $value ne $vdef) ? " -pagecolor $value" : "" ( "" , " -pagecolor " + str(value) )[ value is not None and value != vdef] textcolor Page text color (-textcolor) String black (defined $value and $value ne $vdef) ? " -textcolor $value" : "" ( "" , " -textcolor " + str(value) )[ value is not None and value != vdef] linkcolor Link color (-linkcolor) String blue (defined $value and $value ne $vdef) ? " -linkcolor $value" : "" ( "" , " -linkcolor " + str(value) )[ value is not None and value != vdef] 3 alinkcolor Active link color (-alinkcolor) String red (defined $value and $value ne $vdef) ? 
" -alinkcolor $value" : "" ( "" , " -alinkcolor " + str(value) )[ value is not None and value != vdef] vlinkcolor Visited link color (-vlinkcolor) String purple (defined $value and $value ne $vdef) ? " -vlinkcolor $value" : "" ( "" , " -vlinkcolor " + str(value) )[ value is not None and value != vdef] alncolor Alignment background color (-alncolor) String white (defined $value and $value ne $vdef) ? " -alncolor $value" : "" ( "" , " -alncolor " + str(value) )[ value is not None and value != vdef] labcolor Alignment label color (-labcolor) String black (defined $value and $value ne $vdef) ? " -labcolor $value" : "" ( "" , " -labcolor " + str(value) )[ value is not None and value != vdef] symcolor Alignment default text color (-symcolor) String (defined $value) ? " -symcolor $value" : " " ( " " , " -symcolor " + str(value) )[ value is not None ] gapcolor Alignment gap color (-gapcolor) String (defined $value) ? " -gapcolor $value" : "" ( "" , " -gapcolor " + str(value) )[ value is not None ] bold Use bold emphasis for coloured residues (-bold) Boolean 0 ($value) ? " -bold" : "" ( "" , " -bold" )[ value ] css Use Cascading Style Sheets (-css) Choice off off on (defined $value and $value eq $vdef) ? " -css on" : "" ( "" , " -css on" )[ value is not None and value != vdef] alig_file Alignment output file Alignment FASTA MSF CODATA $outformat eq "msf" or $outformat eq "pearson" or $outformat eq "pir" outformat in [ "msf", "pearson" , "pir"] "mview_alignment.out" "mview_alignment.out" html_file Alignment output file Report HTML $outformat eq "html" outformat == "html" " && ln -s mview_alignment.out mview_alignment.html" " && ln -s mview_alignment.out mview_alignment.html" "mview_alignment.html" "mview_alignment.html" 10000 rdb_file Alignment output in RDB format Report RDB $outformat eq "rdb" outformat == "rdb" "mview_alignment.out" "mview_alignment.out"
Programs-5.1.1/patmatmotifs.xml0000644000175000001560000002026512072525233015412 0ustar bneronsis patmatmotifs EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net patmatmotifs Scan a protein sequence with motifs from the PROSITE database http://bioweb2.pasteur.fr/docs/EMBOSS/patmatmotifs.html http://emboss.sourceforge.net/docs/themes sequence:protein:motifs patmatmotifs e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_additional Additional section e_full Provide full documentation for matching patterns Boolean 0 ("", " -full")[ bool(value) ] 2 e_prune Ignore simple patterns Boolean 1 (" -noprune", "")[ bool(value) ] 3 Ignore simple patterns. If this is true then these simple post-translational modification sites are not reported: myristyl, asn_glycosylation, camp_phospho_site, pkc_phospho_site, ck2_phospho_site, and tyr_phospho_site. 
e_output Output section e_outfile Name of the report file Filename patmatmotifs.report ("" , " -outfile=" + str(value))[value is not None] 4 e_rformat_outfile Choose the report output format Choice DBMOTIF DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 5 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 6 Programs-5.1.1/trimest.xml0000644000175000001560000002516712072525233014377 0ustar bneronsis trimest EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net trimest Remove poly-A tails from nucleotide sequences http://bioweb2.pasteur.fr/docs/EMBOSS/trimest.html http://emboss.sourceforge.net/docs/themes sequence:edit trimest e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_additional Additional section e_minlength Minimum length of a poly-a tail (value greater than or equal to 1) Integer 4 ("", " -minlength=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 2 This is the minimum length that a poly-A (or poly-T) tail must have before it is removed. If there are mismatches in the tail than there must be at least this length of poly-A tail before the mismatch for the mismatch to be considered part of the tail. 
e_mismatches Number of contiguous mismatches allowed in a tail (value greater than or equal to 0) Integer 1 ("", " -mismatches=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 3 If there are this number or fewer contiguous non-A bases in a poly-A tail then, if there are '-minlength' 'A' bases before them, they will be considered part of the tail and removed. For example, the terminal 4 A's of GCAGAAAA would be removed with the default values of -minlength=4 and -mismatches=1 (there are not at least 4 A's before the last 'G' and so only the A's after it are considered to be part of the tail). The terminal 9 bases of GCAAAAGAAAA would be removed: there are at least -minlength A's preceding the last 'G', so it is part of the tail. e_reverse Write the reverse complement when poly-t is removed Boolean 1 (" -noreverse", "")[ bool(value) ] 4 When a poly-T region at the 5' end of the sequence is found and removed, it is likely that the sequence is in the reverse sense. This option will change the sequence to the forward sense when it is written out. If this option is not set, then the sense will not be changed. e_tolower Change poly-a tail to lower-case Boolean 0 ("", " -tolower")[ bool(value) ] 5 The poly-A region can be 'masked' by converting the sequence characters to lower-case. Some non-EMBOSS programs e.g. fasta can interpret this as a masked region. The sequence is unchanged apart from the case change. You might like to ensure that the whole sequence is in upper-case before masking the specified regions to lower-case by using the '-supper' sequence qualifier. e_advanced Advanced section e_fiveprime Remove poly-t tails at the 5' end of the sequence. Boolean 1 (" -nofiveprime", "")[ bool(value) ] 6 If this is set true, then the 5' end of the sequence is inspected for poly-T tails. These will be removed if they are longer than any 3' poly-A tails. If this is false, then the 5' end is ignored. 
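The -minlength and -mismatches rules described for trimest can be illustrated with a small Python sketch. This is not the EMBOSS implementation; it is a minimal reading of the help text above, handling only the 3' poly-A case, and the function name polya_tail_length is hypothetical.

```python
def polya_tail_length(seq, minlength=4, mismatches=1):
    """Return how many 3' bases would count as a poly-A tail.

    Sketch of the rule in the help text (an assumption, not EMBOSS code):
    the terminal run of A's must be at least `minlength` long, and a run of
    up to `mismatches` non-A bases is absorbed into the tail only when it is
    preceded by at least `minlength` A's.
    """
    s = seq.upper()
    end = len(s)

    # Terminal run of A's.
    pos = end
    while pos > 0 and s[pos - 1] == "A":
        pos -= 1
    if end - pos < minlength:
        return 0

    while True:
        # Run of non-A bases immediately before the current tail.
        j = pos
        while j > 0 and s[j - 1] != "A":
            j -= 1
        gap = pos - j
        if gap == 0 or gap > mismatches:
            break
        # Run of A's before that mismatch run.
        k = j
        while k > 0 and s[k - 1] == "A":
            k -= 1
        if j - k < minlength:
            break
        pos = k  # absorb the mismatch and the A's before it

    return end - pos
```

With the default values, the two sequences quoted in the help text give tails of 4 and 9 bases respectively, which matches the description.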
e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename trimest.e_outseq ("" , " -outseq=" + str(value))[value is not None] 7 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 8 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 9 Programs-5.1.1/netNglyc.xml0000644000175000001560000001653611533171506014474 0ustar bneronsis netNglyc 1.0a netNglyc predict N-glycosylation sites in proteins. http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?netNglyc Ramneek Gupta, ramneek@cbs.dtu.dk Prediction of N-glycosylation sites in human proteins. R. Gupta, E. Jung and S. Brunak. In preparation, 2004. http://www.cbs.dtu.dk/services/NetNGlyc/ netNglyc predicts N-glycosylation sites in human proteins using artificial neural networks that examine the sequence context of Asn-Xaa-Ser/Thr sequons where Xaa is any amino acid but proline. sequence:protein:motifs sequence:protein:pattern sequence:protein:profiles netnglyc String "netNglyc " "netNglyc " sequence Input Sequence Sequence FASTA " $value" " " + str( value ) 50 >CBG_HUMAN MPLLLYTCLLWLPTSGLWTVQAMDPNAAYVNMSNHHRGLASANVDFAFSLYKHLVALSPK KNIFISPVSISMALAMLSLGTCGHTRAQLLQGLGFNLTERSETEIHQGFQHLHQLFAKSD TSLEMTMGNALFLDGSLELLESFSADIKHYYESEVLAMNFQDWATASRQINSYVKNKTQG KIVDLFSGLDSPAILVLVNYIFFKGTWTQPFDLASTREENFYVDETTVVKVPMMLQSSTI SYLHDSELPCQLVQMNYVGNGTVFFILPDKGKMNTVIAALSRDTINRWSAGLTSSQVDLY IPKVTISGVYDLGDVLEEMGIADLFTNQANFSRITQDAQLKSSKVVHKAVLQLNEEGVDT AGSTGVTLNLTSKPIILRFNQPFIIMIFDHFTWSSLFLARVMNPV graphics generate graphics (-g). Boolean 0 ($value)? "-g " : "" ( "" , "-g " )[ bool( value ) ] Generate graphics, plotting the N-glycosylation potential and the thresh†old(s) against the residue number of each predicted site. Two files will be produced for each input sequence, one in PostScript and the other in GIF. 
10 threshold Show additional thresholds (0.32, 0.75, 0.90) in the graph(s). Boolean 0 ( defined $value )? "-a " : "" ( "" , "-a " )[ bool( value ) ] Show additional thresholds (0.32, 0.75 and 0.90) in the graphs. This option is ignored unless -g is also given. 20 aspargine Predict on all Asn residues (-f). Boolean 0 ( defined )? "-f " : "" ( "","-f ")[ bool( value ) ] Predict on all asparagines in the input. Note that asparagines that do not occur within the Asn-Xaa-Ser/Thr sequon are unlikely to be glycosylated, no matter what the prediction score. The default is to predict only on the asparagines in the Asn-Xaa-Ser/Thr triplet. 30 results netNglyc report. Report NetNglyc "netNglyc.out" "netNglyc.out"

Each input sequence is displayed with the predicted N-glycosylation sites highlighted. For each site the following is shown:

  • sequence name
  • position in the sequence
  • sequence motif
  • N-glycosylation potential
  • Jury agreement, 9 networks
  • Prediction strength (+, ++ or +++)
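The "position in the sequence" and "sequence motif" columns refer to Asn-Xaa-Ser/Thr sequons, where Xaa is any amino acid but proline. The following Python sketch only locates such sequons; it does not reproduce the neural-network scoring of netNglyc, and the function name find_sequons is hypothetical.

```python
import re

# N, then any residue except P, then S or T. The lookahead lets
# overlapping sequons (e.g. "NNSS") all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def find_sequons(protein):
    """Return (1-based position of the Asn, three-residue motif) pairs."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]
```

For example, find_sequons("MNMSNHT") reports the sequons NMS at position 2 and NHT at position 5.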
postscript graphic in PostScript Binary NetNGlyc_Graph Postscript graphics "*.ps" "*.ps" plotting the N-glycosylation potential and the threshold(s) against the residue number of each predicted site. gif graphic in GIF Binary NetNGlyc_Graph GIF $graphics graphics "*.gif" "*.gif" plotting the N-glycosylation potential and the threshold(s) against the residue number of each predicted site.
Programs-5.1.1/chips.xml0000644000175000001560000001057012072525233014006 0ustar bneronsis chips EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net chips Calculates Nc codon usage statistic http://bioweb2.pasteur.fr/docs/EMBOSS/chips.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:codon_usage chips e_input Input section e_seqall seqall option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -seqall=" + str(value))[value is not None] 1 e_advanced Advanced section e_sum Sum codons over all sequences Boolean 1 (" -nosum", "")[ bool(value) ] 2 e_output Output section e_outfile Name of the output file (e_outfile) Filename chips.e_outfile ("" , " -outfile=" + str(value))[value is not None] 3 e_outfile_out outfile_out option ChipsReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 4 Programs-5.1.1/garnier.xml0000644000175000001560000002075612072525233014336 0ustar bneronsis garnier EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net garnier Predicts protein secondary structure using GOR method http://bioweb2.pasteur.fr/docs/EMBOSS/garnier.html http://emboss.sourceforge.net/docs/themes sequence:protein:2D_structure structure:2D_structure garnier e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_advanced Advanced section e_idc Index decision constants parameter (value from 0 to 6) Integer 0 ("", " -idc=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 Value less than or equal to 6 is required value <= 6 2 In their paper, GOR mention that if you know something about the secondary structure content of the protein you are analyzing, you can do better in prediction. 'idc' is an index into a set of arrays, dharr[] and dsarr[], which provide 'decision constants' (dch, dcs), which are offsets that are applied to the weights for the helix and sheet (extend) terms. So, idc=0 says don't use the decision constant offsets, and idc=1 to 6 indicates that various combinations of dch,dcs offsets should be used. 
e_output Output section e_outfile Name of the report file Filename garnier.report ("" , " -outfile=" + str(value))[value is not None] 3 e_rformat_outfile Choose the report output format Choice TAGSEQ DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 4 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 5 Programs-5.1.1/cpgreport.xml0000644000175000001560000001665612072525233014720 0ustar bneronsis cpgreport EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net cpgreport Identify and report CpG-rich regions in nucleotide sequence(s) http://bioweb2.pasteur.fr/docs/EMBOSS/cpgreport.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:cpg_islands cpgreport e_input Input section e_sequence sequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_score Cpg score (value from 1 to 200) Integer 17 ("", " -score=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 Value less than or equal to 200 is required value <= 200 2 This sets the score for each CG sequence found. A value of 17 is more sensitive, but 28 has also been used with some success. 
e_output Output section e_outfile Name of the report file Filename cpgreport.e_outfile ("" , " -outfile=" + str(value))[value is not None] 3 e_outfile_out outfile_out option CpgreportReport Report e_outfile e_outfeat Name of the output feature file (e_outfeat) DNA Filename cpgreport.e_outfeat ("" , " -outfeat=" + str(value))[value is not None] 4 File for output features e_offormat_outfeat Choose the feature output format DNA Choice GFF GFF EMBL SWISSPROT NBRF CODATA ("", " -offormat=" + str(value))[value is not None and value!=vdef] 5 e_outfeat_out outfeat_out option DNA Feature AbstractText e_outfeat auto Turn off any prompting String " -auto -stdout" 6 Programs-5.1.1/infoseq.xml0000644000175000001560000003235112072525233014345 0ustar bneronsis infoseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net infoseq Display basic information about sequences http://bioweb2.pasteur.fr/docs/EMBOSS/infoseq.html http://emboss.sourceforge.net/docs/themes information infoseq e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_advanced Advanced section e_columns Print information in neat, aligned columns. Boolean 1 (" -nocolumns", "")[ bool(value) ] 2 Set this option on (Y) to print the sequence information into neat, aligned columns in the output file. Alternatively, leave it unset (N), in which case the information records will be delimited by a character, which you may specify by using the -delimiter option. In other words, if -columns is set on, the -delimiter option is overriden. 
e_delimiter Delimiter of records in text output file String | ("", " -delimiter=" + str(value))[value is not None and value!=vdef] 3 This string, which is usually a single character only, is used to delimit individual records in the text output file. It could be a space character, a tab character, a pipe character or any other character or string. e_output Output section e_outfile Name of the output file (e_outfile) Filename infoseq.e_outfile ("" , " -outfile=" + str(value))[value is not None] 4 If you enter the name of a file here then this program will write the sequence details into that file. e_outfile_out outfile_out option InfoseqReport Report e_outfile e_html Format output as an html table Boolean 0 ("", " -html")[ bool(value) ] 5 e_only Display the specified columns Boolean 0 ("", " -only")[ bool(value) ] 6 This is a way of shortening the command line if you only want a few things to be displayed. Instead of specifying: '-nohead -noname -noacc -notype -nopgc -nodesc' to get only the length output, you can specify '-only -length' e_heading Display column headings Boolean 1 (" -noheading", "")[ bool(value) ] 7 e_usa Display the usa of the sequence Boolean not e_only 0 ("", " -usa")[ bool(value) ] 8 e_database Display 'database' column Boolean not e_only 0 ("", " -database")[ bool(value) ] 9 e_name Display 'name' column Boolean not e_only 0 ("", " -name")[ bool(value) ] 10 e_accession Display 'accession' column Boolean not e_only 0 ("", " -accession")[ bool(value) ] 11 e_gi Display 'gi' column Boolean 0 ("", " -gi")[ bool(value) ] 12 e_seqversion Display 'version' column Boolean 0 ("", " -seqversion")[ bool(value) ] 13 e_type Display 'type' column Boolean not e_only 0 ("", " -type")[ bool(value) ] 14 e_length Display 'length' column Boolean not e_only 0 ("", " -length")[ bool(value) ] 15 e_pgc Display 'percent gc content' column Boolean not e_only 0 ("", " -pgc")[ bool(value) ] 16 e_organism Display 'organism' column Boolean not e_only 0 ("", " -organism")[ 
bool(value) ] 17 e_description Display 'description' column Boolean not e_only 0 ("", " -description")[ bool(value) ] 18 auto Turn off any prompting String " -auto -stdout" 19 Programs-5.1.1/digest.xml0000644000175000001560000003304712072525233014163 0ustar bneronsis digest EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net digest Reports on protein proteolytic enzyme or reagent cleavage sites http://bioweb2.pasteur.fr/docs/EMBOSS/digest.html http://emboss.sourceforge.net/docs/themes sequence:protein:motifs digest e_input Input section e_seqall seqall option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -seqall=" + str(value))[value is not None] 1 e_mwdata Molecular weights data file MolecularWeights AbstractText ("", " -mwdata=" + str(value))[value is not None ] 2 Molecular weight data for amino acids e_required Required section e_menu Enzymes and reagents Choice 1 1 2 3 4 5 6 7 8 ("", " -menu=" + str(value))[value is not None and value!=vdef] 3 e_mono Use monoisotopic weights Boolean 0 ("", " -mono")[ bool(value) ] 4 e_advanced Advanced section e_unfavoured Allow unfavoured cuts Boolean 0 ("", " -unfavoured")[ bool(value) ] 5 Trypsin will not normally cut after 'KR' if they are followed by any of 'KRIFLP'. Lys-C will not normally cut after 'K' if it is followed by 'P'. Arg-C will not normally cut after 'R' if it is followed by 'P'. V8-bicarb will not normally cut after 'E' if it is followed by any of 'KREP'. V8-phosph will not normally cut after 'DE' if they are followed by 'P'. Chymotrypsin will not normally cut after 'FYWLM' if they are followed by 'P'. Specifying unfavoured shows these unfavoured cuts as well as the favoured ones. 
e_ragging Allow ragging Boolean 0 ("", " -ragging")[ bool(value) ] 6 Allows semi-specific and non-specific digestion. This option is particularly useful for generating lists of peptide sequences for protein identification using mass-spectrometry. e_termini Ragging value (value from 1 to 4) Choice 1 1 2 3 4 ("", " -termini=" + str(value))[value is not None and value!=vdef] 7 e_output Output section e_overlap Show overlapping partials Boolean 0 ("", " -overlap")[ bool(value) ] 8 Used for partial digestion. Shows all cuts from favoured cut sites plus 1..3, 2..4, 3..5 etc but not (e.g.) 2..5. Overlaps are therefore fragments with exactly one potential cut site within it. e_allpartials Show all partials Boolean 0 ("", " -allpartials")[ bool(value) ] 9 As for overlap but fragments containing more than one potential cut site are included. e_outfile Name of the report file Filename digest.report ("" , " -outfile=" + str(value))[value is not None] 10 e_rformat_outfile Choose the report output format Choice SEQTABLE DASGFF DBMOTIF DIFFSEQ EMBL EXCEL FEATTABLE GENBANK GFF LISTFILE MOTIF NAMETABLE CODATA REGIONS SEQTABLE SIMPLE SRS SWISS TABLE TAGSEQ ("", " -rformat=" + str(value))[value is not None and value!=vdef] 11 e_outfile_out outfile_out option Text e_rformat_outfile in ['DASGFF', 'DBMOTIF', 'DIFFSEQ', 'EMBL', 'EXCEL', 'FEATTABLE', 'GENBANK', 'GFF', 'LISTFILE', 'MOTIF', 'NAMETABLE', 'CODATA', 'REGIONS', 'SEQTABLE', 'SIMPLE', 'SRS', 'SWISS', 'TABLE', 'TAGSEQ'] e_outfile auto Turn off any prompting String " -auto -stdout" 12 Programs-5.1.1/codcmp.xml0000644000175000001560000017574211672346320014165 0ustar bneronsis codcmp EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net codcmp Codon usage table comparison http://bioweb2.pasteur.fr/docs/EMBOSS/codcmp.html http://emboss.sourceforge.net/docs/themes sequence:nucleic:codon_usage codcmp e_input Input section e_first first option Choice mobyle_null mobyle_null Eacc.cut Eacica.cut Eadenovirus5.cut Eadenovirus7.cut Eagrtu.cut Eaidlav.cut Eanasp.cut Eani.cut Eani_h.cut Eanidmit.cut Earath.cut Easn.cut Eath.cut Eatu.cut Eavi.cut Eazovi.cut Ebacme.cut Ebacst.cut Ebacsu.cut Ebacsu_high.cut Ebja.cut Ebly.cut Ebme.cut Ebmo.cut Ebna.cut Ebommo.cut Ebov.cut Ebovin.cut Ebovsp.cut Ebpphx.cut Ebraja.cut Ebrana.cut Ebrare.cut Ebst.cut Ebsu.cut Ebsu_h.cut Ecac.cut Ecaeel.cut Ecal.cut Ecanal.cut Ecanfa.cut Ecaucr.cut Eccr.cut Ecel.cut Echi.cut Echick.cut Echicken.cut Echisp.cut Echk.cut Echlre.cut Echltr.cut Echmp.cut Echnt.cut Echos.cut Echzm.cut Echzmrubp.cut Ecloab.cut Ecpx.cut Ecre.cut Ecrigr.cut Ecrisp.cut Ectr.cut Ecyapa.cut Edayhoff.cut Eddi.cut Eddi_h.cut Edicdi.cut Edicdi_high.cut Edog.cut Edro.cut Edro_h.cut Edrome.cut Edrome_high.cut Edrosophila.cut Eeca.cut Eeco.cut Eeco_h.cut Eecoli.cut Eecoli_high.cut Eemeni.cut Eemeni_high.cut Eemeni_mit.cut Eerwct.cut Ef1.cut Efish.cut Efmdvpolyp.cut Ehaein.cut Ehalma.cut Ehalsa.cut Eham.cut Ehha.cut Ehin.cut Ehma.cut Ehorvu.cut Ehum.cut Ehuman.cut Ekla.cut Eklepn.cut Eklula.cut Ekpn.cut Elacdl.cut Ella.cut Elyces.cut Emac.cut Emacfa.cut Emaize.cut Emaize_chl.cut Emam_h.cut Emammal_high.cut Emanse.cut Emarpo_chl.cut Emedsa.cut Emetth.cut Emixlg.cut Emouse.cut Emsa.cut Emse.cut Emta.cut Emtu.cut Emus.cut Emussp.cut Emva.cut Emyctu.cut Emze.cut Emzecp.cut Encr.cut Eneigo.cut Eneu.cut Eneucr.cut Engo.cut Eoncmy.cut Eoncsp.cut Eorysa.cut Eorysa_chl.cut Epae.cut Epea.cut Epet.cut Epethy.cut Epfa.cut Ephavu.cut Ephix174.cut Ephv.cut Ephy.cut Epig.cut Eplafa.cut Epolyomaa2.cut Epombe.cut Epombecai.cut Epot.cut Eppu.cut Eprovu.cut Epse.cut Epseae.cut 
Epsepu.cut Epsesm.cut Epsy.cut Epvu.cut Erab.cut Erabbit.cut Erabit.cut Erabsp.cut Erat.cut Eratsp.cut Erca.cut Erhile.cut Erhime.cut Erhm.cut Erhoca.cut Erhosh.cut Eric.cut Erle.cut Erme.cut Ersp.cut Esalsa.cut Esalsp.cut Esalty.cut Esau.cut Eschma.cut Eschpo.cut Eschpo_cai.cut Eschpo_high.cut Esco.cut Eserma.cut Esgi.cut Esheep.cut Eshp.cut Eshpsp.cut Esli.cut Eslm.cut Esma.cut Esmi.cut Esmu.cut Esoltu.cut Esoy.cut Esoybn.cut Espi.cut Espiol.cut Espn.cut Espo.cut Espo_h.cut Espu.cut Esta.cut Estaau.cut Estrco.cut Estrmu.cut Estrpn.cut Estrpu.cut Esty.cut Esus.cut Esv40.cut Esyhsp.cut Esynco.cut Esyncy.cut Esynsp.cut Etbr.cut Etcr.cut Eter.cut Etetsp.cut Etetth.cut Etheth.cut Etob.cut Etobac.cut Etobac_chl.cut Etobcp.cut Etom.cut Etrb.cut Etrybr.cut Etrycr.cut Evco.cut Evibch.cut Ewheat.cut Ewht.cut Exel.cut Exenla.cut Exenopus.cut Eyeast.cut Eyeast_cai.cut Eyeast_high.cut Eyeast_mit.cut Eyeastcai.cut Eyen.cut Eyeren.cut Eyerpe.cut Eysc.cut Eysc_h.cut Eyscmt.cut Eysp.cut Ezebrafish.cut Ezma.cut ("", " -first=" + str(value))[value is not None and value!=vdef] 1 First codon usage file e_second second option Choice mobyle_null mobyle_null Eacc.cut Eacica.cut Eadenovirus5.cut Eadenovirus7.cut Eagrtu.cut Eaidlav.cut Eanasp.cut Eani.cut Eani_h.cut Eanidmit.cut Earath.cut Easn.cut Eath.cut Eatu.cut Eavi.cut Eazovi.cut Ebacme.cut Ebacst.cut Ebacsu.cut Ebacsu_high.cut Ebja.cut Ebly.cut Ebme.cut Ebmo.cut Ebna.cut Ebommo.cut Ebov.cut Ebovin.cut Ebovsp.cut Ebpphx.cut Ebraja.cut Ebrana.cut Ebrare.cut Ebst.cut Ebsu.cut Ebsu_h.cut Ecac.cut Ecaeel.cut Ecal.cut Ecanal.cut Ecanfa.cut Ecaucr.cut Eccr.cut Ecel.cut Echi.cut Echick.cut Echicken.cut Echisp.cut Echk.cut Echlre.cut Echltr.cut Echmp.cut Echnt.cut Echos.cut Echzm.cut Echzmrubp.cut Ecloab.cut Ecpx.cut Ecre.cut Ecrigr.cut Ecrisp.cut Ectr.cut Ecyapa.cut Edayhoff.cut Eddi.cut Eddi_h.cut Edicdi.cut Edicdi_high.cut Edog.cut Edro.cut Edro_h.cut Edrome.cut Edrome_high.cut Edrosophila.cut Eeca.cut Eeco.cut Eeco_h.cut Eecoli.cut 
Eecoli_high.cut Eemeni.cut Eemeni_high.cut Eemeni_mit.cut Eerwct.cut Ef1.cut Efish.cut Efmdvpolyp.cut Ehaein.cut Ehalma.cut Ehalsa.cut Eham.cut Ehha.cut Ehin.cut Ehma.cut Ehorvu.cut Ehum.cut Ehuman.cut Ekla.cut Eklepn.cut Eklula.cut Ekpn.cut Elacdl.cut Ella.cut Elyces.cut Emac.cut Emacfa.cut Emaize.cut Emaize_chl.cut Emam_h.cut Emammal_high.cut Emanse.cut Emarpo_chl.cut Emedsa.cut Emetth.cut Emixlg.cut Emouse.cut Emsa.cut Emse.cut Emta.cut Emtu.cut Emus.cut Emussp.cut Emva.cut Emyctu.cut Emze.cut Emzecp.cut Encr.cut Eneigo.cut Eneu.cut Eneucr.cut Engo.cut Eoncmy.cut Eoncsp.cut Eorysa.cut Eorysa_chl.cut Epae.cut Epea.cut Epet.cut Epethy.cut Epfa.cut Ephavu.cut Ephix174.cut Ephv.cut Ephy.cut Epig.cut Eplafa.cut Epolyomaa2.cut Epombe.cut Epombecai.cut Epot.cut Eppu.cut Eprovu.cut Epse.cut Epseae.cut Epsepu.cut Epsesm.cut Epsy.cut Epvu.cut Erab.cut Erabbit.cut Erabit.cut Erabsp.cut Erat.cut Eratsp.cut Erca.cut Erhile.cut Erhime.cut Erhm.cut Erhoca.cut Erhosh.cut Eric.cut Erle.cut Erme.cut Ersp.cut Esalsa.cut Esalsp.cut Esalty.cut Esau.cut Eschma.cut Eschpo.cut Eschpo_cai.cut Eschpo_high.cut Esco.cut Eserma.cut Esgi.cut Esheep.cut Eshp.cut Eshpsp.cut Esli.cut Eslm.cut Esma.cut Esmi.cut Esmu.cut Esoltu.cut Esoy.cut Esoybn.cut Espi.cut Espiol.cut Espn.cut Espo.cut Espo_h.cut Espu.cut Esta.cut Estaau.cut Estrco.cut Estrmu.cut Estrpn.cut Estrpu.cut Esty.cut Esus.cut Esv40.cut Esyhsp.cut Esynco.cut Esyncy.cut Esynsp.cut Etbr.cut Etcr.cut Eter.cut Etetsp.cut Etetth.cut Etheth.cut Etob.cut Etobac.cut Etobac_chl.cut Etobcp.cut Etom.cut Etrb.cut Etrybr.cut Etrycr.cut Evco.cut Evibch.cut Ewheat.cut Ewht.cut Exel.cut Exenla.cut Exenopus.cut Eyeast.cut Eyeast_cai.cut Eyeast_high.cut Eyeast_mit.cut Eyeastcai.cut Eyen.cut Eyeren.cut Eyerpe.cut Eysc.cut Eysc_h.cut Eyscmt.cut Eysp.cut Ezebrafish.cut Ezma.cut ("", " -second=" + str(value))[value is not None and value!=vdef] 2 Second codon usage file for comparison e_output Output section e_outfile Name of the output file (e_outfile) 
Filename codcmp.e_outfile ("" , " -outfile=" + str(value))[value is not None] 3 e_outfile_out outfile_out option CodcmpReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 4 Programs-5.1.1/emowse.xml0000644000175000001560000002540312072525233014200 0ustar bneronsis emowse EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net emowse Search protein sequences by digest fragment molecular weight http://bioweb2.pasteur.fr/docs/EMBOSS/emowse.html http://emboss.sourceforge.net/docs/themes sequence:protein:composition emowse e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_infile Peptide molecular weight values file PeptideMolweights AbstractText ("", " -infile=" + str(value))[value is not None] 2 e_mwdata Molecular weights data file MolecularWeights AbstractText ("", " -mwdata=" + str(value))[value is not None ] 3 e_frequencies Amino acid frequencies data file Protein AminoAcidFrequencies AbstractText ("", " -frequencies=" + str(value))[value is not None ] 4 e_required Required section e_weight Whole sequence molwt Integer 0 ("", " -weight=" + str(value))[value is not None and value!=vdef] 5 e_mono Use monoisotopic weights Boolean 0 ("", " -mono")[ bool(value) ] 6 e_advanced Advanced section e_enzyme Enzymes and reagents Choice 1 1 2 3 4 5 6 7 8 ("", " -enzyme=" + str(value))[value is not None and value!=vdef] 7 e_pcrange Allowed whole sequence weight variability (value from 0 to 75) Integer 25 ("", " -pcrange=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0 is required value >= 0 Value less than or equal to 75 is required value <= 75 8 
e_tolerance Tolerance (value from 0.1 to 1.0) Float 0.1 ("", " -tolerance=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.1 is required value >= 0.1 Value less than or equal to 1.0 is required value <= 1.0 9 e_partials Partials factor (value from 0.1 to 1.0) Float 0.4 ("", " -partials=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 0.1 is required value >= 0.1 Value less than or equal to 1.0 is required value <= 1.0 10 e_output Output section e_outfile Name of the output file (e_outfile) Filename emowse.e_outfile ("" , " -outfile=" + str(value))[value is not None] 11 e_outfile_out outfile_out option EmowseReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 12 Programs-5.1.1/megamerger.xml0000644000175000001560000002226712072525233015021 0ustar bneronsis megamerger EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net megamerger Merge two large overlapping DNA sequences http://bioweb2.pasteur.fr/docs/EMBOSS/megamerger.html http://emboss.sourceforge.net/docs/themes alignment:consensus megamerger e_input Input section e_asequence asequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -asequence=" + str(value))[value is not None] 1 e_bsequence bsequence option DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -bsequence=" + str(value))[value is not None] 2 e_required Required section e_wordsize Word size (value greater than or equal to 2) Integer 20 ("", " -wordsize=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 2 is required value >= 2 3 e_additional Additional section e_prefer Use the first sequence when there is a mismatch Boolean 0 ("", " -prefer")[ bool(value) ] 4 When a mismatch between the two sequence is discovered, one or other of the two sequences must be used to create the merged sequence over this mismatch region. The default action is to create the merged sequence using the sequence where the mismatch is closest to that sequence's centre. If this option is used, then the first sequence (seqa) will always be used in preference to the other sequence when there is a mismatch. 
e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename outseq.merged ("" , " -outseq=" + str(value))[value is not None] 5 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 6 e_outseq_out outseq_out option Sequence e_outseq e_outfile Name of the output file (e_outfile) Filename outfile.megamerger ("" , " -outfile=" + str(value))[value is not None] 7 e_outfile_out outfile_out option MegamergerReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 8 Programs-5.1.1/mafft.xml0000644000175000001560000012557311772051771014016 0ustar bneronsis mafft 6.849 mafft Multiple alignment program for amino acid or nucleotide sequences. http://mafft.cbrc.jp/alignment/software/source.html http://mafft.cbrc.jp/alignment/software/ Kazutaka Katoh Katoh, Toh 2010 (Bioinformatics 26:1899-1900) Parallelization of the MAFFT multiple sequence alignment program. (describes the multithread version; Linux only) Katoh, Asimenos, Toh 2009 (Methods in Molecular Biology 537:39-64) Multiple Alignment of DNA Sequences with MAFFT. In Bioinformatics for DNA Sequence Analysis edited by D. Posada (outlines DNA alignment methods and several tips including group-to-group alignment and rough clustering of a large number of sequences) Katoh, Toh 2008 (BMC Bioinformatics 9:212) Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework. (describes RNA structural alignment methods) Katoh, Toh 2008 (Briefings in Bioinformatics 9:286-298) Recent developments in the MAFFT multiple sequence alignment program. (outlines version 6; Fast Breaking Paper in Thomson Reuters' ScienceWatch) Katoh, Toh 2007 (Bioinformatics 23:372-374) Errata PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences. 
(describes the PartTree algorithm) Katoh, Kuma, Toh, Miyata 2005 (Nucleic Acids Res. 33:511-518) MAFFT version 5: improvement in accuracy of multiple sequence alignment. (describes [ancestral versions of] the G-INS-i, L-INS-i and E-INS-i strategies) Katoh, Misawa, Kuma, Miyata 2002 (Nucleic Acids Res. 30:3059-3066) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. (describes the FFT-NS-1, FFT-NS-2 and FFT-NS-i strategies) http://mafft.cbrc.jp/alignment/software/about.html alignment:multiple input_opt Input Options sequences Sequences File ( a file containing several sequences ). Sequence FASTA " $sequences" " " + str( sequences ) 1000 seq_type Sequences type Choice null null '' '' nuc " --nuc " " --nuc " amino " --amino " " --amino " seed Use structural alignment(s) seed_1 Structural alignment 1 Alignment FASTA (defined $value)? " --seed $value ": "" ( "" , " --seed "+str(value))[value is not None] These sequences will be aligned with the 'input' sequences above, being used as a constraint. seed_2 Structural alignment 1 Alignment FASTA (defined $value)? " --seed $value ": "" ( "" , " --seed "+str(value))[value is not None] These sequences will be aligned with the 'input' sequences above, being used as a constraint. seed_3 Structural alignment 1 Alignment FASTA (defined $value)? " --seed $value ": "" ( "" , " --seed "+str(value))[value is not None] These sequences will be aligned with the 'input' sequences above, being used as a constraint. Seed alignments given in alignment (fasta format) are aligned with sequences in input. The alignment within every seed is preserved. anysymbol Allow unusual symbols (Selenocysteine "U", Inosine "i", non-alphabetical characters, etc.) Boolean 0 ( value )? "" : " --anysymbol " ( "" , " --anysymbol ")[ value ]

If there are unusual characters (e.g., U as selenocysteine in protein sequence), use the --anysymbol option.

It accepts any printable character (U, O, #, $, %, etc.; 0x21-0x7e in the ASCII code), except for > (0x3e). They are scored equivalently to X. Gap is - (0x2d), as in the default mode.
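The accepted character range can be checked before submitting sequences. The Python sketch below encodes only the rule stated above (0x21-0x7e, excluding 0x3e); the helper names are hypothetical and are not part of MAFFT.

```python
def is_accepted_symbol(ch):
    """True if ch is in 0x21-0x7e and is not '>' (0x3e)."""
    return 0x21 <= ord(ch) <= 0x7E and ch != ">"

def rejected_symbols(sequence):
    """Return the sorted set of characters that fall outside the accepted range."""
    return sorted({ch for ch in sequence if not is_accepted_symbol(ch)})
```

Note that the gap character - (0x2d) lies inside the accepted range, while a space (0x20) does not.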

output_opt Output Options output_format Output format: Choice FASTA FASTA '' '' CLUSTAL ' --clustalout ' ' --clustalout ' PHYLIP ' --phylipout ' ' --phylipout ' out_order Output order: Choice reorder inputorder reorder ( value eq 'reorder')? " --reorder " : "" ( '' , ' --reorder ' )[ value == 'reorder' ] advanced_settings Advanced settings strategy Strategy: Choice auto auto "mafft --auto" "mafft --auto" fftns1 "mafft-fftns --retree 1 " "mafft-fftns --retree 1 " fftns2 "mafft-fftns " "mafft-fftns " fftnsi2 "mafft-fftnsi " "mafft-fftnsi " fftnsi1000 "mafft-fftnsi --maxiterate 1000 " "mafft-fftnsi --maxiterate 1000 " einsi "mafft-einsi " "mafft-einsi " linsi "mafft-linsi " "mafft-linsi " ginsi "mafft-ginsi " "mafft-ginsi " qinsi "mafft-qinsi " "mafft-qinsi "

Algorithms and parameters (unfinished)

MAFFT offers various multiple alignment strategies. They are classified into three types: (a) the progressive method, (b) the iterative refinement method with the WSP score, and (c) the iterative refinement method using both the WSP and consistency scores. In general, there is a tradeoff between speed and accuracy. The order of speed is a > b > c, whereas the order of accuracy is a < b < c. The results of benchmarks can be seen here. The following are the detailed procedures for the major options of MAFFT.

(a) FFT-NS-1, FFT-NS-2 — Progressive methods

These are simple progressive methods like ClustalW. By using several new techniques described below, these options can align a large number of sequences (up to ~5,000) on a standard desktop computer. The qualities of the resulting alignments are shown here. The detailed algorithms are described in Katoh et al. (2002).
  • FFT-NS-1
    mafft --retree 1 input_file > output_file
    or
    fftns --retree 1 input_file > output_file
    is the simplest progressive option in MAFFT and one of the fastest methods currently available. The procedure is: (1) make a rough distance matrix by counting the number of shared 6-tuples (see below) between every sequence pair, (2) build a guide tree and (3) align the sequences according to the branching order.

  • FFT-NS-2
    mafft --retree 2 input_file > output_file
    or
    fftns input_file > output_file
    The distance matrix used in FFT-NS-1 is very approximate and unreliable. In FFT-NS-2, (4) the guide tree is re-computed from the FFT-NS-1 alignment, and (5) the second progressive alignment is carried out.
The following techniques are used to improve the performance.

FFT approximation. (Not yet written) See Katoh et al. (2002).

k-mer counting. To accelerate the initial calculation of the distance matrix, which requires a CPU time of O(N^2) steps, a rough method similar to the 'quicktree' option of ClustalW is adopted, in which the number of k-mers shared by a pair of sequences is counted and regarded as an approximation of the degree of similarity. MAFFT uses the very rapid method proposed by Jones et al. (1992) with a minor modification (Katoh et al. 2002): (1) The 20 amino acids are compressed to 6 alphabets, according to Dayhoff et al. (1978), and (2) MAFFT performs the second progressive alignment (FFT-NS-2) in order to improve the accuracy.
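The counting step can be illustrated with a short sketch. It assumes the six-class amino acid grouping commonly attributed to Dayhoff (AGPST, C, DENQ, FWY, HKR, ILMV); the exact table used by MAFFT is not given here, so the grouping and all function names below are assumptions made for illustration only.

```python
from collections import Counter

# Assumed six-class grouping; MAFFT's own compression table may differ.
DAYHOFF_GROUPS = ["AGPST", "C", "DENQ", "FWY", "HKR", "ILMV"]
COMPRESS = {aa: str(i) for i, group in enumerate(DAYHOFF_GROUPS) for aa in group}


def compress(seq: str) -> str:
    """Map each residue to its class digit; unknown residues become 'X'."""
    return "".join(COMPRESS.get(aa, "X") for aa in seq.upper())


def kmer_counts(seq: str, k: int = 6) -> Counter:
    """Count the k-mers of the compressed sequence (6-tuples by default)."""
    reduced = compress(seq)
    return Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))


def shared_kmers(seq_a: str, seq_b: str, k: int = 6) -> int:
    """Number of k-mers shared by two sequences, counting multiplicity."""
    common = kmer_counts(seq_a, k) & kmer_counts(seq_b, k)  # per-k-mer minimum
    return sum(common.values())
```

A larger shared count is read as higher similarity; how the count is converted into a distance is not covered by this sketch.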

Modified UPGMA. A modified version of UPGMA is used to construct a guide tree, which works well for handling fragment sequences.

The second progressive alignment. The accuracy of the second progressive alignment (FFT-NS-2) is slightly higher than that of the first progressive alignment (FFT-NS-1) according to the BAliBASE test, but the amount of CPU time required by FFT-NS-2 is approximately twice that required by FFT-NS-1.

(b) FFT-NS-i, NW-NS-i — Iterative refinement method

The accuracy of progressive alignment can be improved by the iterative refinement method (Berger and Munson 1991, Gotoh 1993). A simplified version of PRRN is implemented as the FFT-NS-i option of MAFFT. In FFT-NS-i, an initial alignment by FFT-NS-2 is subjected to an iterative refinement process.
  • FFT-NS-i (max. 1,000 cycles)
    mafft --maxiterate 1000 input_file > output_file
    or
    fftnsi --maxiterate 1000 input_file > output_file
    The iterative refinement is repeated until no more improvement in the WSP score is made or the number of cycles reaches 1,000.

  • FFT-NS-i (max. 2 cycles)
    mafft --maxiterate 2 input_file > output_file
    or
    fftnsi input_file > output_file
    As most of the improvement in quality is obtained in the early stages of the iteration, this option is also useful (it is the default of the fftnsi script).

Objective function. The weighted sum-of-pairs (WSP) score proposed by Gotoh is used.
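The general shape of a weighted sum-of-pairs score can be sketched as follows. This is only an illustration: the per-sequence weights, the match/mismatch/gap values and the function names are placeholders, gaps are scored per column rather than with affine costs, and the weighting scheme of the cited method is not reproduced.

```python
from itertools import combinations


def pair_score(a: str, b: str, match=1.0, mismatch=-1.0, gap=-2.0) -> float:
    """Score one pair of aligned rows column by column (placeholder scores)."""
    total = 0.0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # column is empty for this pair, so it contributes nothing
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


def wsp_score(rows: list[str], weights: list[float]) -> float:
    """Sum the pairwise scores, each scaled by the product of the two weights."""
    return sum(
        weights[i] * weights[j] * pair_score(rows[i], rows[j])
        for i, j in combinations(range(len(rows)), 2)
    )
```

With all weights set to 1 this reduces to the plain sum-of-pairs score.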

Tree-dependent partitioning. (Not yet written) See Hirosawa et al.

Effect of FFT. To test the effect of the FFT approximation, we also implemented the NW-NS-x options, in which the FFT approximation is disabled, but the other procedures are the same as those in the corresponding FFT-NS-x. There was no significant reduction in the accuracy by introducing the FFT approximation (Katoh et al. 2002).

(c) L-INS-i, E-INS-i, G-INS-i — Iterative refinement methods using WSP and consistency scores

In order to obtain more accurate alignments in extremely difficult cases, three new options, L-INS-i, G-INS-i and E-INS-i, have been added to recent versions (v.≥5) of MAFFT. These options use a new objective function combining the WSP score (Gotoh) explained above and the COFFEE-like score (Notredame et al.), which evaluates the consistency between a multiple alignment and pairwise alignments (Katoh et al. 2005).

For pairwise alignment, three different types of algorithms are implemented: global alignment (Needleman-Wunsch), local alignment (Smith-Waterman) with affine gap costs (Gotoh) and local alignment with generalized affine gap costs (Altschul). The differences in the accuracy values among these methods are small for the currently available benchmarks, as shown here. However, each of them has different characteristics, according to the algorithm in the pairwise alignment stage:

  • E-INS-i
    mafft --genafpair --maxiterate 1000 input_file > output_file
    or
    einsi input_file > output_file
    is suitable for alignments like this:
     oooooooooXXX------XXXX---------------------------------XXXXXXXXXXX-XXXXXXXXXXXXXXXooooooooooooo
     ---------XXXXXXXXXXXXXooo------------------------------XXXXXXXXXXXXXXXXXX-XXXXXXXX-------------
     -----ooooXXXXXX---XXXXooooooooooo----------------------XXXXX----XXXXXXXXXXXXXXXXXXooooooooooooo
     ---------XXXXX----XXXXoooooooooooooooooooooooooooooooooXXXXX-XXXXXXXXXXXX--XXXXXXX-------------
     ---------XXXXX----XXXX---------------------------------XXXXX---XXXXXXXXXX--XXXXXXXooooo--------
                                      
    where 'X's indicate alignable residues, 'o's indicate unalignable residues and '-'s indicate gaps. Unalignable residues are left unaligned at the pairwise alignment stage, because of the use of the generalized affine gap cost. Therefore E-INS-i is applicable to a difficult problem such as RNA polymerase, which has several conserved motifs embedded in long unalignable regions. As E-INS-i makes the fewest assumptions of the three methods, it is recommended if the nature of the sequences to be aligned is not clear. Note that E-INS-i assumes that the arrangement of the conserved motifs is shared by all sequences.
  • L-INS-i
    mafft --localpair --maxiterate 1000 input_file > output_file
    or
    linsi input_file > output_file
    is suitable for alignments like this:
     ooooooooooooooooooooooooooooooooXXXXXXXXXXX-XXXXXXXXXXXXXXX------------------
     --------------------------------XX-XXXXXXXXXXXXXXX-XXXXXXXXooooooooooo-------
     ------------------ooooooooooooooXXXXX----XXXXXXXX---XXXXXXXooooooooooo-------
     --------ooooooooooooooooooooooooXXXXX-XXXXXXXXXX----XXXXXXXoooooooooooooooooo
     --------------------------------XXXXXXXXXXXXXXXX----XXXXXXX------------------
                              
    L-INS-i can align a set of sequences that contain flanking sequences around one alignable domain. Flanking sequences are ignored in the pairwise alignment by the Smith-Waterman algorithm. Note that the input sequences are assumed to have only one alignable domain. In benchmark tests, ref4 of BAliBASE corresponds to this situation. The other categories of BAliBASE also correspond to similar situations, because they have flanking sequences. L-INS-i also shows higher accuracy values than G-INS-i for a part of SABmark and HOMSTRAD, but we have not identified the reason for this.
  • G-INS-i
    mafft --globalpair --maxiterate 1000 input_file > output_file
    or
    ginsi input_file > output_file
    is suitable for alignments like this:
     XXXXXXXXXXX-XXXXXXXXXXXXXXX
     XX-XXXXXXXXXXXXXXX-XXXXXXXX
     XXXXX----XXXXXXXX---XXXXXXX
     XXXXX-XXXXXXXXXX----XXXXXXX
     XXXXXXXXXXXXXXXX----XXXXXXX
                              
    G-INS-i assumes that the entire region can be aligned and tries to align the sequences globally using the Needleman-Wunsch algorithm; that is, a set of sequences of one domain must be extracted by truncating flanking sequences. In benchmark tests, SABmark and HOMSTRAD correspond to this situation.
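The three command lines listed above can be assembled programmatically. The sketch below uses only the flags shown in those commands; the function name and the dictionary are hypothetical helpers, not part of MAFFT or of this service.

```python
# Flags copied from the command lines given for each strategy.
STRATEGY_FLAGS = {
    "E-INS-i": ["--genafpair", "--maxiterate", "1000"],
    "L-INS-i": ["--localpair", "--maxiterate", "1000"],
    "G-INS-i": ["--globalpair", "--maxiterate", "1000"],
}


def mafft_command(strategy: str, input_file: str) -> list[str]:
    """Build the argument list for one of the consistency-based strategies."""
    try:
        flags = STRATEGY_FLAGS[strategy]
    except KeyError:
        raise ValueError(f"unknown strategy: {strategy!r}") from None
    return ["mafft", *flags, input_file]
```

The alignment is written to standard output, so a caller would typically pass the list to `subprocess.run` with `stdout` set to an open output file.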

Consistency score. The COFFEE objective function was originally proposed by Notredame et al. (1998), and the extended versions are used in TCoffee and ProbCons. MAFFT also adopts a similar objective function, as described in Katoh et al. (2005). However, the consistency among three sequences (called 'library extension' in TCoffee) is currently not calculated in MAFFT, because the improvement in accuracy by library extension was limited to alignments consisting of a small number (<10) of sequences in our preliminary tests. If library extension is needed, then please use TCoffee or ProbCons.

Consistency + WSP. Instead, the WSP score is summed with the consistency score in the objective function of MAFFT. The use of the WSP score has the merit that a pattern of gaps can be incorporated into the objective function. This is probably the reason why MAFFT achieves higher accuracy than ProbCons and TCoffee for alignments consisting of many (~10 to ~100) sequences. This suggests that the pattern of gaps within a group to be aligned is important information when aligning two groups of proteins (and evaluating homology between distantly related protein families).

0
amino_scm Scoring matrix for amino acid sequences: Choice BLOSUM62 BLOSUM30 " --bl 30 " " --bl 30 " BLOSUM45 " --bl 45 " " --bl 45 " BLOSUM62 "" "" BLOSUM80 " --bl 80 " " --bl 80 " JT100 " --jtt 100 " " --jtt 100 " JT200 " --jtt 200 " " --jtt 200 " The BLOSUM62 matrix is adopted as a default scoring matrix, because this showed slightly higher accuracy values than the BLOSUM80, 45, JTT200PAM, 100PAM and Gonnet matrices in SABmark tests. nuc_scm Scoring matrix for nucleotide sequences: Choice 200 200 20 1 ( defined $value and $value ne $vdef )" --kimura $value " : "" ( "" , " --kimura "+str( value ) )[ value is not None and value!= vdef ]

Switch it to '1PAM / κ=2' when aligning closely related DNA sequences.

The default scoring matrix is derived from Kimura's two-parameter model. The ratio of transitions to transversions is set at 2 by default. Other parameters can be used, but have not yet been tested.

gap_open_penalty Gap opening penalty (1.0 - 3.0): Float 1.53 (defined $value and $value != $vdef)? " --op $value " : "" ( "" , " --op "+str( value ) )[ value is not None and value != vdef ] You must provide a value between 1.0 < value < 3.0 1.0 < $value < 3.0 1.0 < value < 3.0

Gap penalties for proteins. The default gap penalties for amino acid alignments have been changed in v.4.0. Note that the current version of MAFFT returns an entirely different alignment from v.<4.0. In v.4.0, two major gap penalties (--op [gap open penalty] and --ep [offset value, which functions like a gap extension penalty, see the mafft3 paper for definition]) were tuned by applying the FFT-NS-2 option to a part of the SABmark benchmark. We adopted the parameter set (--op 1.53 --ep 0.123) optimized for SABmark, because this works better for the other benchmark tests (HOMSTRAD, PREFAB and BAliBASE) than the previous one (--op 2.4 --ep 0.06). Other parameters might work better in other situations. Consistency-based options have more parameters (L-INS-i has four more parameters and E-INS-i has six more parameters). We determined these additional parameters so that the Smith-Waterman alignment function used in L-INS-i returns a local alignment similar to that generated by FASTA, but we have not closely tuned them yet. In our tests using SABmark, the accuracy values can be improved by 2-3% by tuning these parameters, but this improvement may result from overfitting.

Gap penalties for RNAs. The default gap penalties for nucleotide alignment have changed in v.5.6. Note that the current version of MAFFT returns an entirely different alignment from v.<5.6. In the former versions (v.<5.6), the default gap penalties for nucleotide alignments were set at the same values as those for amino acid alignments. According to BRAliBASE, these penalties result in very bad alignments for RNAs. The newer versions (v.≥5.6) use different penalties for nucleotide alignment; the penalty values are set to three times those for amino acids. This is not yet the optimal value for BRAliBASE. The BRAliBASE score can be improved by closely tuning the penalty values, but we have not adopted the optimized penalties, because we are not sure whether they are applicable to a wide range of problems.

offset Offset value (0.0 - 1.0): Float 0.0 (defined $value and $value != $vdef)? " --ep $value " : "" ( "" , " --ep "+str( value ) )[ value is not None and value != 0.123 ] You must provide a value between 0.0 < value < 1.0 (default 0.123) 0.0 < $value < 1.0 0.0 < value < 1.0

If long gaps are not expected, set it to 0.1 or a larger value.
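The option expressions in these parameter definitions rely on indexing a two-element tuple with a boolean. A minimal sketch of how such an expression evaluates is given below; `option_fragment` and `in_open_range` are hypothetical names introduced for illustration, while `value` and `vdef` follow the names used in the expressions themselves.

```python
def option_fragment(flag: str, value, vdef) -> str:
    """Mirror ( "" , " --op "+str(value) )[ value is not None and value != vdef ].

    The condition is False (index 0, empty string) or True (index 1, the option text).
    """
    return ("", " " + flag + " " + str(value))[value is not None and value != vdef]


def in_open_range(value: float, low: float, high: float) -> bool:
    """Control of the form low < value < high, as stated for these parameters."""
    return low < value < high
```

The fragment is empty when the value is missing or equal to the default, so the option is only added to the command line when the user changes it.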

result Alignment file Alignment "mafft.out" "mafft.out"
Programs-5.1.1/pepwheel.xml0000644000175000001560000003212612072525233014512 0ustar bneronsis pepwheel EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net pepwheel Draw a helical wheel diagram for a protein sequence http://bioweb2.pasteur.fr/docs/EMBOSS/pepwheel.html http://emboss.sourceforge.net/docs/themes display:protein:2D_structure structure:2D_structure pepwheel e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_output Output section e_wheel Plot the wheel Boolean 1 (" -nowheel", "")[ bool(value) ] 2 e_steps Number of steps (value from 2 to 100) Integer 18 ("", " -steps=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 2 is required value >= 2 Value less than or equal to 100 is required value <= 100 3 The number of residues plotted per turn is this value divided by the 'turns' value. e_turns Number of turns (value from 1 to 100) Integer 5 ("", " -turns=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 Value less than or equal to 100 is required value <= 100 4 The number of residues plotted per turn is the 'steps' value divided by this value. 
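The relation between 'steps' and 'turns' described above is a simple division. The sketch below uses the default values and ranges given for the two parameters; the function name is hypothetical.

```python
def residues_per_turn(steps: int = 18, turns: int = 5) -> float:
    """Residues plotted per turn: the 'steps' value divided by the 'turns' value."""
    if not 2 <= steps <= 100:
        raise ValueError("steps must be between 2 and 100")
    if not 1 <= turns <= 100:
        raise ValueError("turns must be between 1 and 100")
    return steps / turns
```

With the defaults (18 steps, 5 turns) this gives 3.6 residues per turn.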
e_graph Choose the e_graph output format Choice png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 5 e_goutfile Name of the output graph Filename pepwheel_graph ("" , " -goutfile=" + str(value))[value is not None] 6 outgraph_png Graph file Picture Binary e_graph == "png" "*.png" outgraph_gif Graph file Picture Binary e_graph == "gif" "*.gif" outgraph_ps Graph file PostScript Binary e_graph == "ps" or e_graph == "cps" "*.ps" outgraph_meta Graph file Picture Binary e_graph == "meta" "*.meta" outgraph_data Graph file Text e_graph == "data" "*.dat" e_markupsection Markup section e_amphipathic Prompt for amphipathic residue marking Boolean 0 ("", " -amphipathic")[ bool(value) ] 7 If this is true then the residues ACFGILMVWY are marked as squares and all other residues are unmarked. This overrides any other markup that you may have specified using the qualifiers '-squares', '-diamonds' and '-octags'. e_squares Mark as squares String not e_amphipathic ILVM ("", " -squares=" + str(value))[value is not None and value!=vdef] 8 By default the aliphatic residues ILVM are marked with squares. e_diamonds Mark as diamonds String not e_amphipathic DENQST ("", " -diamonds=" + str(value))[value is not None and value!=vdef] 9 By default the residues DENQST are marked with diamonds. e_octags Mark as octagons String not e_amphipathic HKR ("", " -octags=" + str(value))[value is not None and value!=vdef] 10 By default the positively charged residues HKR are marked with octagons. auto Turn off any prompting String " -auto -stdout" 11 Programs-5.1.1/Entities/0000755000175000001560000000000012175673303013745 5ustar bneronsisPrograms-5.1.1/Entities/ClustalO_package.xml0000644000175000001560000000523012105216576017666 0ustar bneronsis Clustal Omega 1.1.0 Clustal-omega CLUSTAL-OMEGA is a general purpose multiple sequence alignment program. 
Fabian Sievers, Andreas Wilm, David Dineen and Des Higgins http://www.clustal.org/#Download http://www.clustal.org Clustal-Omega is a general purpose multiple sequence alignment (MSA) program for proteins. It produces high quality MSAs and is capable of handling data-sets of hundreds of thousands of sequences in reasonable time. In default mode, users give a file of sequences to be aligned and these are clustered to produce a guide tree and this is used to guide a "progressive alignment" of the sequences. There are also facilities for aligning existing alignments to each other, aligning a sequence to an alignment and for using a hidden Markov model (HMM) to help guide an alignment of new sequences that are homologous to the sequences used to make the HMM. This latter procedure is referred to as "external profile alignment" or EPA. Clustal-Omega uses HMMs for the alignment engine, based on the HHalign package from Johannes Soeding [1]. Guide trees are made using an enhanced version of mBed [2] which can cluster very large numbers of sequences in O(N*log(N)) time. Multiple alignment then proceeds by aligning larger and larger alignments using HHalign, following the clustering given by the guide tree. In its current form Clustal-Omega can only align protein sequences but not DNA/RNA sequences. DNA/RNA support has been added since version 1.1.0. Molecular Systems Biology 7 Article number: 539 Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega [1] Johannes Soding (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics 21 (7): 951–960. [2] Blackshields G, Sievers F, Shi W, Wilm A, Higgins DG. Sequence embedding for fast construction of guide trees for multiple sequence alignment. Algorithms Mol Biol. 2010 May 14;5:21. 
Programs-5.1.1/Entities/penncnv_scripts_package.xml0000644000175000001560000000020111441651470021345 0ustar bneronsis penncnv_scripts 1.0 penncnv_scripts Programs-5.1.1/Entities/ClustalW_package.xml0000644000175000001560000000131511441651470017674 0ustar bneronsis ClustalW 2.0.12 ClustalW ClustalW is a general purpose multiple alignment program for DNA or proteins. Des Higgins Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Research, 22:4673-4680. http://www.clustal.org/ http://www.clustal.org/download/current/ Programs-5.1.1/Entities/squizz_package.xml0000644000175000001560000000046212105210041017464 0ustar bneronsis squizz 0.99b squizz Sequence/Alignment format checker/converter. ftp://ftp.pasteur.fr/pub/gensoft/projects/squizz/ Programs-5.1.1/Entities/ViennaRNA_package.xml0000644000175000001560000000106611441651470017722 0ustar bneronsis ViennaRNA 1.8.4 ViennaRNA RNA Secondary Structure Prediction and Comparison. Ivo L Hofacker http://www.tbi.univie.ac.at/RNA/ http://www.tbi.univie.ac.at/RNA/ http://www.tbi.univie.ac.at/RNA/ http://bioweb2.pasteur.fr/gensoft/sequence/nucleic/2D_structure.html#ViennaRNa Programs-5.1.1/Entities/blast_package.xml0000644000175000001560000000077011441651470017247 0ustar bneronsis blast 2.2.21 blast The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. http://www.ncbi.nlm.nih.gov/BLAST/ http://www.ncbi.nlm.nih.gov/BLAST/download.shtml Altschul, Madden, Schaeffer, Zhang, Miller, Lipman Programs-5.1.1/Entities/cbs_package.xml0000644000175000001560000000035711533171506016711 0ustar bneronsis CBS CBS tools CBS prediction tools. 
http://www.cbs.dtu.dk/services/ Programs-5.1.1/Entities/phylip_package.xml0000644000175000001560000000146711441651470017453 0ustar bneronsis phylip 3.67 phylip PHYLIP is a package of programs for inferring phylogenies. http://evolution.gs.washington.edu/phylip.html http://evolution.gs.washington.edu/phylip/getme.html Felsenstein Joe Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. Department of Genetics, University of Washington, Seattle. Felsenstein, J. 1989. PHYLIP -- Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166. http://bioweb2.pasteur.fr/docs/phylip/phylip.html Programs-5.1.1/Entities/hmmer_package.xml0000644000175000001560000000151011473157672017255 0ustar bneronsis hmmer 3.0 hmmer HMMER is an implementation of profile HMM methods for sensitive database searches using multiple sequence alignments as queries. http://hmmer.janelia.org/ ftp://selab.janelia.org/pub/software/hmmer3/ S. Eddy Eddy, S. R. (1998). Profile hidden Markov models. Bioinformatics, 14:755-763. Eddy, S. R. (2008). A probabilistic model of local sequence alignment that simplifies statistical significance estimation. PLoS Comput. Biol., 4:e1000069 http://bioweb2.pasteur.fr/docs/hmmer/Userguide.pdf Programs-5.1.1/Entities/penncnv_package.xml0000644000175000001560000000256711441651470017617 0ustar bneronsis penncnv 2009.08.27 penncnv CNV detection from Illumina whole-genome SNP genotyping arrays. It has been extended to handle candidate gene SNP arrays, to handle recent high-density arrays with non-polymorphic markers (so-called CN markers), and to handle Affymetrix genome-wide arrays. Wang K Wang K, Li M, Hadley D, Liu R, Glessner J, Grant S, Hakonarson H, Bucan M. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data Genome Research 17:1665-1674, 2007 Diskin SJ, Li M, Hou C, Yang S, Glessner J, Hakonarson H, Bucan M, Maris JM, Wang K. 
Adjustment of genomic waves in signal intensities from whole-genome SNP genotyping platforms Nucleic Acids Research 36:e126, 2008 Wang K, Chen Z, Tadesse MG, Glessner J, Grant SFA, Hakonarson H, Bucan M, Li M. Modeling genetic inheritance of copy number variations Nucleic Acids Research 36:e138, 2008 http://www.openbioinformatics.org/penncnv/ http://www.openbioinformatics.org/penncnv/penncnv_download.html Programs-5.1.1/Entities/bank_id.xml0000644000175000001560000000164511672707410016062 0ustar bneronsis

You must provide a list of identifiers in USA format.
databank:Acc
( one item per line )

List of available banks (name: description)

  • embl: EMBL Nucleotide Sequence Database
  • genbank: Genbank NIH DNA sequence database
  • imgt: IMGT - ImMunoGeneTics sequence database
  • rdpii: RDPII - Ribosomal Database Project II database
  • refseq: NCBI Reference Sequence (RefSeq) Database
  • genpept: GenBank gene products
  • uniprot: Universal Protein Resource
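A list in this format can be validated before use. The following sketch assumes one `databank:Acc` entry per line and accepts only the bank names listed above; the function name is hypothetical, and no check is made on the accession itself.

```python
KNOWN_BANKS = {"embl", "genbank", "imgt", "rdpii", "refseq", "genpept", "uniprot"}


def parse_identifier_list(text: str) -> list[tuple[str, str]]:
    """Parse 'databank:Acc' lines into (bank, accession) pairs, skipping blank lines."""
    entries = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        bank, sep, acc = line.partition(":")
        if not sep or not acc or bank.lower() not in KNOWN_BANKS:
            raise ValueError(f"line {line_number}: expected databank:Acc, got {line!r}")
        entries.append((bank.lower(), acc))
    return entries
```

For example, a two-line input with `embl:X12345` and `uniprot:P12345` yields two pairs, and an unknown bank name raises `ValueError`.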
Programs-5.1.1/rnaduplex.xml0000644000175000001560000001546311672710655014721 0ustar bneronsis rnaduplex RNAduplex Compute the structure upon hybridization of two RNA strands Ivo Hofacker RNAduplex reads two RNA sequences from file and computes optimal and suboptimal secondary structures for their hybridization. The calculation is simplified by allowing only inter-molecular base pairs. sequence:nucleic:2D_structure structure:2D_structure RNAduplex seq RNA Sequence File DNA Sequence FASTA " < $value" " < "+ str(value) 1000 control Control options 2 suboptimal Suboptimal structures (-e) Integer 0 (defined $value and $value!=$vdef)? " -e $value" : "" ( "" , " -e " + str(value))[ value is not None and value != vdef ] Compute suboptimal structures with energy with range kcal/mol of the optimum. Default is calculation of mfe structure only. temperature Rescale energy parameters to a temperature of temp C. (-T) Integer 37 (defined $value and $value != $vdef)? " -T $value" : "" ( "" , " -T " + str(value) )[ value is not None and value != vdef] tetraloops Do not include special stabilizing energies for certain tetraloops (-4) Boolean 0 ($value)? " -4" : "" ( "" , " -4" )[ value ] input Input parameters 2 noGU Do not allow GU pairs (-noGU) Boolean 0 ($value)? " -noGU" : "" ( "" , " -noGU" )[ value ] noCloseGU Do not allow GU pairs at the end of helices (-noCloseGU) Boolean 0 ($value)? " -noCloseGU" : "" ( "" , " -noCloseGU" )[ value ] parameter Energy parameter file (-P) EnergyParameterFile AbstractText (defined $value)? " -P $value" : "" ( "" , " -P " + str(value) )[ value is not None ] Read energy parameters from paramfile, instead of using the default parameter set. A sample parameterfile should accompany your distribution. See the RNAlib documentation for details on the file format. 
readseq String "readseq -f=19 -a $seq > $seq.tmp && (cp $seq $seq.orig && mv $seq.tmp $seq) ; " "readseq -f=19 -a "+ str(seq) + " > "+ str(seq) +".tmp && (cp "+ str(seq) +" "+ str(seq) +".orig && mv "+ str(seq) +".tmp "+ str(seq) +") ; " -10 psfiles Postscript file PostScript Binary "*.ps" "*.ps" Programs-5.1.1/needle.xml0000644000175000001560000005571112072525233014142 0ustar bneronsis needle EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net needle Needleman-Wunsch global alignment of two sequences http://bioweb2.pasteur.fr/docs/EMBOSS/needle.html http://emboss.sourceforge.net/docs/themes alignment:pairwise:global needle e_input Input section e_asequence asequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -asequence=" + str(value))[value is not None] 1 e_bsequence bsequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -bsequence=" + str(value))[value is not None] 2 e_datafile Matrix file Choice mobyle_null mobyle_null EBLOSUM30 EBLOSUM35 EBLOSUM40 EBLOSUM45 EBLOSUM50 EBLOSUM55 EBLOSUM60 EBLOSUM62 EBLOSUM62-12 EBLOSUM65 EBLOSUM70 EBLOSUM75 EBLOSUM80 EBLOSUM85 EBLOSUM90 EBLOSUMN EDNAFULL EDNAMAT EDNASIMPLE EPAM10 EPAM100 EPAM110 EPAM120 EPAM130 EPAM140 EPAM150 EPAM160 EPAM170 EPAM180 EPAM190 EPAM20 EPAM200 EPAM210 EPAM220 EPAM230 EPAM240 EPAM250 EPAM260 EPAM270 EPAM280 EPAM290 EPAM30 EPAM300 EPAM310 EPAM320 EPAM330 EPAM340 EPAM350 EPAM360 EPAM370 EPAM380 EPAM390 EPAM40 EPAM400 EPAM410 EPAM420 EPAM430 EPAM440 EPAM450 EPAM460 EPAM470 EPAM480 EPAM490 EPAM50 EPAM500 EPAM60 EPAM70 EPAM80 EPAM90 SSSUB ("", " -datafile=" + str(value))[value is not None and value!=vdef] 3 This is the scoring matrix file used when comparing sequences. 
By default it is the file 'EBLOSUM62' (for proteins) or the file 'EDNAFULL' (for nucleic sequences). These files are found in the 'data' directory of the EMBOSS installation. e_required Required section e_gapopen Gap opening penalty (Floating point number from 1.0 to 100.0) Float ("", " -gapopen=" + str(value))[value is not None] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 100.0 is required value <= 100.0 4 The gap open penalty is the score taken away when a gap is created. The best value depends on the choice of comparison matrix. The default value assumes you are using the EBLOSUM62 matrix for protein sequences, and the EDNAFULL matrix for nucleotide sequences. e_gapextend Gap extension penalty (Floating point number from 0.0 to 10.0) Float ("", " -gapextend=" + str(value))[value is not None] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 10.0 is required value <= 10.0 5 The gap extension, penalty is added to the standard gap penalty for each base or residue in the gap. This is how long gaps are penalized. Usually you will expect a few long gaps rather than many short gaps, so the gap extension penalty should be lower than the gap penalty. An exception is where one or both sequences are single reads with possible sequencing errors in which case you would expect many single base gaps. You can get this result by setting the gap open penalty to zero (or very low) and using the gap extension penalty to control gap scoring. e_additional Additional section e_endweight Apply end gap penalties. Boolean 0 ("", " -endweight")[ bool(value) ] 6 e_endopen End gap opening penalty (Floating point number from 1.0 to 100.0) Float ("", " -endopen=" + str(value))[value is not None] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 100.0 is required value <= 100.0 7 The end gap open penalty is the score taken away when an end gap is created. 
The best value depends on the choice of comparison matrix. The default value assumes you are using the EBLOSUM62 matrix for protein sequences, and the EDNAFULL matrix for nucleotide sequences. e_endextend End gap extension penalty (Floating point number from 0.0 to 10.0) Float ("", " -endextend=" + str(value))[value is not None] Value greater than or equal to 0.0 is required value >= 0.0 Value less than or equal to 10.0 is required value <= 10.0 8 The end gap extension, penalty is added to the end gap penalty for each base or residue in the end gap. This is how long end gaps are penalized. e_output Output section e_brief Brief identity and similarity Boolean 1 (" -nobrief", "")[ bool(value) ] 9 Brief identity and similarity e_outfile Name of the output alignment file Filename needle.align ("" , " -outfile=" + str(value))[value is not None] 10 e_aformat_outfile Choose the alignment output format Choice SRS FASTA MSF PAIR MARKX0 MARKX1 MARKX2 MARKX3 MARKX10 SRS SRSPAIR SCORE UNKNOWN MULTIPLE SIMPLE MATCH ("", " -aformat=" + str(value))[value is not None and value!=vdef] 10 e_outfile_out outfile_out option Alignment e_aformat_outfile in ['FASTA', 'MSF'] e_outfile e_outfile_out2 outfile_out2 option Text e_aformat_outfile in ['PAIR', 'MARKX0', 'MARKX1', 'MARKX2', 'MARKX3', 'MARKX10', 'SRS', 'SRSPAIR', 'SCORE', 'UNKNOWN', 'MULTIPLE', 'SIMPLE', 'MATCH'] e_outfile auto Turn off any prompting String " -auto -stdout" 11 Programs-5.1.1/pepwindow.xml0000644000175000001560000002206612072525233014717 0ustar bneronsis pepwindow EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. 
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net pepwindow Draw a hydropathy plot for a protein sequence http://bioweb2.pasteur.fr/docs/EMBOSS/pepwindow.html http://emboss.sourceforge.net/docs/themes sequence:protein:composition pepwindow e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -sequence=" + str(value))[value is not None] 1 e_datafile Aaindex entry data file Protein AaindexData AbstractText ("", " -datafile=" + str(value))[value is not None ] 2 e_additional Additional section e_length Window size (value from 1 to 200) Integer 19 ("", " -length=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 1 is required value >= 1 Value less than or equal to 200 is required value <= 200 3 e_normalize Normalize data values Boolean 0 ("", " -normalize")[ bool(value) ] 4 e_output Output section e_graph Choose the e_graph output format Choice png png gif cps ps meta data (" -graph=" + str(vdef), " -graph=" + str(value))[value is not None and value!=vdef] 5 xy_goutfile Name of the output graph Filename pepwindow_xygraph ("" , " -goutfile=" + str(value))[value is not None] 6 xy_outgraph_png Graph file Picture Binary e_graph == "png" "*.png" xy_outgraph_gif Graph file Picture Binary e_graph == "gif" "*.gif" xy_outgraph_ps Graph file PostScript Binary e_graph == "ps" or e_graph == "cps" "*.ps" xy_outgraph_meta Graph file Picture Binary e_graph == "meta" "*.meta" xy_outgraph_data Graph file Text e_graph == "data" "*.dat" auto Turn off any prompting String " -auto -stdout" 7 Programs-5.1.1/weighbor.xml0000644000175000001560000001171211767572177014530 0ustar bneronsis weighbor 1.2.1 Weighbor Weighted neighbor joining Bruno, Halpern, Socci W. J. Bruno, N. D. Socci, and A. L. Halpern. Weighted Neighbor Joining: A Likelihood-Based Approach to Distance-Based Phylogeny Reconstruction, Mol. Biol. Evol. 
17 (1): 189-197 (2000). Weighbor takes an input file of pairwise distances in Phylip format and computes the phylogenetic tree that best corresponds to those distances. http://www.is.titech.ac.jp/~shimo/prog/consel/ http://www.is.titech.ac.jp/~shimo/prog/consel/ phylogeny:distance weighbor infile Distances matrix File (-i) PhylipDistanceMatrix AbstractText " -i $value" " -i " + str(value) 1 Length Length of the sequences (-L) Integer (defined $value) ? " -L $value" : "" ( "" , " -L " + str(value) )[ value is not None ] 2 Default is 500. This is the effective sequence length equal to the number of varying sites. Note that if the -L option is not used then the program will print a warning message to stderr indicating that it is using this default length. size Size of the alphabet (-b) Integer 4 (defined $value and $value != $vdef) ? " -b $value" : "" ( "" , " -b " + str(value) )[ value is not None and value != vdef] 2 Sets the size of the alphabet of characters (number of bases) b. 1/b is equal to the probability that there will be a match for infinite evolution time. The default value for b is 4. verbose Verbose output (-v) Choice Null Null -v -vv -vvv (defined $value and $value ne $vdef) ? " $value" : "" ( "" , " " + str(value) )[ value is not None and value != vdef] 2 outfile Output file (-o) String " -o weighbor.treefile" " -o weighbor.treefile" 3 treefile Tree output file Tree NEWICK "weighbor.treefile" "weighbor.treefile" Programs-5.1.1/est2genome.xml0000644000175000001560000003475512072525233014763 0ustar bneronsis est2genome EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A.
Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net est2genome Align EST sequences to genomic DNA sequence http://bioweb2.pasteur.fr/docs/EMBOSS/est2genome.html http://emboss.sourceforge.net/docs/themes alignment:pairwise:global est2genome e_input Input section e_estsequence Spliced est nucleotide sequence(s) DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -estsequence=" + str(value))[value is not None] 1 e_genomesequence Additional section DNA Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,1 ("", " -genomesequence=" + str(value))[value is not None] 2 e_match Score for matching two bases Integer 1 ("", " -match=" + str(value))[value is not None and value!=vdef] 3 e_mismatch Cost for mismatching two bases Integer 1 ("", " -mismatch=" + str(value))[value is not None and value!=vdef] 4 e_gappenalty Gap penalty Integer 2 ("", " -gappenalty=" + str(value))[value is not None and value!=vdef] 5 Cost for deleting a single base in either sequence, excluding introns e_intronpenalty Intron penalty Integer 40 ("", " -intronpenalty=" + str(value))[value is not None and value!=vdef] 6 Cost for an intron, independent of length. e_splicepenalty Splice site penalty Integer 20 ("", " -splicepenalty=" + str(value))[value is not None and value!=vdef] 7 Cost for an intron, independent of length and starting/ending on donor-acceptor sites e_minscore Minimum accepted score Integer 30 ("", " -minscore=" + str(value))[value is not None and value!=vdef] 8 Exclude alignments with scores below this threshold score. e_advanced Advanced section e_reverse Reverse orientation Boolean 0 ("", " -reverse")[ bool(value) ] 9 Reverse the orientation of the EST sequence e_usesplice Use donor and acceptor splice sites Boolean 1 (" -nousesplice", "")[ bool(value) ] 10 Use donor and acceptor splice sites. If you want to ignore donor-acceptor sites then set this to be false. 
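The format expressions in these definitions rely on indexing a two-element tuple with a boolean: `False` selects the first element and `True` the second. The sketch below shows how such expressions could be evaluated to assemble part of an est2genome command line. The helper names (`fmt_option`, `fmt_negatable`, `build_est2genome_args`) and the `params` dictionary are hypothetical and exist only for illustration; the option names and default values are the ones listed in the definition above.

```python
# Illustrative sketch only: the helper names are hypothetical, not part of Mobyle.

def fmt_option(flag, value, vdef=None):
    """Mirror ("", " -flag=" + str(value))[value is not None and value != vdef].

    The boolean index picks "" (False) or the formatted option (True).
    """
    return ("", " -" + flag + "=" + str(value))[value is not None and value != vdef]


def fmt_negatable(flag, value):
    """Mirror (" -noflag", "")[bool(value)]: text is emitted only when the switch is off."""
    return (" -no" + flag, "")[bool(value)]


def build_est2genome_args(params):
    """Concatenate the option fragments for a few est2genome parameters."""
    # Defaults (vdef) as listed in the est2genome definition.
    defaults = {
        "match": 1,
        "mismatch": 1,
        "gappenalty": 2,
        "intronpenalty": 40,
        "splicepenalty": 20,
        "minscore": 30,
    }
    parts = [fmt_option(name, params.get(name), vdef) for name, vdef in defaults.items()]
    # usesplice defaults to true, so only -nousesplice is ever written.
    parts.append(fmt_negatable("usesplice", params.get("usesplice", True)))
    return "".join(parts)


if __name__ == "__main__":
    print(build_est2genome_args({"gappenalty": 5, "usesplice": False}))
```

Values that are unset or equal to their default produce an empty string, so the generated command line carries only the options the user actually changed.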
e_mode Comparison mode Choice both both forward reverse ("", " -mode=" + str(value))[value is not None and value!=vdef] 11 This determines the comparison mode. The default value is 'both', in which case both strands of the EST are compared assuming a forward gene direction (i.e. GT/AG splice sites), and the best comparison is redone assuming a reversed (CT/AC) gene splicing direction. The other allowed modes are 'forward', when just the forward strand is searched, and 'reverse', ditto for the reverse strand. e_best Print out only best alignment Boolean 1 (" -nobest", "")[ bool(value) ] 12 You can print out all comparisons instead of just the best one by setting this to be false. e_space Space threshold (in megabytes) Float 10.0 ("", " -space=" + str(value))[value is not None and value!=vdef] 13 For linear-space recursion. If the product of the sequence lengths divided by 4 exceeds this value, then a divide-and-conquer strategy is used to control the memory requirements. In this way very long sequences can be aligned. If you have a machine with plenty of memory you can raise this parameter (but do not exceed the machine's physical RAM). e_shuffle Shuffle Integer ("", " -shuffle=" + str(value))[value is not None] 14 e_seed Random number seed Integer 20825 ("", " -seed=" + str(value))[value is not None and value!=vdef] 15 e_output Output section e_outfile Name of the output file (e_outfile) Filename est2genome.e_outfile ("" , " -outfile=" + str(value))[value is not None] 16 e_outfile_out outfile_out option Est2genomeReport Report e_outfile e_align Show the alignment Boolean 0 ("", " -align")[ bool(value) ] 17 Show the alignment. The alignment includes the first and last 5 bases of each intron, together with the intron width. The direction of splicing is indicated by angle brackets (forward or reverse) or ???? (unknown).
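The space threshold (e_space) described above can be read as a simple size check on the two input sequences. The following is a rough sketch of that rule, not the program's actual code: the function name `needs_linear_space` is hypothetical, and it assumes that one megabyte means 10^6 bytes and that the product of the lengths divided by 4 is a byte count, neither of which is stated in the definition.

```python
# Rough sketch of the e_space rule; the function name and the unit
# conversion (1 megabyte = 1e6 bytes) are assumptions for illustration.

def needs_linear_space(est_len, genome_len, space_mb=10.0):
    """Return True when the divide-and-conquer strategy would be chosen.

    The rule in the help text: the product of the sequence lengths divided
    by 4 is compared against the space threshold given in megabytes.
    """
    estimated_bytes = est_len * genome_len / 4.0
    return estimated_bytes > space_mb * 1e6


if __name__ == "__main__":
    # A 10 kb EST against a 10 kb genomic region with the default threshold.
    print(needs_linear_space(10_000, 10_000))
```

Under these assumptions, raising the threshold keeps larger comparisons in the full-matrix mode at the cost of more memory.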
e_width Alignment width Integer 50 ("", " -width=" + str(value))[value is not None and value!=vdef] 18 auto Turn off any prompting String " -auto -stdout" 19 Programs-5.1.1/oddcomp.xml0000644000175000001560000001701112072525233014322 0ustar bneronsis oddcomp EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net oddcomp Identify proteins with specified sequence word composition http://bioweb2.pasteur.fr/docs/EMBOSS/oddcomp.html http://emboss.sourceforge.net/docs/themes sequence:protein:motifs oddcomp e_input Input section e_sequence sequence option Protein Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_infile Program compseq output file CompseqReport Report ("", " -infile=" + str(value))[value is not None] 2 This is a file in the format of the output produced by 'compseq' that is used to set the minimum frequencies of words in this analysis. e_required Required section e_fullwindow Set window size to length of current protein Boolean 0 ("", " -fullwindow")[ bool(value) ] 3 Set this option on (Y) if you want the window size to be set to the length of the current protein. Otherwise, leave this option unset, in which case you'll be prompted for a window size to use. e_window Window size to consider (e.g. 30 aa) (value greater than or equal to 10) Integer not e_fullwindow 30 ("", " -window=" + str(value))[value is not None and value!=vdef] Value greater than or equal to 10 is required value >= 10 4 This is the size of window in which to count. Thus if you want to count frequencies in a 40 aa stretch you should enter 40 here. 
e_advanced Advanced section e_ignorebz Ignore the amino acids b and z and just count them as 'other' Boolean 1 (" -noignorebz", "")[ bool(value) ] 5 The amino acid code B represents Asparagine or Aspartic acid and the code Z represents Glutamine or Glutamic acid. These are not commonly used codes and you may wish not to count words containing them, just noting them in the count of 'Other' words. e_output Output section e_outfile Name of the output file (e_outfile) Filename outfile.oddcomp ("" , " -outfile=" + str(value))[value is not None] 6 This is the results file. e_outfile_out outfile_out option OddcompReport Report e_outfile auto Turn off any prompting String " -auto -stdout" 7 Programs-5.1.1/bl2seq.xml0000644000175000001560000006376111554246132014104 0ustar bneronsis bl2seq BL2SEQ Comparison between two sequences with Blast (NCBI) alignment:pairwise blast_init Blast initiation String "bl2seq" "bl2seq" 1 bl2seq Blast program (-p) Choice null null blastp blastn blastx tblastn tblastx " -p $value" " -p " + str(value) 2 - Blastp compares amino acid query sequences - Blastn compares nucleotide query sequences - Blastx compares a translated nucleotide sequence with an amino acid sequence - tBlastn compares an amino acid sequence with a translated nucleotide sequence - tBlastx compares translated nucleotide sequences first_sequence First sequence (-i) Protein DNA Sequence FASTA " -i $value" " -i " + str( value ) 3 first_start_region Start of required region in first query sequence (-I) Integer Location on query sequence first_end_region End of required region in first sequence (-I) Integer defined $first_start_region first_start_region is not None (defined $value) ?
" -I \"$first_start_region $value\"" : " -I \"$first_start_region\"" ( ' -I "%s "' % (str(first_start_region)), ' -I "%s %s"' % (str(first_start_region), str(value)))[value is not None] 3 second_sequence Second sequence (-j) Protein DNA Sequence FASTA " -j $value" " -j " + str(value) 4 second_start_region Start of required region in second sequence (-J) Integer 4 second_end_region End of required region in second sequence (-J) Integer defined $second_start_region second_start_region is not None (defined $value) ? " -J \"$second_start_region $value\"" : " -J \"$second_region\"" ( ' -J "%s "' % (str(second_start_region)), ' -J "%s %s"' % (str(second_start_region), str(value)))[value is not None] 4 scoring_opt Scoring options 5 open_a_gap Cost to open a gap (-G) Integer (defined $value) ? " -G $value" : "" ("" , " -G "+str(value) )[value is not None] Default: 5 for blastn; 10 for blastp, blastx and 11 for tblastn extend_a_gap Cost to extend a gap (-E) Integer (defined $value) ? " -E $value" : "" ("" , " -E "+str(value) )[value is not None] Default: 2 for blastn; 1 for blastp, blastx and tblastn Limited values for gap existence and extension are supported for these programs. Existence -- Extension: BLOSUM90 9 -- 2, 8 -- 2, 7 -- 2, 6 -- 2 11 -- 1, 10 -- 1, 9 -- 1 BLOSUM80 25 -- 2, 13 -- 2, 9 -- 2, 8 -- 2, 7 -- 2, 6 -- 2 11 -- 1, 10 -- 1, 9 -- 1 BLOSUM62 11 -- 2, 10 -- 2, 9 -- 2, 8 -- 2, 7 -- 2, 6 -- 2 13 -- 1, 12 -- 1, 11 -- 1, 10 -- 1, 9 -- 1 BLOSUM45 13 -- 3, 12 -- 3, 11 -- 3, 10 -- 3 16 -- 2, 15 -- 2, 14 -- 2, 13 -- 2, 12 -- 2 19 -- 1, 18 -- 1, 17 -- 1, 16 -- 1 PAM30 7 -- 2, 6 -- 2, 5 -- 2 10 -- 1, 9 -- 1, 8 -- 1 PAM70 8 -- 2, 7 -- 2, 6 -- 2 11 -- 1, 10 -- 1, 9 -- 1 scoring_blast Protein penalty (not for blastn) $bl2seq ne "blastn" bl2seq != "blastn" matrix Similarity matrix (-M) Choice BLOSUM62 BLOSUM90 BLOSUM80 BLOSUM62 BLOSUM50 BLOSUM45 PAM30 PAM70 PAM250 (defined $value and $value ne $vdef) ? 
" -M $value" : "" ("" , " -M "+str(value) )[value is not None and value != vdef] scoring_blastn Blastn penalty $bl2seq eq "blastn" bl2seq == "blastn" mismatch Penalty for a nucleotide mismatch (-q) Integer -3 (defined $value and $value != $vdef) ? " -q $value" : "" ("" , " -q "+str(value) )[value is not None and value != vdef] match Reward for a nucleotide match (-r) Integer 1 (defined $value and $value != $vdef) ? " -r $value" : "" ("" , " -r "+str(value) )[value is not None and value != vdef] filter_opt Filtering and masking options 6 Mask off segments of the query sequence that have low compositional complexity, as determined by the SEG program of Wootton & Federhen (Computers and Chemistry, 1993) or, for BLASTN, by the DUST program of Tatusov and Lipman (in preparation). Filtering can eliminate statistically significant but biologically uninteresting reports from the blast output (e.g., hits against common acidic-, basic- or proline-rich regions), leaving the more biologically interesting regions of the query sequence available for specific matching against database sequences. Filtering is only applied to the query sequence (or its translation products), not to database sequences. Default filtering is DUST for BLASTN, SEG for other programs. It is not unusual for nothing at all to be masked by SEG, when applied to sequences in SWISS-PROT, so filtering should not be expected to always yield an effect. Furthermore, in some cases, sequences are masked in their entirety, indicating that the statistical significance of any matches reported against the unfiltered query sequence should be suspect. filter Filter or Masking query sequence (DUST with blastn, SEG with others) (-F) Boolean 1 ($value) ? 
"" : " -F F" (" -F F" , "")[ value ] other_filters Filtering options (Filter must be true) Choice $filter and not defined $other_masking filter and other_masking is None null null coil " -F C" " -F C" seg+coil " -F \"C;S\"" " -F \"C;S\"" dust " -F D" " -F D" A coiled-coiled filter, based on the work of Lupas et al. (Science, vol 252, pp. 1162-4 (1991)) written by John Kuzio (Wilson et al., J Gen Virol, vol. 76, pp. 2923-32 (1995)) other_masking Masking options (Filter must be true) Choice $filter == 1 and not defined $other_filters filter == 1 and other_filters is None null null maskSEG " -F \"m S\"" " -F \"m S\"" maskCoil " -F \"m D\"" " -F \"m D\"" maskDust " -F \"m C\"" " -F \"m C\"" lowerMask " -F m" " -F m" For Lower-case masking the lower case filtering must be select. ($value eq 'null' or $value eq 'maskSEG' or $value eq 'maskCoil' or $value eq 'maskDust']) or ($value eq 'lowerMask' and $lower_case) value in ['null', 'maskSEG', 'maskCoil', 'maskDust'] or (value == 'lowerMask' and lower_case) A coiled-coiled filter, based on the work of Lupas et al. (Science, vol 252, pp. 1162-4 (1991)) written by John Kuzio (Wilson et al., J Gen Virol, vol. 76, pp. 2923-32 (1995)). It is possible to specify that the masking should only be done during the process of building the initial words . If the -U option (to mask any lower-case sequence in the input FASTA file) is used and one does not wish any other filtering, but does wish to mask when building the lookup tables then one should specify: -F 'm' lower_case Use lower case filtering (-U) Boolean 0 ($value) ? " -U T" : "" ("", " -U T")[value] This option specifies that any lower-case letters in the input FASTA file should be masked. selectivity_opt Selectivity options 7 Expect Expected value (-e) Float 10 (defined $value and $value != $vdef) ? 
" -e $value" : "" ("" , " -e "+str(value) )[value is not None and value != vdef] The statistical significance threshold for reporting matches against database sequences; the default value is 10, such that 10 matches are expected to be found merely by chance, according to the stochastic model of Karlin and Altschul (1990). If the statistical significance ascribed to a match is greater than the EXPECT threshold, the match will not be reported. Lower EXPECT thresholds are more stringent, leading to fewer chance matches being reported. Fractional values are acceptable. word_size Word Size (-W) Integer (defined $value) ? " -W $value" : "" ("" , " -W "+str(value) )[value is not None] Use words of size N. Zero invokes default behavior Default values: - 11 for blastn - 3 for others dropoff_extent X dropoff value for gapped alignment (-X) Float (defined $value) ? " -X $value" : "" ("" , " -X "+str(value))[value is not None] This is the value that control the path graph region explored by Blast during a gapped extension (Xg in the NAR paper) (default for blastp is 15). Default values: - 30 for blastn - 0 for tblastx - 15 for others eff_len Effective length of the search space (-Y) Integer (defined $value) " -Y $value" : "" ("" , " -Y "+str(value) )[value is not None] Use zero for the real size gapped_alig Perform or not gapped alignment (not available with tblastx) (-g) Boolean $bl2seq ne "tblastx" bl2seq != "tblastx" 1 ($value) ? "" : " -g F " (" -g F " , "")[value] translation_opt Translation options $bl2seq =~ /^(blastx|tblast[nx])$/ bl2seq in [ "blastx", "tblastx", "tblastn" ] 8 strand Query strand to search against second sequence (for blastx, tblastx or tblastn) (-S) Choice 3 1 2 3 (defined $value and $value ne $vdef) ? " -S $value" : "" ( "" , " -S " + str(value) )[ value is not None and value!= vdef] output_opt Output options 10 outformat Output format (-D) Choice 0 0 1 (defined $value and $value ne $vdef) ? 
"" : "-D $value" ( "" , " -D " + str( value ) )[ value is not None and value != vdef] Programs-5.1.1/scan_region.xml0000644000175000001560000006073511767572177015222 0ustar bneronsis scan_region scan_region Scan genomic regions in a query-file against a DB-file which contains chromosome locations for various genomics features genetics:detection scan_region.pl cnvfile CNV calls file (cnv) Cnv AbstractText (defined $value) ? " $value " : "" ( "" , " " + str(value) )[ value is not None] A file containing CNV calls, that could be generated by the test operation of detect_cnv program. 1 reffile Reference genes for CNV calls generated using hg18 (Mar 2006, NCBI build 36) human genome assembly Choice null null hg18_refGene.txt UCSCknownGene.txt (defined $value) ? " $value " : "" ( "" , " " + str(value) )[ value is not None ] 2 reference Flags specifying type of databases Choice null null --refgene --refcds --refexon --knowngene (defined $value and $value ne $vdef) ? " $value " : "" ( "" , " " + str(value) )[ value is not None and value !=vdef] 3 --refgene: specify that the database file is in refGene format from UCSC genome browser. --refcds: specify that the database file is in refGene format from UCSC genome browser, but user is only interested in the overlap of coding region (first exon to last exon). --refexon: specify that the database file is in refGene format from UCSC genome browser, but user is only interested in the overlap of query with exons. --knowngene: specify that the database file is in knownGene format from UCSC genome browser. dbfile-specific Database-specific arguments name2 Use name2 annotation in refGene file in output Boolean $reference eq '--refgene' reference == '--refgene' 0 ($value) ? "--name2 " : "" ( "" , " --name2 " )[ value] 3 This argument is used in conjunction with the --refgene argument, to specify that the alternative gene symbol in the "name2" field in the refGene file be printed in the output. 
reflink Specify a cross-reference file for the RefGene track in UCSC genome browser Boolean $reference ne '--knowngene' reference != '--knowngene' 0 ($value) ? "--reflink hg18_refLink.txt " : "" ( "" , " --reflink hg18_refLink.txt " )[ value] 3 Specify a cross-reference file for the RefGene track in UCSC genome browser, so that in the output, the gene identifier (gene name or refseq id) is replaced by the gene symbol specified in the link file. (If not found in the reflink file, the gene identifiers are still used). kgxref Specify a cross-reference file for the knownGene track in UCSC genome browser Boolean $reference eq '--knowngene' reference == '--knowngene' 0 ($value) ? "--kgxref UCSCkgXref.txt " : "" ( "" , " --kgxref UCSCkgXref.txt " )[ value] 3 Specify a cross-reference file for the knownGene track in UCSC genome browser, so that in the output, the gene identifier (gene name or refseq id) is replaced by the gene symbol specified in the kgxref file. (If not found in the kgxref file, the gene identifiers are still used). query-db-match Criteria for defining query-db match condense_query Condense and eliminate overlapping regions in query Boolean 0 ($value) ? "--condense_query " : "" ( "" , " --condense_query " )[ value] 3 Condense overlapping regions in the query file into non-overlapping regions. When this argument is set, the annotation for each query (the strings after the chromosome location in each line of the query file) will not appear in the output. score_threshold Score threshold for database in UCSC annotation file Float (defined $value) ? " --score_threshold $value " : "" ( "" , " --score_threshold " + str(value) )[ value is not None] 3 Specify the score threshold in the database file to include in the search for overlaps. This argument is file format dependent. normscore_threshold Normalized score threshold for database in UCSC annotation file Float (defined $value) ?
" --normscore_threshold $value " : "" ( "" , " --normscore_threshold " + str(value) )[ value is not None] 3 Specify the normalized score threshold in the database file to include in the search for overlaps. This argument is file format dependent. expansion_query Expansion of query to find match expandleft Expand left side of query regions (overwrite --expandmax) Integer ($reference eq '--knowngene' or $reference eq '--refgene') and (not $expandmax) (reference == '--knowngene' or reference == '--refgene') and (not expandmax) (defined $value) ? " --expandleft $value " : "" ( "" , " --expandleft " + str(value) )[ value is not None] 3 Expand the query region on the left side (5 megabases in forward strand, 3 megabases in reverse strand) to find overlap (used in conjunction with --refgene or --knowngene argument). expandright Expand right side of query regions (overwrite --expandmax) Integer ($reference eq '--knowngene' or $reference eq '--refgene') and (not $expandmax) (reference == '--knowngene' or reference == '--refgene') and (not expandmax) (defined $value) ? " --expandright $value " : "" ( "" , " --expandright " + str(value) )[ value is not None] 3 Expand the query region on the right side (3 megabases in forward strand, 5 megabases in reverse strand) to find overlap (used in conjunction with --refgene or --knowngene argument). expandmax Size of maximum expansion for query region to find overlap Integer $reference eq '--knowngene' or $reference eq '--refgene' reference == '--knowngene' or reference == '--refgene' (defined $value) ? " --expandmax $value " : "" ( "" , " --expandmax " + str(value) )[ value is not None] 3 Maximum expansion size of the query region on both sides to find at least one overlap (used in conjunction with --refgene or --knowngene argument). After query expansion, only the closest gene will be printed; other genes, even if overlapping with the query after expansion, will not be printed.
expanddb Expand definition of gene/cds/exon at both sides Integer (defined $value) ? " --expanddb $value " : "" ( "" , " --expanddb " + str(value) )[ value is not None] 3 Expand the chromosome region specified in the database-file to find overlap with the query regions. output_option Input/output options overlap Print overlapped portion of region only Boolean 0 ($value) ? "--overlap " : "" ( "" , " --overlap " )[ value] 3 Instead of printing the query region, only print the overlapped portion of the query region and template region. dbregion Print database region (default is to print query region) Boolean 0 ($value) ? "--dbregion " : "" ( "" , " --dbregion " )[ value] 3 Print the region in database file, rather than query file, when an overlapped hit is found. append Append extra information from annotation file to output Boolean 0 ($value) ? "--append " : "" ( "" , " --append " )[ value] 3 Append the score and normscore for the overlapped template region to the output for database files downloaded as UCSC tables. queryinfo Force to print query info when Print database region is used Boolean $dbregion dbregion 0 ($value) ? "--queryinfo " : "" ( "" , " --queryinfo " )[ value] 3 output_file Output file Cnv AbstractText "scan_region.out" "scan_region.out" Programs-5.1.1/blast2seqid.xml0000644000175000001560000000761511767601016015127 0ustar bneronsis blast2seqid 1.0 blast2seqid Extract sequence Ids from blast hits (in USA format) Bertrand Néron https://projets.pasteur.fr/projects/list_files/blast2usa https://projets.pasteur.fr/projects/show/blast2usa Extract the Identifier and Data Bank of the hits from the summary of a blast report ( in text format -m 0-6 ) The result is in USA list format. database:search:display blast2usa infile BLAST text report BlastTextReport Report " $value" " "+str(value) 40 A blast output in pairwise format ( option -m 0 default ). output Output options From ignore the hits until the hit n (integer) Integer (defined $value) ? 
" --from $value" : "" ( "" , " --from " + str(value) )[value is not None] 10 To ignore the hits after the hit n (integer) Integer (defined $value) ? " --to $value" : "" ( "" , " --to " + str(value) )[value is not None] 20 id_list hits identifier GenesId AbstractText USAList "blast2seqid.out" "blast2seqid.out" Programs-5.1.1/notseq.xml0000644000175000001560000002247012072525233014213 0ustar bneronsis notseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net notseq Write to file a subset of an input stream of sequences http://bioweb2.pasteur.fr/docs/EMBOSS/notseq.html http://emboss.sourceforge.net/docs/themes sequence:edit notseq e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_required Required section e_exclude Sequence names to exclude String ("", " -exclude=" + str(value))[value is not None] 2 Enter a list of sequence names or accession numbers to exclude from the sequences read in. The excluded sequences will be written to the file specified in the 'junkout' parameter. The remainder will be written out to the file specified in the 'outseq' parameter. The list of sequence names can be separated by either spaces or commas. The sequence names can be wildcarded. The sequence names are case independent. 
An example of a list of sequences to be excluded is: myseq, hs*, one two three. A file containing a list of sequence names can be specified by giving the file name preceded by a '@', e.g. '@names.dat' e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename notseq.e_outseq ("" , " -outseq=" + str(value))[value is not None] 3 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 4 e_outseq_out outseq_out option Sequence e_outseq e_junkoutseq Name of the output sequence file (e_junkoutseq) Filename notseq.e_junkoutseq ("" , " -junkoutseq=" + str(value))[value is not None] 5 This file collects the sequences which you have excluded from the main output file of sequences. e_osformat_junkoutseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 6 e_junkoutseq_out junkoutseq_out option Sequence e_junkoutseq auto Turn off any prompting String " -auto -stdout" 7 Programs-5.1.1/degapseq.xml0000644000175000001560000001254212072525233014472 0ustar bneronsis degapseq EMBOSS 6.3.1 EMBOSS European Molecular Biology Open Software Suite Rice,P. Longden,I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby, A. Trends in Genetics 16, (6) pp276--277 http://emboss.sourceforge.net/download http://emboss.sourceforge.net degapseq Removes non-alphabetic (e.g.
gap) characters from sequences http://bioweb2.pasteur.fr/docs/EMBOSS/degapseq.html http://emboss.sourceforge.net/docs/themes sequence:edit degapseq e_input Input section e_sequence sequence option Sequence EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF 1,n ("", " -sequence=" + str(value))[value is not None] 1 e_output Output section e_outseq Name of the output sequence file (e_outseq) Filename degapseq.e_outseq ("" , " -outseq=" + str(value))[value is not None] 2 e_osformat_outseq Choose the sequence output format Choice FASTA EMBL FASTA GCG GENBANK NBRF CODATA RAW SWISSPROT GFF ("", " -osformat=" + str(value))[value is not None and value!=vdef] 3 e_outseq_out outseq_out option Sequence e_outseq auto Turn off any prompting String " -auto -stdout" 4