./fasta-36.3.8h/ 0000755 0001054 0001054 00000000000 13525131122 010654 5 ustar wrp
./fasta-36.3.8h/COPYRIGHT 0000644 0001054 0001054 00000001647 13525131122 012157 0 ustar wrp

Copyright (c) 1996, 1997, 1998, 1999, 2002, 2014, 2015 by William R. Pearson and The Rector & Visitors of the University of Virginia

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under this License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Code in the smith_waterman_sse2.c and smith_waterman_sse2.h files is copyright (c) 2006 by Michael Farrar.

Code in the global_sse2.c, global_sse2.h, glocal_sse2.c, and glocal_sse2.h files is copyright (c) 2010 by Michael Farrar.

./fasta-36.3.8h/FASTA_LIST 0000644 0001054 0001054 00000001043 13525131122 012266 0 ustar wrp

4 Aug 2010

If you regularly install the latest version of the FASTA package from http://faculty.virginia.edu/wrpearson/fasta, you may want to join the fasta_list SYMPA mailing list. I use this list to announce new releases and solicit bug reports.

To join the mailing list, go to the WWW page at:

lists.virginia.edu/sympa/info/fasta_list

Select the "Subscribe" option on the lower left, and at the linked page, enter your email address, and click "submit". You will be asked to confirm your membership in the mailing list.

Bill Pearson

./fasta-36.3.8h/LICENSE 0000644 0001054 0001054 00000026075 13525131122 011673 0 ustar wrp

Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.
"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. 
"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. 
Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the 
Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. 
Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. 
To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "{}" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives.

Copyright {yyyy} {name of copyright owner}

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

./fasta-36.3.8h/README 0000644 0001054 0001054 00000005546 13525131122 011546 0 ustar wrp

[This file has been replaced by README.md]

December, 2017

The most up-to-date information on FASTA versions is available in README.md and doc/readme.v36.

July, 2015

This version of the FASTA programs is fasta-36.3.8.

Since March, 2011 (fasta-36.3.4), the FASTA programs are no longer interactive. Typing bin/fasta36 (or any of the other programs) provides a help message. The "classic" interactive mode is available by typing "fasta36 -I". In addition, there is only one version of the programs, "fasta36", "ssearch36", etc., which is threaded by default on Unix/Linux/MacOSX.

As of November, 2014, the FASTA program code is available under the Apache 2.0 open source license.
Up-to-date release notes are available in the file doc/readme.v36.

Documentation on the fasta3 version programs is available in the files:

  doc/fasta36.1          (unix man page)
  doc/changes_v36.html   (short descriptions of enhancements to FASTA programs)
  doc/readme.v36         (text descriptions of bug fixes and version history)
  doc/fasta_guide.tex    (LaTeX file which describes fasta-36, and provides an introduction to the FASTA programs, their use and installation.)
  doc/fasta_guide.pdf    (printable/viewable description of fasta-36)

The latter two files provide background information on installing the fasta programs (in particular, the FASTLIBS file) that new users of the fasta3 package may find useful.

================================================================

The FASTA distribution directory (this directory) has been substantially re-organized to make it easier to find things. However, some documentation has not yet been completely updated to reflect the re-organization, so some things may not make sense.

Files can now be found in several sub-directories:

  bin/      (pre-compiled binaries for some architectures)
  conf/     example fastlibs files
  data/     scoring matrices
  doc/      documentation files
  make/     make files
  misc/     perl scripts to reformat -m 9 output, convert -R search.res files for 'R', and embed domains in shuffled sequences
  scripts/  perl scripts for -V (annotate alignments) and -E (expand library) options
  seq/      test sequences
  src/      source code
  sql/      sql files and scripts for using the sql database access
  test/     test scripts

For some binary distributions, only the doc/, data/, seq/, and bin/ directories are provided.

================

To make the standard FASTA programs:

    cd src
    make -f ../make/Makefile.linux_sse2 all

where "../make/Makefile.linux_sse2" is the appropriate file for your system.

The executable programs will then be found in ../bin (e.g. ../bin/fasta36, etc.)
For a simple test of a program, try (from the src directory)

    ../bin/fasta36 -q ../seq/mgstm1.aa ../seq/prot_test.lseg

================================================================

Bill Pearson
wrp@virginia.edu

./fasta-36.3.8h/README.md 0000644 0001054 0001054 00000011326 13525131122 012136 0 ustar wrp

## The FASTA package - protein and DNA sequence similarity searching and alignment programs

The **FASTA** (pronounced FAST-Aye, not FAST-Ah) programs are a comprehensive set of similarity searching and alignment programs for searching protein and DNA sequence databases. Like the **BLAST** programs `blastp` and `blastn`, the `fasta` program itself uses a rapid heuristic strategy for finding similar regions in protein and DNA sequences. But in addition to heuristic similarity searching, the FASTA package provides programs for rigorous local (`ssearch`) and global (`ggsearch`) similarity searching, as well as a program for finding non-overlapping sequence similarities (`lalign`). Like BLAST, the FASTA package also includes programs for aligning translated DNA sequences against proteins (`fastx`, `fasty` are equivalent to `blastx`, and `tfastx`, `tfasty` are similar to `tblastn`).

#### March, 2019

An updated release of the FASTA package (`fasta-36.3.8h`) is available. In addition to minor bug fixes, the latest version can generate query and library sequences using program scripts.

See doc/README_v36.3.8h.md and doc/readme.v36 for a more complete summary of changes.

#### December, 2018

The latest version of the FASTA package is `fasta-36.3.8h`, Dec. 2018.

See doc/README_v36.3.8h.md for a more complete summary of changes.

#### November, 2018

The current released version of the FASTA package is `fasta-36.3.8h`, Nov. 2018.

See doc/README_v36.3.8h.md for a more complete summary of changes.

#### October, 2018

The current version of the FASTA package is fasta-36.3.8g, Oct. 2018.

See doc/README_v36.3.8h.md for a more complete summary of changes.
#### April, 2018

The current version of the FASTA package is fasta-36.3.8g, Apr. 2018.

#### December, 2017

The current FASTA version is fasta-36.3.8g, Dec. 2017.

The statistics routines for normally distributed scores (ggsearch36, glsearch36) are more robust to very low E()-value thresholds.

#### Sept, 2017

The current FASTA version is fasta-36.3.8f, Sept. 2017.

If the -S option is used and a query sequence has no upper case letters, it is re-read with lower-case letters converted to upper-case.

#### May, 2017

The current FASTA version is fasta-36.3.8f, May 2017.

Various bugs in sub-alignment scoring corrected and support for the EBI SP:GSTM1_HUMAN P09488 added. The format for the `$SRCH_URL` and `$SRCH_URL2` format strings has changed to enable pairwise alignment.

#### September, 2016

The fasta-36.3.6e version includes a new directory, `psisearch2`, with scripts to run iterative PSSM (PSI-BLAST or SSEARCH36) searches using an improved strategy for reducing PSSM contamination due to alignment over-extension.

As of November, 2014, the FASTA program code is available under the Apache 2.0 open source license.

Up-to-date release notes are available in the file `doc/readme.v36`.

Documentation on the FASTA programs is available in the files:

dir/file | description
----------|------------
`doc/fasta36.1` | (unix man page)
`doc/changes_v36.html` | (short descriptions of enhancements to FASTA programs)
`doc/readme.v36` | (text descriptions of bug fixes and version history)
`doc/fasta_guide.tex` | (LaTeX file which describes fasta36, and provides an introduction to the FASTA programs, their use and installation.)
`doc/fasta_guide.pdf` | (printable/viewable description of fasta-36)

`fasta_guide.pdf` provides background information on installing the fasta programs (in particular, the `FASTLIBS` file) that new users of the fasta3 package may find useful.
Parts of the FASTA package are distributed across several sub-directories:

dir | description
----|------------
`bin/` | (pre-compiled binaries for some architectures)
`conf/` | example `FASTLIBS` files (files for finding libraries)
`data/` | scoring matrices
`doc/` | documentation files
`make/` | make files
`misc/` | perl scripts to reformat -m 9 output, convert -R search.res files for 'R', and embed domains in shuffled sequences
`psisearch2/` | perl/python scripts implementing the new `psisearch2_msa` iterative PSSM search
`scripts/` | perl scripts for -V (annotate alignments) and -E (expand library) options
`seq/` | test sequences
`src/` | source code
`sql/` | sql files and scripts for using the sql database access
`test/` | test scripts

For some binary distributions, only the `doc/`, `data/`, `seq/`, and `bin/` directories are provided.

To make the standard FASTA programs:

```
cd src
make -f ../make/Makefile.linux_sse2 all
```

where `../make/Makefile.linux_sse2` is the appropriate Makefile for your system.

The executable programs will then be found in `../bin` (e.g. `../bin/fasta36`, etc.)

For a simple test of a program, try (from the src directory)

```
../bin/fasta36 -q ../seq/mgstm1.aa ../seq/prot_test.lseg
```

./fasta-36.3.8h/bin/ 0000755 0001054 0001054 00000000000 13525131122 011424 5 ustar wrp
./fasta-36.3.8h/bin/README 0000644 0001054 0001054 00000000075 13525131122 012306 0 ustar wrp

Placeholder file to create destination for program binaries.

./fasta-36.3.8h/conf/ 0000755 0001054 0001054 00000000000 13525131122 011601 5 ustar wrp
./fasta-36.3.8h/conf/README 0000644 0001054 0001054 00000001321 13525131122 012456 0 ustar wrp

22-Jan-2014

fasta36/conf

================

Files that allow FASTA programs to find libraries using abbreviations.
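As a rough illustration of how one line of such an abbreviation file might be taken apart, here is a minimal Python sketch. The field layout it assumes (a description, then `$0` for protein or `$1` for DNA, then a one-letter or `+long+` abbreviation, then the library file name) is inferred from the sample files in this directory. The names `LibEntry` and `parse_fastlibs_line`, the reading of a leading `@` as "file of file names", and the reading of a trailing number as a library format code are assumptions made for illustration; this is not the parser used by the FASTA programs.

```python
import re
from string import Template
from typing import Mapping, NamedTuple, Optional


class LibEntry(NamedTuple):
    """One parsed abbreviation line (hypothetical structure)."""
    title: str                 # human-readable description
    is_dna: bool               # True for "$1" lines, False for "$0" lines
    abbrev: str                # "Q", or "pir1" for a "+pir1+" abbreviation
    path: str                  # library file name after ${VAR} expansion
    indirect: bool             # assumed: leading "@" marks a file of file names
    lib_format: Optional[int]  # assumed: trailing number is a format code


# description $ <0|1> <letter or +long+> <file name> [format number]
_LINE_RE = re.compile(
    r"^(?P<title>.*?)\$(?P<kind>[01])"
    r"(?P<abbrev>\+[^+]+\+|[A-Za-z])"
    r"(?P<rest>\S+)(?:\s+(?P<fmt>\d+))?\s*$"
)


def parse_fastlibs_line(line: str, env: Mapping[str, str]) -> Optional[LibEntry]:
    """Split one abbreviation line; return None if it does not fit the pattern."""
    m = _LINE_RE.match(line.strip())
    if m is None:
        return None
    rest = m.group("rest")
    indirect = rest.startswith("@")
    if indirect:
        rest = rest[1:]
    # Expand ${SLIB2}-style references; unknown variables are left untouched.
    path = Template(rest).safe_substitute(env)
    fmt = m.group("fmt")
    return LibEntry(
        title=m.group("title").strip(),
        is_dna=m.group("kind") == "1",
        abbrev=m.group("abbrev").strip("+"),
        path=path,
        indirect=indirect,
        lib_format=int(fmt) if fmt is not None else None,
    )


if __name__ == "__main__":
    entry = parse_fastlibs_line(
        "Swissprot (NCBI)$0Q${SLIB2}/fa_dbs/swissprot.lseg", {"SLIB2": "/slib2"}
    )
    print(entry.abbrev, entry.path)  # prints: Q /slib2/fa_dbs/swissprot.lseg
```

The environment is passed in as a plain mapping rather than read from `os.environ`, so the sketch can be tried without exporting anything; a line that lacks the `$0`/`$1` marker simply yields `None`.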
For example, if the fast_libs_e.www has the line:

  Swissprot (NCBI)$0Q${SLIB2}/fa_dbs/swissprot.lseg

and

  export SLIB2=/slib2

then:

  fasta36 ../seq/mgstm1.aa q

is equivalent to:

  fasta36 ../seq/mgstm1.aa /slib2/fa_dbs/swissprot.lseg

================

fastlibs -- the original library abbreviation file

fast_new -- allows abbreviations longer than one letter by using "+abbrev+"

  NBRF PIR1 Annotated Protein Database (rel 56)$0+pir1+/slib2/fa_dbs/pir1.lseg

fast_libs_e.www -- use environment variables in library file name

(+long+ abbreviations and ${SLIB2} environment variables can be combined)

./fasta-36.3.8h/conf/fast_libs_e.www 0000644 0001054 0001054 00000002263 13525131122 014624 0 ustar wrp
PIR1 Annotated (rel. 66) $0A${SLIB2}/fa_dbs/pir1.lseg Swissprot (NCBI)$0Q${SLIB2}/fa_dbs/swissprot.lseg NCBI Refseq NP only$0P${SLIB2}/fa_dbs/refseq_np.lseg NCBI Refseq proteins$0S${SLIB2}/fa_dbs/refseq_protein.lseg NCBI PDB structures$0D${SLIB2}/fa_dbs/pdbaa.lseg NCBI NR non-redundant$0N${SLIB2}/fa_dbs/nr.lseg Human/Refseq proteins$0H${SLIB2}/genomes/hum_refseq.lseg Mouse/Refseq proteins$0M${SLIB2}/genomes/mus_refseq.lseg Rat/Refseq proteins$0R${SLIB2}/genomes/rat_refseq.lseg Drosophila/RefSeq proteins$0F${SLIB2}/genomes/d_melanogaster.lseg C. elegans/RefSeq proteins$0W${SLIB2}/genomes/c_elegans.lseg Arabidopsis/RefSeq proteins$0L${SLIB2}/genomes/a_thaliana.lseg Yeast (S. cerevisiae)${SLIB2}/genomes/s_cerevisiae.lseg E. 
coli proteins$0E${SLIB2}/genomes/ecoli_k12.lseg GB170.0 Primate$1P@${RDLIB2}/gb_asn/gbpri.nam GB170.0 Rodent$1R@${RDLIB2}/gb_asn/gbrod.nam GB170.0 other Mammal$1M@${RDLIB2}/gb_asn/gbmam.nam GB170.0 verteBrates$1B@${RDLIB2}/gb_asn/gbvrt.nam GB170.0 Invertebrates$1I@${RDLIB2}/gb_asn/gbinv.nam GB170.0 Bacteria$1T@${RDLIB2}/gb_asn/gbbct.nam GB170.0 pLants$1L@${RDLIB2}/gb_asn/gbpln.nam GB171.0 Viral$1V@${RDLIB2}/gb_asn/gbvrl.nam GB171.0 Phage$1G@${RDLIB2}/gb_asn/gbphg.nam ./fasta-36.3.8h/conf/fast_new 0000644 0001054 0001054 00000003662 13525131122 013341 0 ustar wrp NBRF PIR1 Annotated Protein Database (rel 56)$0+pir1+/slib2/fa_dbs/pir1.lseg NBRF Protein database (complete)$0+nbrf+@/seqlib/lib/NBRF.nam NRL_3d structure database$0D/seqlib/lib/nrl_3d.seq 5 NCBI/Blast non-redundant proteins$0+nr+/slib2/fa_dbs/nr.lseg NCBI/Blast Swissprot$0+sp+/slib2/fa_dbs/swissprot.lseg GENPEPT Translated Protein Database (rel 106.0)$0G/slib2/fa_dbs/genpept.fsa Swiss-Prot Release 34$0S/slib0/lib/swiss.seq 5 Yeast proteins$0Y/slib0/genomes/yeast_nr.pep C. elegans blast server$0W/slib2/fa_dbs/C.elegans_blast.fa E. coli proteome$0E/slib0/genomes/ecoli.npep H. influenzae proteome$0I/slib0/genomes/hinf.npep H. pylori proteome$0L/slib0/genomes/hpyl.npep NCBI Entrez Human proteins$0H/slib2/fa_dbs/human.aa M. pneumococcus proteome$0M/slib0/genomes/mpneu.npep M. jannaschii proteome$0J/slib0/genomes/mjan.npep Synechosystis proteome$0C/slib0/genomes/synecho.npep GB108.0 Invertebrates$1I/seqlib2/gcggenbank/gb_in.seq 6 GB108.0 Bacteria$1T@/slib0/lib/gb_ba.nam 6 GB108.0 Primate$1P@/slib0/lib/gb_pri.nam GB108.0 Rodent$1R/seqlib2/gcggenbank/gb_ro.seq 6 GB108.0 other Mammal$1M/seqlib2/gcggenbank/gb_om.seq 6 GB108.0 verteBrates$1B/seqlib2/gcggenbank/gb_ov.seq 6 GB108.0 Expressed Seq. 
Tags$1E@/slib0/lib/gb_est.nam GB108.0 High throughput genmomic$1h/seqlib2/gcggenbank/gb_htg.seq 6 GB108.0 pLants$1L@/slib0/lib/gb_pl.nam 6 GB108.0 genome Survey sequences$1S@/slib0/lib/gb_gss.nam 6 GB108.0 Viral$1V/seqlib2/gcggenbank/gb_vi.seq 6 GB108.0 Phage$1G/seqlib2/gcggenbank/gb_ph.seq 6 GB108.0 Unannotated$1D/seqlib2/gcggenbank/gb_un.seq 6 GB108.0 New$1u/seqlib2/gcggenbank/gb_new.seq 6 GB108.0 All sequences (long)$1A@/slib0/lib/genbank.nam Yeast genome$1Y@/seqlib/yeast/yeast_chr.nam E. coli genome$1D/slib0/genomes/ecoli.gbk 1 Blast Human ESTs$1F/slib2/fa_dbs/est_human TIGR Human Gene Index$1K/slib2/fa_dbs/HGI.nr.031898 Blast Mouse ESTs$1C/slib2/fa_dbs/est_mouse TIGR Mouse Gene Index$1J/slib2/fa_dbs/MGI.nr.022498 NCBI/BLAST NR DNA$1n/slib2/fa_dbs/nt ./fasta-36.3.8h/conf/fastlibs 0000644 0001054 0001054 00000004212 13525131122 013332 0 ustar wrp NBRF PIR1 Annotated Protein Database (rel 56)$0A/seqlib/lib/pir1.seq 5 NBRF PIR1 Annotated (seg) (rel 56)$0B/slib2/fa_dbs/pir1.seg NBRF Protein database (complete)$0P@/seqlib/lib/NBRF.nam NRL_3d structure database$0D/seqlib/lib/nrl_3d.seq 5 NCBI/Blast non-redundant proteins$0N/slib2/fa_dbs/nr NCBI/Blast non-redundant proteins (seg)$0K/slib2/fa_dbs/nr.seg NCBI/Blast Swissprot$0Q/slib2/fa_dbs/swissprot NCBI/Blast Swissprot (seg)$0R/slib2/fa_dbs/swissprot.seg OWL 30.1 non-redundant protein database$0O/slib2/OWL/owl.seq 5 GENPEPT Translated Protein Database (rel 106.0)$0G/slib2/fa_dbs/genpept.fsa Swiss-Prot Release 34$0S/slib0/lib/swiss.seq 5 Yeast proteins$0Y/slib0/genomes/yeast_nr.pep C. elegans blast server$0W/slib2/fa_dbs/C.elegans_blast.fa E. coli proteome$0E/slib0/genomes/ecoli.npep H. influenzae proteome$0I/slib0/genomes/hinf.npep H. pylori proteome$0L/slib0/genomes/hpyl.npep NCBI Entrez Human proteins$0H/slib2/fa_dbs/human.aa M. pneumococcus proteome$0M/slib0/genomes/mpneu.npep M. 
jannaschii proteome$0J/slib0/genomes/mjan.npep Synechosystis proteome$0C/slib0/genomes/synecho.npep GB108.0 Invertebrates$1I/seqlib2/gcggenbank/gb_in.seq 6 GB108.0 Bacteria$1T@/slib0/lib/gb_ba.nam 6 GB108.0 Primate$1P@/slib0/lib/gb_pri.nam GB108.0 Rodent$1R/seqlib2/gcggenbank/gb_ro.seq 6 GB108.0 other Mammal$1M/seqlib2/gcggenbank/gb_om.seq 6 GB108.0 verteBrates$1B/seqlib2/gcggenbank/gb_ov.seq 6 GB108.0 Expressed Seq. Tags$1E@/slib0/lib/gb_est.nam GB108.0 High throughput genmomic$1h/seqlib2/gcggenbank/gb_htg.seq 6 GB108.0 pLants$1L@/slib0/lib/gb_pl.nam 6 GB108.0 genome Survey sequences$1S@/slib0/lib/gb_gss.nam 6 GB108.0 Viral$1V/seqlib2/gcggenbank/gb_vi.seq 6 GB108.0 Phage$1G/seqlib2/gcggenbank/gb_ph.seq 6 GB108.0 Unannotated$1D/seqlib2/gcggenbank/gb_un.seq 6 GB108.0 New$1u/seqlib2/gcggenbank/gb_new.seq 6 GB108.0 All sequences (long)$1A@/slib0/lib/genbank.nam Yeast genome$1Y@/seqlib/yeast/yeast_chr.nam E. coli genome$1D/slib0/genomes/ecoli.gbk 1 Blast Human ESTs$1F/slib2/fa_dbs/est_human TIGR Human Gene Index$1K/slib2/fa_dbs/HGI.nr.031898 Blast Mouse ESTs$1C/slib2/fa_dbs/est_mouse TIGR Mouse Gene Index$1J/slib2/fa_dbs/MGI.nr.022498 NCBI/BLAST NR DNA$1n/slib2/fa_dbs/nt ./fasta-36.3.8h/data/ 0000755 0001054 0001054 00000000000 13525131122 011565 5 ustar wrp ./fasta-36.3.8h/data/VTML_10.mat 0000644 0001054 0001054 00000005314 13525131122 013355 0 ustar wrp # # VTML_10 # # This matrix was produced from: vtml_10qij.mat using vtml_P.mat background frequencies # # VTML_10 substitution matrix, Units = bits/2.0 # Expected score = -3.896435 bits; Entropy = 3.467957 bits # Target fraction identity = 0.9105 # Lowest Score = -20, Highest Score= 12 # A R N D C Q E G H I L K M F P S T W Y V B Z X * A 7 -8 -8 -8 -5 -7 -7 -6 -9 -9 -9 -8 -7 -10 -6 -4 -5 -11 -10 -5 -8 -7 0 -7 R -8 8 -7 -16 -9 -4 -14 -9 -5 -10 -10 -2 -8 -12 -9 -8 -8 -10 -9 -11 -11 -9 0 -7 N -8 -7 9 -3 -10 -5 -7 -7 -4 -11 -11 -5 -9 -12 -10 -4 -5 -12 -8 -11 3 -6 0 -7 D -8 -16 -3 8 -18 -6 -3 -8 -6 -15 -19 -7 -11 -20 -8 -7 
-8 -12 -17 -11 2 -4 0 -7 C -5 -9 -10 -18 12 -17 -18 -9 -8 -7 -16 -17 -6 -17 -11 -5 -7 -19 -6 -5 -14 -17 0 -7 Q -7 -4 -5 -6 -17 9 -3 -10 -3 -12 -8 -4 -6 -10 -7 -6 -7 -19 -16 -9 -5 3 0 -7 E -7 -14 -7 -3 -18 -3 8 -8 -8 -12 -10 -4 -10 -18 -8 -6 -7 -20 -9 -9 -5 2 0 -7 G -6 -9 -7 -8 -9 -10 -8 7 -9 -19 -13 -9 -12 -13 -10 -6 -10 -11 -12 -12 -7 -9 0 -7 H -9 -5 -4 -6 -8 -3 -8 -9 10 -11 -9 -7 -16 -7 -8 -6 -7 -8 -3 -10 -5 -5 0 -7 I -9 -10 -11 -15 -7 -12 -12 -19 -11 8 -3 -11 -3 -7 -13 -11 -7 -8 -10 -1 -13 -12 0 -7 L -9 -10 -11 -19 -16 -8 -10 -13 -9 -3 7 -10 -2 -5 -9 -10 -9 -8 -8 -5 -15 -9 0 -7 K -8 -2 -5 -7 -17 -4 -4 -9 -7 -11 -10 8 -7 -18 -8 -7 -6 -10 -10 -10 -6 -4 0 -7 M -7 -8 -9 -11 -6 -6 -10 -12 -16 -3 -2 -7 10 -4 -12 -10 -6 -16 -15 -5 -10 -8 0 -7 F -10 -12 -12 -20 -17 -10 -18 -13 -7 -7 -5 -18 -4 9 -11 -9 -10 -5 -1 -8 -16 -14 0 -7 P -6 -9 -10 -8 -11 -7 -8 -10 -8 -13 -9 -8 -12 -11 9 -6 -8 -11 -19 -9 -9 -7 0 -7 S -4 -8 -4 -7 -5 -6 -6 -6 -6 -11 -10 -7 -10 -9 -6 8 -3 -10 -8 -10 -5 -6 0 -7 T -5 -8 -5 -8 -7 -7 -7 -10 -7 -7 -9 -6 -6 -10 -8 -3 8 -19 -10 -6 -6 -7 0 -7 W -11 -10 -12 -12 -19 -19 -20 -11 -8 -8 -8 -10 -16 -5 -11 -10 -19 12 -4 -17 -12 -19 0 -7 Y -10 -9 -8 -17 -6 -16 -9 -12 -3 -10 -8 -10 -15 -1 -19 -8 -10 -4 10 -10 -12 -12 0 -7 V -5 -11 -11 -11 -5 -9 -9 -12 -10 -1 -5 -10 -5 -8 -9 -10 -6 -17 -10 7 -11 -9 0 -7 B -8 -11 3 2 -14 -5 -5 -7 -5 -13 -15 -6 -10 -16 -9 -5 -6 -12 -12 -11 8 -4 0 -7 Z -7 -9 -6 -4 -17 3 2 -9 -5 -12 -9 -4 -8 -14 -7 -6 -7 -19 -12 -9 -4 8 0 -7 X 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 -7 * -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 1 ./fasta-36.3.8h/data/VTML_120.mat 0000644 0001054 0001054 00000004764 13525131122 013447 0 ustar wrp # # VTML_120 # # This matrix was produced from: vtml_120qij.mat using vtml_P.mat background frequencies # # VTML_120 substitution matrix, Units = bits/2.0 # Expected score = -0.712191 bits; Entropy = 0.933608 bits # Target fraction identity = 0.3740 # Lowest Score = -7, Highest Score= 11 # A R N 
D C Q E G H I L K M F P S T W Y V B Z X A 4 -2 -1 -1 0 -1 -1 0 -2 -2 -2 -1 -1 -3 -1 1 0 -4 -3 0 -1 -1 0 R -2 6 -1 -3 -3 1 -2 -3 0 -4 -3 3 -2 -4 -2 -1 -2 -3 -3 -3 -2 0 0 N -1 -1 6 2 -3 0 0 -1 1 -4 -4 0 -3 -4 -2 1 0 -5 -2 -3 4 0 0 D -1 -3 2 6 -5 0 2 -1 -1 -5 -6 -1 -4 -7 -2 -1 -1 -6 -5 -4 4 1 0 C 0 -3 -3 -5 10 -4 -5 -2 -2 -1 -4 -4 -1 -4 -3 0 -1 -6 -1 0 -4 -4 0 Q -1 1 0 0 -4 5 2 -2 1 -3 -2 1 -1 -3 -1 -1 -1 -6 -4 -2 0 3 0 E -1 -2 0 2 -5 2 5 -2 -1 -4 -4 1 -3 -5 -2 -1 -1 -6 -3 -3 1 3 0 G 0 -3 -1 -1 -2 -2 -2 6 -2 -6 -5 -2 -4 -5 -3 -1 -2 -4 -5 -4 -1 -2 0 H -2 0 1 -1 -2 1 -1 -2 7 -3 -2 -1 -3 -1 -2 -1 -1 -2 2 -3 0 0 0 I -2 -4 -4 -5 -1 -3 -4 -6 -3 5 2 -3 2 0 -4 -3 -1 -2 -2 3 -4 -3 0 L -2 -3 -4 -6 -4 -2 -4 -5 -2 2 5 -3 2 1 -3 -3 -2 -2 -1 1 -5 -3 0 K -1 3 0 -1 -4 1 1 -2 -1 -3 -3 5 -2 -5 -1 -1 -1 -4 -3 -3 0 1 0 M -1 -2 -3 -4 -1 -1 -3 -4 -3 2 2 -2 7 1 -4 -3 -1 -4 -3 1 -3 -2 0 F -3 -4 -4 -7 -4 -3 -5 -5 -1 0 1 -5 1 7 -4 -3 -3 1 4 -1 -5 -4 0 P -1 -2 -2 -2 -3 -1 -2 -3 -2 -4 -3 -1 -4 -4 7 0 -1 -4 -5 -3 -2 -1 0 S 1 -1 1 -1 0 -1 -1 -1 -1 -3 -3 -1 -3 -3 0 4 2 -3 -2 -2 0 -1 0 T 0 -2 0 -1 -1 -1 -1 -2 -1 -1 -2 -1 -1 -3 -1 2 5 -6 -3 0 0 -1 0 W -4 -3 -5 -6 -6 -6 -6 -4 -2 -2 -2 -4 -4 1 -4 -3 -6 11 2 -4 -5 -6 0 Y -3 -3 -2 -5 -1 -4 -3 -5 2 -2 -1 -3 -3 4 -5 -2 -3 2 7 -3 -3 -3 0 V 0 -3 -3 -4 0 -2 -3 -4 -3 3 1 -3 1 -1 -3 -2 0 -4 -3 4 -3 -2 0 B -1 -2 4 4 -4 0 1 -1 0 -4 -5 0 -3 -5 -2 0 0 -5 -3 -3 6 1 0 Z -1 0 0 1 -4 3 3 -2 0 -3 -3 1 -2 -4 -1 -1 -1 -6 -3 -2 1 5 0 X 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 ./fasta-36.3.8h/data/VTML_160.mat 0000644 0001054 0001054 00000004764 13525131122 013453 0 ustar wrp # # VTML_160 # # This matrix was produced from: vtml_160qij.mat using vtml_P.mat background frequencies # # VTML_160 substitution matrix, Units = bits/3.0 # Expected score = -0.493659 bits; Entropy = 0.617215 bits # Target fraction identity = 0.2884 # Lowest Score = -8, Highest Score= 16 # A R N D C Q E G H I L K M F P S T W Y V B Z X A 5 -2 -1 -1 1 -1 -1 0 -2 -2 -2 -1 -1 -3 0 1 1 -5 -4 0 -1 -1 0 
R -2 8 -1 -3 -3 2 -2 -3 1 -4 -4 4 -2 -5 -2 -1 -2 -4 -3 -4 -2 0 0 N -1 -1 7 3 -3 0 0 0 1 -5 -5 0 -3 -5 -2 1 0 -6 -2 -4 5 0 0 D -1 -3 3 8 -6 0 3 -1 0 -6 -7 0 -5 -8 -2 0 -1 -7 -6 -5 5 1 0 C 1 -3 -3 -6 13 -5 -5 -3 -2 -1 -4 -5 -1 -4 -4 1 -1 -7 -1 1 -4 -5 0 Q -1 2 0 0 -5 6 3 -3 2 -4 -3 2 -1 -4 -1 0 -1 -7 -4 -3 0 4 0 E -1 -2 0 3 -5 3 6 -2 -1 -5 -4 1 -4 -6 -1 0 -1 -8 -4 -3 1 4 0 G 0 -3 0 -1 -3 -3 -2 8 -3 -7 -7 -3 -5 -6 -3 0 -3 -5 -6 -5 0 -2 0 H -2 1 1 0 -2 2 -1 -3 10 -4 -3 0 -4 0 -2 -1 -1 -2 3 -4 0 0 0 I -2 -4 -5 -6 -1 -4 -5 -7 -4 6 3 -4 2 0 -5 -4 -1 -2 -2 4 -5 -4 0 L -2 -4 -5 -7 -4 -3 -4 -7 -3 3 6 -4 4 2 -3 -4 -2 -2 -1 2 -6 -3 0 K -1 4 0 0 -5 2 1 -3 0 -4 -4 6 -2 -6 -1 -1 -1 -5 -4 -3 0 1 0 M -1 -2 -3 -5 -1 -1 -4 -5 -4 2 4 -2 8 1 -4 -3 -1 -4 -3 1 -4 -2 0 F -3 -5 -5 -8 -4 -4 -6 -6 0 0 2 -6 1 9 -5 -3 -3 3 6 -1 -6 -5 0 P 0 -2 -2 -2 -4 -1 -1 -3 -2 -5 -3 -1 -4 -5 10 0 -1 -5 -6 -3 -2 -1 0 S 1 -1 1 0 1 0 0 0 -1 -4 -4 -1 -3 -3 0 5 2 -4 -2 -2 0 0 0 T 1 -2 0 -1 -1 -1 -1 -3 -1 -1 -2 -1 -1 -3 -1 2 6 -7 -3 0 0 -1 0 W -5 -4 -6 -7 -7 -7 -8 -5 -2 -2 -2 -5 -4 3 -5 -4 -7 16 4 -5 -6 -7 0 Y -4 -3 -2 -6 -1 -4 -4 -6 3 -2 -1 -4 -3 6 -6 -2 -3 4 10 -3 -4 -4 0 V 0 -4 -4 -5 1 -3 -3 -5 -4 4 2 -3 1 -1 -3 -2 0 -5 -3 5 -4 -3 0 B -1 -2 5 5 -4 0 1 0 0 -5 -6 0 -4 -6 -2 0 0 -6 -4 -4 7 1 0 Z -1 0 0 1 -5 4 4 -2 0 -4 -3 1 -2 -5 -1 0 -1 -7 -4 -3 1 6 0 X 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 ./fasta-36.3.8h/data/VTML_20.mat 0000644 0001054 0001054 00000004762 13525131122 013364 0 ustar wrp # # VTML_20 # # This matrix was produced from: vtml_20qij.mat using vtml_P.mat background frequencies # # VTML_20 substitution matrix, Units = bits/2.0 # Expected score = -2.916179 bits; Entropy = 2.912514 bits # Target fraction identity = 0.8307 # Lowest Score = -16, Highest Score= 12 # A R N D C Q E G H I L K M F P S T W Y V B Z X A 7 -7 -6 -6 -3 -5 -5 -4 -7 -7 -7 -6 -5 -8 -4 -2 -3 -9 -8 -3 -6 -5 0 R -7 8 -5 -12 -7 -2 -10 -7 -3 -8 -8 0 -6 -10 -7 -6 -6 -8 -7 -9 -8 -6 0 N -6 -5 8 -1 -8 -4 -5 -5 -3 -9 -9 -3 -7 -10 -8 -2 
-4 -10 -6 -9 3 -4 0 D -6 -12 -1 8 -14 -4 -1 -6 -4 -12 -15 -5 -9 -16 -6 -5 -6 -10 -14 -9 3 -2 0 C -3 -7 -8 -14 12 -13 -14 -7 -6 -5 -12 -13 -4 -13 -9 -3 -5 -15 -4 -3 -11 -13 0 Q -5 -2 -4 -4 -13 9 -1 -8 -2 -9 -6 -2 -4 -8 -5 -4 -5 -15 -12 -7 -4 4 0 E -5 -10 -5 -1 -14 -1 7 -6 -6 -10 -8 -2 -8 -14 -6 -5 -6 -16 -7 -7 -3 3 0 G -4 -7 -5 -6 -7 -8 -6 7 -7 -15 -11 -7 -10 -11 -8 -4 -8 -9 -10 -10 -5 -7 0 H -7 -3 -3 -4 -6 -2 -6 -7 10 -9 -7 -5 -12 -5 -6 -5 -5 -6 -1 -8 -3 -4 0 I -7 -8 -9 -12 -5 -9 -10 -15 -9 7 -2 -9 -2 -5 -10 -9 -5 -6 -8 1 -10 -9 0 L -7 -8 -9 -15 -12 -6 -8 -11 -7 -2 6 -8 0 -3 -7 -8 -7 -6 -6 -3 -12 -7 0 K -6 0 -3 -5 -13 -2 -2 -7 -5 -9 -8 7 -5 -14 -6 -5 -4 -9 -8 -8 -4 -2 0 M -5 -6 -7 -9 -4 -4 -8 -10 -12 -2 0 -5 10 -3 -10 -8 -4 -13 -11 -3 -8 -6 0 F -8 -10 -10 -16 -13 -8 -14 -11 -5 -5 -3 -14 -3 9 -9 -7 -8 -3 0 -6 -13 -11 0 P -4 -7 -8 -6 -9 -5 -6 -8 -6 -10 -7 -6 -10 -9 9 -4 -6 -9 -15 -7 -7 -5 0 S -2 -6 -2 -5 -3 -4 -5 -4 -5 -9 -8 -5 -8 -7 -4 7 -1 -8 -6 -8 -3 -4 0 T -3 -6 -4 -6 -5 -5 -6 -8 -5 -5 -7 -4 -4 -8 -6 -1 8 -15 -8 -4 -5 -5 0 W -9 -8 -10 -10 -15 -15 -16 -9 -6 -6 -6 -9 -13 -3 -9 -8 -15 12 -2 -13 -10 -15 0 Y -8 -7 -6 -14 -4 -12 -7 -10 -1 -8 -6 -8 -11 0 -15 -6 -8 -2 9 -8 -10 -9 0 V -3 -9 -9 -9 -3 -7 -7 -10 -8 1 -3 -8 -3 -6 -7 -8 -4 -13 -8 7 -9 -7 0 B -6 -8 3 3 -11 -4 -3 -5 -3 -10 -12 -4 -8 -13 -7 -3 -5 -10 -10 -9 8 -2 0 Z -5 -6 -4 -2 -13 4 3 -7 -4 -9 -7 -2 -6 -11 -5 -4 -5 -15 -9 -7 -2 8 0 X 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 ./fasta-36.3.8h/data/VTML_200.mat 0000644 0001054 0001054 00000004764 13525131122 013446 0 ustar wrp # # VTML_200 # # This matrix was produced from: vtml_200qij.mat using vtml_P.mat background frequencies # # VTML_200 substitution matrix, Units = bits/3.0 # Expected score = -0.358430 bits; Entropy = 0.412084 bits # Target fraction identity = 0.2295 # Lowest Score = -6, Highest Score= 15 # A R N D C Q E G H I L K M F P S T W Y V B Z X A 4 -2 -1 -1 1 -1 -1 0 -2 -1 -2 -1 -1 -3 0 1 1 -4 -3 0 -1 -1 0 R -2 7 0 -2 -3 2 -1 -2 1 -3 -3 4 -2 -4 -1 
-1 -1 -3 -2 -3 -1 0 0 N -1 0 6 3 -2 1 1 0 1 -4 -4 1 -3 -4 -2 1 0 -5 -2 -3 4 1 0 D -1 -2 3 6 -4 1 3 -1 0 -5 -5 0 -4 -6 -1 0 -1 -6 -4 -4 4 2 0 C 1 -3 -2 -4 12 -3 -4 -2 -2 0 -3 -4 -1 -3 -3 1 0 -6 0 1 -3 -3 0 Q -1 2 1 1 -3 5 2 -2 2 -3 -2 2 -1 -3 -1 0 0 -6 -3 -2 1 3 0 E -1 -1 1 3 -4 2 5 -1 0 -4 -4 1 -3 -5 -1 0 -1 -6 -3 -3 2 3 0 G 0 -2 0 -1 -2 -2 -1 8 -2 -6 -5 -2 -4 -5 -2 0 -2 -5 -5 -4 0 -1 0 H -2 1 1 0 -2 2 0 -2 8 -3 -2 0 -3 0 -2 0 -1 -1 3 -3 0 1 0 I -1 -3 -4 -5 0 -3 -4 -6 -3 5 3 -3 2 0 -4 -3 -1 -2 -2 4 -4 -3 0 L -2 -3 -4 -5 -3 -2 -4 -5 -2 3 5 -3 3 2 -3 -3 -2 -1 -1 2 -4 -3 0 K -1 4 1 0 -4 2 1 -2 0 -3 -3 5 -2 -5 -1 0 0 -4 -3 -3 0 1 0 M -1 -2 -3 -4 -1 -1 -3 -4 -3 2 3 -2 6 1 -3 -2 -1 -3 -2 2 -3 -2 0 F -3 -4 -4 -6 -3 -3 -5 -5 0 0 2 -5 1 8 -4 -3 -3 3 5 -1 -5 -4 0 P 0 -1 -2 -1 -3 -1 -1 -2 -2 -4 -3 -1 -3 -4 9 0 -1 -4 -5 -3 -1 -1 0 S 1 -1 1 0 1 0 0 0 0 -3 -3 0 -2 -3 0 4 2 -4 -2 -2 0 0 0 T 1 -1 0 -1 0 0 -1 -2 -1 -1 -2 0 -1 -3 -1 2 4 -5 -3 0 0 0 0 W -4 -3 -5 -6 -6 -6 -6 -5 -1 -2 -1 -4 -3 3 -4 -4 -5 15 4 -4 -5 -6 0 Y -3 -2 -2 -4 0 -3 -3 -5 3 -2 -1 -3 -2 5 -5 -2 -3 4 9 -2 -3 -3 0 V 0 -3 -3 -4 1 -2 -3 -4 -3 4 2 -3 2 -1 -3 -2 0 -4 -2 4 -3 -2 0 B -1 -1 4 4 -3 1 2 0 0 -4 -4 0 -3 -5 -1 0 0 -5 -3 -3 6 2 0 Z -1 0 1 2 -3 3 3 -1 1 -3 -3 1 -2 -4 -1 0 0 -6 -3 -2 2 5 0 X 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 ./fasta-36.3.8h/data/VTML_40.mat 0000644 0001054 0001054 00000004762 13525131122 013366 0 ustar wrp # # VTML_40 # # This matrix was produced from: vtml_40qij.mat using vtml_P.mat background frequencies # # VTML_40 substitution matrix, Units = bits/2.0 # Expected score = -1.991667 bits; Entropy = 2.267456 bits # Target fraction identity = 0.6960 # Lowest Score = -12, Highest Score= 12 # A R N D C Q E G H I L K M F P S T W Y V B Z X A 6 -5 -4 -4 -1 -3 -3 -2 -5 -5 -5 -4 -4 -6 -3 0 -1 -7 -6 -1 -4 -3 0 R -5 8 -3 -8 -5 -1 -7 -5 -2 -7 -6 2 -4 -8 -5 -4 -4 -6 -5 -7 -5 -4 0 N -4 -3 8 0 -6 -2 -3 -3 -1 -7 -7 -2 -5 -8 -6 -1 -2 -8 -4 -7 4 -2 0 D -4 -8 0 8 -10 -3 1 -4 -3 -10 -11 -3 -7 -12 -4 -3 
-4 -9 -10 -7 4 -1 0 C -1 -5 -6 -10 11 -9 -10 -5 -5 -3 -9 -9 -3 -9 -7 -2 -3 -12 -3 -2 -8 -9 0 Q -3 -1 -2 -3 -9 8 1 -6 0 -7 -4 0 -3 -6 -3 -2 -3 -11 -8 -5 -2 4 0 E -3 -7 -3 1 -10 1 7 -5 -4 -8 -7 -1 -6 -11 -4 -3 -4 -12 -5 -6 -1 4 0 G -2 -5 -3 -4 -5 -6 -5 7 -5 -12 -9 -5 -8 -9 -6 -3 -6 -7 -8 -8 -3 -5 0 H -5 -2 -1 -3 -5 0 -4 -5 10 -7 -5 -3 -8 -3 -4 -3 -4 -4 0 -6 -2 -2 0 I -5 -7 -7 -10 -3 -7 -8 -12 -7 7 0 -7 0 -3 -8 -7 -3 -4 -5 2 -8 -7 0 L -5 -6 -7 -11 -9 -4 -7 -9 -5 0 6 -6 1 -1 -5 -6 -5 -4 -4 -1 -9 -5 0 K -4 2 -2 -3 -9 0 -1 -5 -3 -7 -6 7 -4 -10 -4 -3 -3 -7 -6 -6 -2 0 0 M -4 -4 -5 -7 -3 -3 -6 -8 -8 0 1 -4 9 -1 -8 -6 -3 -9 -8 -2 -6 -4 0 F -6 -8 -8 -12 -9 -6 -11 -9 -3 -3 -1 -10 -1 8 -7 -5 -6 -1 2 -4 -10 -8 0 P -3 -5 -6 -4 -7 -3 -4 -6 -4 -8 -5 -4 -8 -7 8 -2 -4 -7 -11 -6 -5 -3 0 S 0 -4 -1 -3 -2 -2 -3 -3 -3 -7 -6 -3 -6 -5 -2 7 1 -6 -4 -6 -2 -2 0 T -1 -4 -2 -4 -3 -3 -4 -6 -4 -3 -5 -3 -3 -6 -4 1 7 -11 -6 -2 -3 -3 0 W -7 -6 -8 -9 -12 -11 -12 -7 -4 -4 -4 -7 -9 -1 -7 -6 -11 12 0 -10 -8 -11 0 Y -6 -5 -4 -10 -3 -8 -5 -8 0 -5 -4 -6 -8 2 -11 -4 -6 0 9 -6 -7 -6 0 V -1 -7 -7 -7 -2 -5 -6 -8 -6 2 -1 -6 -2 -4 -6 -6 -2 -10 -6 6 -7 -5 0 B -4 -5 4 4 -8 -2 -1 -3 -2 -8 -9 -2 -6 -10 -5 -2 -3 -8 -7 -7 8 0 0 Z -3 -4 -2 -1 -9 4 4 -5 -2 -7 -5 0 -4 -8 -3 -2 -3 -11 -6 -5 0 7 0 X 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 ./fasta-36.3.8h/data/VTML_80.mat 0000644 0001054 0001054 00000004761 13525131122 013371 0 ustar wrp # # VTML_80 # # This matrix was produced from: vtml_80qij.mat using vtml_P.mat background frequencies # # VTML_80 substitution matrix, Units = bits/2.0 # Expected score = -1.134601 bits; Entropy = 1.427882 bits # Target fraction identity = 0.5015 # Lowest Score = -9, Highest Score= 11 # A R N D C Q E G H I L K M F P S T W Y V B Z X A 5 -3 -2 -2 0 -2 -2 -1 -3 -3 -3 -2 -2 -4 -1 1 0 -5 -4 0 -2 -2 0 R -3 7 -2 -5 -4 1 -3 -3 0 -5 -4 3 -3 -6 -3 -2 -3 -4 -3 -5 -3 -1 0 N -2 -2 7 1 -4 -1 -1 -1 0 -5 -5 0 -4 -5 -4 1 -1 -6 -2 -5 4 -1 0 D -2 -5 1 7 -7 -1 2 -2 -1 -7 -8 -2 -5 -9 -2 -1 -2 -7 -7 -5 4 0 
0 C 0 -4 -4 -7 10 -6 -7 -3 -3 -2 -5 -6 -1 -6 -4 0 -2 -8 -1 0 -5 -6 0 Q -2 1 -1 -1 -6 7 2 -4 1 -5 -3 1 -2 -4 -2 -1 -2 -8 -5 -3 -1 4 0 E -2 -3 -1 2 -7 2 6 -3 -2 -5 -5 0 -4 -7 -2 -1 -2 -8 -4 -4 0 4 0 G -1 -3 -1 -2 -3 -4 -3 7 -3 -8 -7 -3 -6 -6 -4 -1 -4 -5 -6 -5 -1 -3 0 H -3 0 0 -1 -3 1 -2 -3 9 -5 -3 -1 -5 -1 -3 -1 -2 -2 1 -4 0 0 0 I -3 -5 -5 -7 -2 -5 -5 -8 -5 6 1 -5 1 -1 -6 -5 -2 -3 -3 3 -6 -5 0 L -3 -4 -5 -8 -5 -3 -5 -7 -3 1 5 -4 2 0 -4 -4 -3 -2 -2 0 -6 -4 0 K -2 3 0 -2 -6 1 0 -3 -1 -5 -4 6 -2 -7 -2 -2 -1 -5 -4 -4 -1 0 0 M -2 -3 -4 -5 -1 -2 -4 -6 -5 1 2 -2 8 0 -5 -4 -1 -6 -4 0 -4 -3 0 F -4 -6 -5 -9 -6 -4 -7 -6 -1 -1 0 -7 0 8 -5 -3 -4 1 3 -2 -7 -5 0 P -1 -3 -4 -2 -4 -2 -2 -4 -3 -6 -4 -2 -5 -5 8 -1 -2 -5 -7 -4 -3 -2 0 S 1 -2 1 -1 0 -1 -1 -1 -1 -5 -4 -2 -4 -3 -1 5 1 -4 -3 -3 0 -1 0 T 0 -3 -1 -2 -2 -2 -2 -4 -2 -2 -3 -1 -1 -4 -2 1 6 -7 -4 -1 -1 -2 0 W -5 -4 -6 -7 -8 -8 -8 -5 -2 -3 -2 -5 -6 1 -5 -4 -7 11 1 -6 -6 -8 0 Y -4 -3 -2 -7 -1 -5 -4 -6 1 -3 -2 -4 -4 3 -7 -3 -4 1 8 -4 -4 -4 0 V 0 -5 -5 -5 0 -3 -4 -5 -4 3 0 -4 0 -2 -4 -3 -1 -6 -4 5 -5 -3 0 B -2 -3 4 4 -5 -1 0 -1 0 -6 -6 -1 -4 -7 -3 0 -1 -6 -4 -5 7 0 0 Z -2 -1 -1 0 -6 4 4 -3 0 -5 -4 0 -3 -5 -2 -1 -2 -8 -4 -3 0 6 0 X 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 ./fasta-36.3.8h/data/blosum45.mat 0000644 0001054 0001054 00000003602 13525131122 013743 0 ustar wrp # Matrix made by matblas from blosum45.iij # BLOSUM Clustered Scoring Matrix in 1/3 Bit Units # Blocks Database = /data/blocks_5.0/blocks.dat # Cluster Percentage: >= 45 # Entropy = 0.3795, Expected = -0.2789 A R N D C Q E G H I L K M F P S T W Y V B Z X A 5 -2 -1 -2 -1 -1 -1 0 -2 -1 -1 -1 -1 -2 -1 1 0 -2 -2 0 -1 -1 0 R -2 7 0 -1 -3 1 0 -2 0 -3 -2 3 -1 -2 -2 -1 -1 -2 -1 -2 -1 0 -1 N -1 0 6 2 -2 0 0 0 1 -2 -3 0 -2 -2 -2 1 0 -4 -2 -3 4 0 -1 D -2 -1 2 7 -3 0 2 -1 0 -4 -3 0 -3 -4 -1 0 -1 -4 -2 -3 5 1 -1 C -1 -3 -2 -3 12 -3 -3 -3 -3 -3 -2 -3 -2 -2 -4 -1 -1 -5 -3 -1 -2 -3 -2 Q -1 1 0 0 -3 6 2 -2 1 -2 -2 1 0 -4 -1 0 -1 -2 -1 -3 0 4 -1 E -1 0 0 2 -3 2 6 -2 0 -3 -2 1 -2 -3 
0 0 -1 -3 -2 -3 1 4 -1 G 0 -2 0 -1 -3 -2 -2 7 -2 -4 -3 -2 -2 -3 -2 0 -2 -2 -3 -3 -1 -2 -1 H -2 0 1 0 -3 1 0 -2 10 -3 -2 -1 0 -2 -2 -1 -2 -3 2 -3 0 0 -1 I -1 -3 -2 -4 -3 -2 -3 -4 -3 5 2 -3 2 0 -2 -2 -1 -2 0 3 -3 -3 -1 L -1 -2 -3 -3 -2 -2 -2 -3 -2 2 5 -3 2 1 -3 -3 -1 -2 0 1 -3 -2 -1 K -1 3 0 0 -3 1 1 -2 -1 -3 -3 5 -1 -3 -1 -1 -1 -2 -1 -2 0 1 -1 M -1 -1 -2 -3 -2 0 -2 -2 0 2 2 -1 6 0 -2 -2 -1 -2 0 1 -2 -1 -1 F -2 -2 -2 -4 -2 -4 -3 -3 -2 0 1 -3 0 8 -3 -2 -1 1 3 0 -3 -3 -1 P -1 -2 -2 -1 -4 -1 0 -2 -2 -2 -3 -1 -2 -3 9 -1 -1 -3 -3 -3 -2 -1 -1 S 1 -1 1 0 -1 0 0 0 -1 -2 -3 -1 -2 -2 -1 4 2 -4 -2 -1 0 0 0 T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -1 -1 2 5 -3 -1 0 0 -1 0 W -2 -2 -4 -4 -5 -2 -3 -2 -3 -2 -2 -2 -2 1 -3 -4 -3 15 3 -3 -4 -2 -2 Y -2 -1 -2 -2 -3 -1 -2 -3 2 0 0 -1 0 3 -3 -2 -1 3 8 -1 -2 -2 -1 V 0 -2 -3 -3 -1 -3 -3 -3 -3 3 1 -2 1 0 -3 -1 0 -3 -1 5 -3 -3 -1 B -1 -1 4 5 -2 0 1 -1 0 -3 -3 0 -2 -3 -2 0 0 -4 -2 -3 4 2 -1 Z -1 0 0 1 -3 4 4 -2 0 -3 -2 1 -1 -3 -1 0 -1 -2 -2 -3 2 4 -1 X 0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0 0 -2 -1 -1 -1 -1 -1 ./fasta-36.3.8h/data/blosum50.mat 0000644 0001054 0001054 00000003601 13525131122 013736 0 ustar wrp # Matrix made by matblas from blosum50.iij # BLOSUM Clustered Scoring Matrix in 1/3 Bit Units # Blocks Database = /data/blocks_5.0/blocks.dat # Cluster Percentage: >= 50 # Entropy = 0.4808, Expected = -0.3573 A R N D C Q E G H I L K M F P S T W Y V B Z X A 5 -2 -1 -2 -1 -1 -1 0 -2 -1 -2 -1 -1 -3 -1 1 0 -3 -2 0 -2 -1 -1 R -2 7 -1 -2 -4 1 0 -3 0 -4 -3 3 -2 -3 -3 -1 -1 -3 -1 -3 -1 0 -1 N -1 -1 7 2 -2 0 0 0 1 -3 -4 0 -2 -4 -2 1 0 -4 -2 -3 4 0 -1 D -2 -2 2 8 -4 0 2 -1 -1 -4 -4 -1 -4 -5 -1 0 -1 -5 -3 -4 5 1 -1 C -1 -4 -2 -4 13 -3 -3 -3 -3 -2 -2 -3 -2 -2 -4 -1 -1 -5 -3 -1 -3 -3 -2 Q -1 1 0 0 -3 7 2 -2 1 -3 -2 2 0 -4 -1 0 -1 -1 -1 -3 0 4 -1 E -1 0 0 2 -3 2 6 -3 0 -4 -3 1 -2 -3 -1 -1 -1 -3 -2 -3 1 5 -1 G 0 -3 0 -1 -3 -2 -3 8 -2 -4 -4 -2 -3 -4 -2 0 -2 -3 -3 -4 -1 -2 -2 H -2 0 1 -1 -3 1 0 -2 10 -4 -3 0 -1 -1 -2 -1 -2 -3 2 -4 0 0 -1 I -1 -4 -3 -4 
-2 -3 -4 -4 -4 5 2 -3 2 0 -3 -3 -1 -3 -1 4 -4 -3 -1 L -2 -3 -4 -4 -2 -2 -3 -4 -3 2 5 -3 3 1 -4 -3 -1 -2 -1 1 -4 -3 -1 K -1 3 0 -1 -3 2 1 -2 0 -3 -3 6 -2 -4 -1 0 -1 -3 -2 -3 0 1 -1 M -1 -2 -2 -4 -2 0 -2 -3 -1 2 3 -2 7 0 -3 -2 -1 -1 0 1 -3 -1 -1 F -3 -3 -4 -5 -2 -4 -3 -4 -1 0 1 -4 0 8 -4 -3 -2 1 4 -1 -4 -4 -2 P -1 -3 -2 -1 -4 -1 -1 -2 -2 -3 -4 -1 -3 -4 10 -1 -1 -4 -3 -3 -2 -1 -2 S 1 -1 1 0 -1 0 -1 0 -1 -3 -3 0 -2 -3 -1 5 2 -4 -2 -2 0 0 -1 T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 2 5 -3 -2 0 0 -1 0 W -3 -3 -4 -5 -5 -1 -3 -3 -3 -3 -2 -3 -1 1 -4 -4 -3 15 2 -3 -5 -2 -3 Y -2 -1 -2 -3 -3 -1 -2 -3 2 -1 -1 -2 0 4 -3 -2 -2 2 8 -1 -3 -2 -1 V 0 -3 -3 -4 -1 -3 -3 -4 -4 4 1 -3 1 -1 -3 -2 0 -3 -1 5 -4 -3 -1 B -2 -1 4 5 -3 0 1 -1 0 -4 -4 0 -3 -4 -2 0 0 -5 -3 -4 5 2 -1 Z -1 0 0 1 -3 4 5 -2 0 -3 -3 1 -1 -4 -1 0 -1 -2 -2 -3 2 5 -1 X -1 -1 -1 -1 -2 -1 -1 -2 -1 -1 -1 -1 -1 -2 -2 -1 0 -3 -1 -1 -1 -1 -1 ./fasta-36.3.8h/data/blosum62.mat 0000644 0001054 0001054 00000003602 13525131122 013742 0 ustar wrp # Matrix made by matblas from blosum62.iij # BLOSUM Clustered Scoring Matrix in 1/2 Bit Units # Blocks Database = /data/blocks_5.0/blocks.dat # Cluster Percentage: >= 62 # Entropy = 0.6979, Expected = -0.5209 A R N D C Q E G H I L K M F P S T W Y V B Z X A 4 -1 -2 -2 0 -1 -1 0 -2 -1 -1 -1 -1 -2 -1 1 0 -3 -2 0 -2 -1 0 R -1 5 0 -2 -3 1 0 -2 0 -3 -2 2 -1 -3 -2 -1 -1 -3 -2 -3 -1 0 -1 N -2 0 6 1 -3 0 0 0 1 -3 -3 0 -2 -3 -2 1 0 -4 -2 -3 3 0 -1 D -2 -2 1 6 -3 0 2 -1 -1 -3 -4 -1 -3 -3 -1 0 -1 -4 -3 -3 4 1 -1 C 0 -3 -3 -3 9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2 Q -1 1 0 0 -3 5 2 -2 0 -3 -2 1 0 -3 -1 0 -1 -2 -1 -2 0 3 -1 E -1 0 0 2 -4 2 5 -2 0 -3 -3 1 -2 -3 -1 0 -1 -3 -2 -2 1 4 -1 G 0 -2 0 -1 -3 -2 -2 6 -2 -4 -4 -2 -3 -3 -2 0 -2 -2 -3 -3 -1 -2 -1 H -2 0 1 -1 -3 0 0 -2 8 -3 -3 -1 -2 -1 -2 -1 -2 -2 2 -3 0 0 -1 I -1 -3 -3 -3 -1 -3 -3 -4 -3 4 2 -3 1 0 -3 -2 -1 -3 -1 3 -3 -3 -1 L -1 -2 -3 -4 -1 -2 -3 -4 -3 2 4 -2 2 0 -3 -2 -1 -2 -1 1 -4 -3 -1 K -1 2 0 -1 -3 1 1 -2 -1 -3 -2 5 -1 -3 -1 0 
-1 -3 -2 -2 0 1 -1 M -1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5 0 -2 -1 -1 -1 -1 1 -3 -1 -1 F -2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6 -4 -2 -2 1 3 -1 -3 -3 -1 P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7 -1 -1 -4 -3 -2 -2 -1 -2 S 1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4 1 -3 -2 -2 0 0 0 T 0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5 -2 -2 0 -1 -1 0 W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11 2 -3 -4 -3 -2 Y -2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7 -1 -3 -2 -1 V 0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4 -3 -2 -1 B -2 -1 3 4 -3 0 1 -1 0 -3 -4 0 -3 -3 -2 0 -1 -4 -3 -3 4 1 -1 Z -1 0 0 1 -3 3 4 -2 0 -3 -3 1 -1 -3 -1 0 -1 -3 -2 -2 1 4 -1 X 0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2 0 0 -2 -1 -1 -1 -1 -1 ./fasta-36.3.8h/data/blosum80.mat 0000644 0001054 0001054 00000003604 13525131122 013744 0 ustar wrp # Matrix made by matblas from blosum80_3.iij # BLOSUM Clustered Scoring Matrix in 1/3 Bit Units # Blocks Database = /data/blocks_5.0/blocks.dat # Cluster Percentage: >= 80 # Entropy = 0.9868, Expected = -0.7442 A R N D C Q E G H I L K M F P S T W Y V B Z X A 7 -3 -3 -3 -1 -2 -2 0 -3 -3 -3 -1 -2 -4 -1 2 0 -5 -4 -1 -3 -2 -1 R -3 9 -1 -3 -6 1 -1 -4 0 -5 -4 3 -3 -5 -3 -2 -2 -5 -4 -4 -2 0 -2 N -3 -1 9 2 -5 0 -1 -1 1 -6 -6 0 -4 -6 -4 1 0 -7 -4 -5 5 -1 -2 D -3 -3 2 10 -7 -1 2 -3 -2 -7 -7 -2 -6 -6 -3 -1 -2 -8 -6 -6 6 1 -3 C -1 -6 -5 -7 13 -5 -7 -6 -7 -2 -3 -6 -3 -4 -6 -2 -2 -5 -5 -2 -6 -7 -4 Q -2 1 0 -1 -5 9 3 -4 1 -5 -4 2 -1 -5 -3 -1 -1 -4 -3 -4 -1 5 -2 E -2 -1 -1 2 -7 3 8 -4 0 -6 -6 1 -4 -6 -2 -1 -2 -6 -5 -4 1 6 -2 G 0 -4 -1 -3 -6 -4 -4 9 -4 -7 -7 -3 -5 -6 -5 -1 -3 -6 -6 -6 -2 -4 -3 H -3 0 1 -2 -7 1 0 -4 12 -6 -5 -1 -4 -2 -4 -2 -3 -4 3 -5 -1 0 -2 I -3 -5 -6 -7 -2 -5 -6 -7 -6 7 2 -5 2 -1 -5 -4 -2 -5 -3 4 -6 -6 -2 L -3 -4 -6 -7 -3 -4 -6 -7 -5 2 6 -4 3 0 -5 -4 -3 -4 -2 1 -7 -5 -2 K -1 3 0 -2 -6 2 1 -3 -1 -5 -4 8 -3 -5 -2 -1 -1 -6 -4 -4 -1 1 -2 M -2 -3 -4 -6 -3 -1 -4 -5 -4 2 3 -3 9 0 -4 -3 -1 -3 -3 1 -5 -3 -2 F -4 -5 -6 -6 -4 -5 -6 -6 -2 -1 0 -5 0 10 -6 
-4 -4 0 4 -2 -6 -6 -3 P -1 -3 -4 -3 -6 -3 -2 -5 -4 -5 -5 -2 -4 -6 12 -2 -3 -7 -6 -4 -4 -2 -3 S 2 -2 1 -1 -2 -1 -1 -1 -2 -4 -4 -1 -3 -4 -2 7 2 -6 -3 -3 0 -1 -1 T 0 -2 0 -2 -2 -1 -2 -3 -3 -2 -3 -1 -1 -4 -3 2 8 -5 -3 0 -1 -2 -1 W -5 -5 -7 -8 -5 -4 -6 -6 -4 -5 -4 -6 -3 0 -7 -6 -5 16 3 -5 -8 -5 -5 Y -4 -4 -4 -6 -5 -3 -5 -6 3 -3 -2 -4 -3 4 -6 -3 -3 3 11 -3 -5 -4 -3 V -1 -4 -5 -6 -2 -4 -4 -6 -5 4 1 -4 1 -2 -4 -3 0 -5 -3 7 -6 -4 -2 B -3 -2 5 6 -6 -1 1 -2 -1 -6 -7 -1 -5 -6 -4 0 -1 -8 -5 -6 6 0 -3 Z -2 0 -1 1 -7 5 6 -4 0 -6 -5 1 -3 -6 -2 -1 -2 -5 -4 -4 0 6 -1 X -1 -2 -2 -3 -4 -2 -2 -3 -2 -2 -2 -2 -2 -3 -3 -1 -1 -5 -3 -2 -3 -1 -2 ./fasta-36.3.8h/data/dna.mat 0000644 0001054 0001054 00000001720 13525131122 013032 0 ustar wrp # Sample dna matrix A C G T U R Y M W S K D H V B N X A 5 -4 -4 -4 -4 2 -1 2 2 -1 -1 1 1 1 -2 -1 -1 C -4 5 -4 -4 -4 -1 2 2 -1 2 -1 -2 1 1 1 -1 -1 G -4 -4 5 -4 -4 2 -1 -1 -1 2 2 1 -2 1 1 -1 -1 T -4 -4 -4 5 5 -1 2 -1 2 -1 2 1 1 -2 1 -1 -1 U -4 -4 -4 5 5 -1 2 -1 2 -1 2 1 1 -2 1 -1 -1 R 2 -1 2 -1 -1 2 -2 -1 1 1 1 1 -1 1 -1 -1 -1 Y -1 2 -1 2 2 -2 2 -1 1 1 1 -1 1 -1 1 -1 -1 M 2 2 -1 -1 -1 -1 -1 2 1 1 -1 -1 1 1 -1 -1 -1 W 2 -1 -1 2 2 1 1 1 2 -1 1 1 1 -1 -1 -1 -1 S -1 2 2 -1 -1 1 1 1 -1 2 1 -1 -1 1 1 -1 -1 K -1 -1 2 2 2 1 1 -1 1 1 2 1 -1 -1 1 -1 -1 D 1 -2 1 1 1 1 -1 -1 1 -1 1 1 -1 -1 -1 -1 -1 H 1 1 -2 1 1 -1 1 1 1 -1 -1 -1 1 -1 -1 -1 -1 V 1 1 1 -2 -2 1 -1 1 -1 1 -1 -1 -1 1 -1 -1 -1 B -2 1 1 1 1 -1 1 -1 -1 1 1 -1 -1 -1 1 -1 -1 N -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ./fasta-36.3.8h/data/idn_aa.mat 0000644 0001054 0001054 00000004320 13525131122 013502 0 ustar wrp A R N D C Q E G H I L K M F P S T W Y V B Z X A 4 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 R -10 4 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 N -10 -10 4 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 D -10 -10 -10 4 -10 
-10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 C -10 -10 -10 -10 4 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 Q -10 -10 -10 -10 -10 4 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 E -10 -10 -10 -10 -10 -10 4 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 G -10 -10 -10 -10 -10 -10 -10 4 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 H -10 -10 -10 -10 -10 -10 -10 -10 4 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 I -10 -10 -10 -10 -10 -10 -10 -10 -10 4 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 L -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 4 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 K -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 4 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 M -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 4 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 F -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 4 -10 -10 -10 -10 -10 -10 -10 -10 -10 P -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 4 -10 -10 -10 -10 -10 -10 -10 -10 S -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 4 -10 -10 -10 -10 -10 -10 -10 T -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 4 -10 -10 -10 -10 -10 -10 W -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 4 -10 -10 -10 -10 -10 Y -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 4 -10 -10 -10 -10 V -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 4 -10 -10 -10 B -10 -10 2 2 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 4 -10 -10 Z -10 -10 -10 -10 -10 2 2 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 4 -10 X -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 0 ./fasta-36.3.8h/data/md_10.mat 0000644 0001054 0001054 00000004317 13525131122 013175 0 ustar wrp A R N D C Q E G H I L K M F P S T W Y V B Z X A 11 -13 -12 -11 -13 -13 -10 -8 -15 -13 -15 
-14 -13 -18 -7 -5 -4 -20 -19 -6 -12 -11 -1 R -12 12 -13 -18 -10 -5 -15 -9 -5 -17 -14 -2 -14 -22 -11 -10 -12 -9 -17 -17 -15 -10 -1 N -12 -13 13 -3 -14 -11 -12 -11 -5 -13 -19 -6 -15 -20 -17 -4 -7 -21 -12 -17 5 -11 -1 D -11 -18 -3 12 -20 -13 -2 -9 -10 -19 -21 -15 -18 -23 -18 -12 -14 -24 -13 -15 5 -7 -1 C -13 -10 -14 -20 17 -19 -22 -12 -12 -18 -16 -21 -15 -11 -18 -7 -14 -9 -7 -12 -17 -21 -1 Q -13 -5 -11 -13 -19 13 -5 -15 -3 -19 -12 -6 -14 -22 -8 -13 -13 -17 -16 -17 -12 4 -1 E -10 -15 -12 -2 -22 -5 12 -9 -15 -19 -20 -8 -17 -23 -17 -15 -15 -20 -21 -14 -7 3 -1 G -8 -9 -11 -9 -12 -16 -9 11 -16 -21 -21 -15 -18 -22 -16 -7 -14 -13 -21 -13 -10 -13 -1 H -16 -5 -5 -10 -12 -3 -15 -16 16 -17 -13 -13 -15 -14 -10 -11 -13 -20 -3 -19 -7 -9 -1 I -13 -17 -14 -19 -17 -20 -19 -21 -18 12 -7 -17 -4 -11 -19 -14 -7 -20 -15 -1 -16 -19 -1 L -15 -14 -19 -21 -16 -12 -20 -21 -13 -7 10 -18 -4 -6 -10 -13 -15 -13 -16 -8 -20 -16 -1 K -14 -2 -6 -15 -21 -6 -8 -15 -13 -17 -18 12 -12 -24 -17 -13 -10 -19 -20 -18 -11 -7 -1 M -13 -14 -15 -18 -15 -14 -18 -19 -15 -4 -4 -12 16 -14 -17 -15 -7 -16 -18 -5 -16 -16 -1 F -18 -22 -19 -22 -11 -22 -23 -22 -14 -11 -6 -23 -14 14 -17 -11 -18 -13 -3 -12 -21 -22 -1 P -7 -12 -17 -18 -18 -8 -17 -16 -10 -19 -10 -16 -17 -17 13 -6 -9 -22 -20 -16 -17 -13 -1 S -5 -10 -4 -12 -7 -13 -15 -7 -11 -14 -13 -13 -15 -11 -6 11 -4 -15 -12 -14 -8 -14 -1 T -4 -12 -7 -14 -14 -13 -15 -14 -13 -7 -16 -10 -7 -19 -9 -4 12 -19 -17 -10 -10 -14 -1 W -21 -9 -21 -21 -10 -17 -21 -13 -21 -21 -13 -21 -17 -13 -21 -15 -18 18 -12 -16 -21 -19 -1 Y -20 -17 -12 -13 -7 -16 -21 -20 -3 -15 -16 -20 -17 -3 -20 -12 -17 -12 15 -18 -13 -19 -1 V -6 -17 -17 -15 -12 -17 -14 -13 -19 -1 -8 -18 -5 -12 -16 -14 -10 -16 -18 11 -16 -15 -1 B -12 -15 5 5 -17 -12 -7 -10 -7 -16 -20 -11 -17 -21 -17 -8 -10 -22 -13 -16 13 -9 -1 Z -16 -18 -17 -8 -32 1 9 -17 -17 -29 -26 -11 -24 -34 -21 -21 -21 -29 -29 -22 -9 13 -1 X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ./fasta-36.3.8h/data/md_20.mat 0000644 0001054 0001054 
00000004320 13525131122 013170 0 ustar wrp A R N D C Q E G H I L K M F P S T W Y V B Z X A 10 -10 -9 -8 -10 -10 -7 -5 -12 -10 -12 -11 -9 -15 -5 -2 -1 -17 -16 -3 -9 -8 -1 R -10 12 -10 -14 -7 -3 -11 -6 -3 -14 -12 0 -11 -18 -9 -7 -9 -6 -14 -14 -12 -7 -1 N -9 -10 13 -1 -11 -8 -9 -8 -2 -11 -15 -4 -12 -16 -13 -1 -4 -18 -9 -14 6 -8 -1 D -8 -14 -1 12 -16 -9 1 -6 -7 -16 -18 -11 -15 -20 -15 -9 -11 -20 -11 -12 6 -4 -1 C -10 -7 -11 -16 17 -16 -19 -9 -9 -14 -13 -17 -12 -8 -14 -4 -11 -7 -4 -10 -14 -17 -1 Q -10 -3 -8 -9 -16 13 -3 -12 0 -16 -9 -3 -11 -18 -5 -10 -10 -14 -12 -14 -9 5 -1 E -7 -11 -9 1 -19 -3 11 -7 -12 -16 -17 -5 -14 -20 -14 -12 -12 -17 -18 -11 -4 4 -1 G -5 -6 -8 -6 -9 -12 -7 11 -13 -17 -18 -12 -15 -19 -12 -5 -11 -10 -17 -11 -7 -9 -1 H -12 -3 -2 -7 -9 0 -12 -13 15 -14 -10 -9 -12 -11 -7 -8 -10 -16 0 -15 -4 -6 -1 I -10 -14 -11 -16 -14 -16 -16 -17 -14 12 -4 -14 -1 -8 -15 -11 -4 -16 -12 2 -13 -16 -1 L -12 -11 -15 -18 -13 -9 -17 -18 -10 -4 10 -15 -2 -4 -7 -10 -12 -10 -13 -5 -17 -13 -1 K -11 0 -4 -12 -17 -3 -5 -12 -9 -14 -15 12 -9 -21 -13 -10 -7 -16 -17 -15 -8 -4 -1 M -9 -11 -12 -15 -12 -11 -15 -16 -12 -1 -2 -9 15 -10 -14 -12 -4 -13 -14 -3 -13 -13 -1 F -15 -19 -16 -19 -8 -18 -20 -19 -11 -8 -4 -19 -10 13 -14 -8 -15 -10 0 -9 -17 -19 -1 P -5 -9 -13 -15 -14 -5 -14 -12 -7 -15 -7 -13 -14 -14 12 -3 -7 -18 -16 -13 -14 -10 -1 S -2 -8 -1 -9 -4 -10 -12 -5 -8 -11 -10 -10 -12 -8 -3 10 -1 -12 -9 -11 -5 -11 -1 T -1 -9 -4 -11 -10 -10 -12 -11 -10 -4 -12 -7 -4 -15 -7 -1 11 -16 -14 -7 -7 -11 -1 W -17 -6 -18 -18 -7 -14 -18 -10 -17 -17 -10 -17 -14 -10 -18 -12 -15 18 -9 -13 -18 -16 -1 Y -16 -14 -9 -11 -4 -12 -18 -17 0 -12 -12 -17 -14 0 -16 -9 -13 -9 14 -15 -10 -15 -1 V -3 -14 -14 -12 -9 -14 -11 -11 -15 2 -5 -15 -2 -9 -13 -11 -7 -13 -14 11 -13 -12 -1 B -9 -12 6 6 -14 -9 -4 -7 -4 -13 -17 -8 -13 -18 -14 -5 -7 -19 -10 -13 12 -6 -1 Z -12 -13 -13 -4 -27 4 10 -13 -12 -24 -21 -6 -20 -29 -17 -17 -17 -24 -24 -18 -6 12 -1 X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 
./fasta-36.3.8h/data/md_40.mat 0000644 0001054 0001054 00000004317 13525131122 013200 0 ustar wrp A R N D C Q E G H I L K M F P S T W Y V B Z X A 9 -7 -6 -6 -7 -7 -5 -3 -10 -6 -9 -8 -7 -11 -2 0 1 -13 -12 -1 -6 -6 -1 R -7 11 -6 -10 -5 0 -8 -4 0 -10 -9 3 -8 -14 -6 -5 -6 -4 -10 -11 -8 -4 -1 N -6 -6 12 2 -8 -5 -5 -5 0 -8 -12 -1 -9 -13 -9 1 -2 -16 -6 -10 7 -5 -1 D -6 -10 2 11 -13 -6 3 -4 -5 -12 -15 -8 -11 -16 -11 -6 -7 -15 -8 -9 6 -1 -1 C -6 -5 -8 -13 16 -12 -15 -7 -6 -11 -11 -13 -9 -6 -11 -2 -7 -4 -2 -7 -11 -13 -1 Q -7 0 -5 -6 -12 12 0 -9 2 -13 -6 0 -8 -14 -3 -7 -7 -11 -9 -11 -6 6 -1 E -5 -8 -5 3 -15 0 10 -4 -8 -12 -13 -3 -11 -16 -10 -8 -8 -13 -14 -8 -1 5 -1 G -3 -4 -5 -4 -7 -9 -4 10 -10 -13 -14 -9 -12 -15 -9 -2 -8 -7 -15 -8 -5 -7 -1 H -10 0 0 -5 -6 2 -8 -10 14 -11 -7 -6 -9 -7 -4 -6 -7 -12 2 -12 -2 -3 -1 I -6 -10 -8 -12 -11 -13 -12 -13 -11 11 -1 -11 1 -6 -11 -8 -2 -12 -9 4 -10 -12 -1 L -9 -9 -12 -14 -11 -6 -13 -14 -7 -1 9 -12 1 -1 -5 -7 -9 -7 -9 -2 -13 -10 -1 K -8 3 -1 -8 -13 0 -3 -9 -6 -11 -12 11 -7 -18 -10 -7 -5 -12 -13 -12 -5 -2 -1 M -7 -8 -9 -11 -8 -8 -11 -12 -9 1 1 -7 14 -7 -10 -8 -2 -11 -11 0 -10 -10 -1 F -11 -14 -12 -16 -6 -14 -16 -15 -7 -6 -1 -17 -7 13 -11 -5 -11 -7 2 -6 -14 -15 -1 P -2 -6 -9 -12 -11 -3 -10 -9 -4 -11 -5 -10 -10 -11 12 -1 -4 -14 -12 -9 -11 -7 -1 S 0 -5 1 -6 -2 -7 -8 -2 -6 -8 -7 -7 -8 -5 -1 9 1 -10 -7 -7 -3 -8 -1 T 1 -6 -2 -7 -7 -7 -8 -8 -7 -2 -9 -5 -2 -11 -4 1 10 -14 -10 -4 -5 -8 -1 W -14 -4 -17 -15 -4 -12 -13 -7 -11 -12 -7 -13 -11 -7 -14 -10 -14 18 -6 -11 -16 -12 -1 Y -12 -9 -6 -8 -2 -9 -14 -14 2 -9 -9 -13 -11 2 -12 -7 -11 -6 14 -11 -7 -11 -1 V -1 -11 -10 -9 -7 -11 -8 -8 -12 4 -2 -12 0 -6 -10 -7 -4 -10 -11 10 -10 -9 -1 B -6 -8 7 6 -11 -6 -1 -5 -2 -10 -13 -5 -10 -14 -10 -3 -5 -16 -7 -10 11 -3 -1 Z -8 -8 -8 0 -21 6 10 -9 -7 -18 -16 -3 -15 -23 -12 -12 -12 -19 -18 -14 -3 11 -1 X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ./fasta-36.3.8h/data/pam120.mat 0000644 0001054 0001054 00000003602 13525131122 013271 0 ustar wrp # # 
This matrix was produced by "pam" Version 1.0.6 [28-Jul-93] # # PAM 120 substitution matrix, scale = ln(2)/2 = 0.346574 # # Expected score = -1.64, Entropy = 0.979 bits # # Lowest score = -8, Highest score = 12 # A R N D C Q E G H I L K M F P S T W Y V B Z X A 3 -3 -1 0 -3 -1 0 1 -3 -1 -3 -2 -2 -4 1 1 1 -7 -4 0 0 -1 -1 R -3 6 -1 -3 -4 1 -3 -4 1 -2 -4 2 -1 -5 -1 -1 -2 1 -5 -3 -2 -1 -2 N -1 -1 4 2 -5 0 1 0 2 -2 -4 1 -3 -4 -2 1 0 -4 -2 -3 3 0 -1 D 0 -3 2 5 -7 1 3 0 0 -3 -5 -1 -4 -7 -3 0 -1 -8 -5 -3 4 3 -2 C -3 -4 -5 -7 9 -7 -7 -4 -4 -3 -7 -7 -6 -6 -4 0 -3 -8 -1 -3 -6 -7 -4 Q -1 1 0 1 -7 6 2 -3 3 -3 -2 0 -1 -6 0 -2 -2 -6 -5 -3 0 4 -1 E 0 -3 1 3 -7 2 5 -1 -1 -3 -4 -1 -3 -7 -2 -1 -2 -8 -5 -3 3 4 -1 G 1 -4 0 0 -4 -3 -1 5 -4 -4 -5 -3 -4 -5 -2 1 -1 -8 -6 -2 0 -2 -2 H -3 1 2 0 -4 3 -1 -4 7 -4 -3 -2 -4 -3 -1 -2 -3 -3 -1 -3 1 1 -2 I -1 -2 -2 -3 -3 -3 -3 -4 -4 6 1 -3 1 0 -3 -2 0 -6 -2 3 -3 -3 -1 L -3 -4 -4 -5 -7 -2 -4 -5 -3 1 5 -4 3 0 -3 -4 -3 -3 -2 1 -4 -3 -2 K -2 2 1 -1 -7 0 -1 -3 -2 -3 -4 5 0 -7 -2 -1 -1 -5 -5 -4 0 -1 -2 M -2 -1 -3 -4 -6 -1 -3 -4 -4 1 3 0 8 -1 -3 -2 -1 -6 -4 1 -4 -2 -2 F -4 -5 -4 -7 -6 -6 -7 -5 -3 0 0 -7 -1 8 -5 -3 -4 -1 4 -3 -5 -6 -3 P 1 -1 -2 -3 -4 0 -2 -2 -1 -3 -3 -2 -3 -5 6 1 -1 -7 -6 -2 -2 -1 -2 S 1 -1 1 0 0 -2 -1 1 -2 -2 -4 -1 -2 -3 1 3 2 -2 -3 -2 0 -1 -1 T 1 -2 0 -1 -3 -2 -2 -1 -3 0 -3 -1 -1 -4 -1 2 4 -6 -3 0 0 -2 -1 W -7 1 -4 -8 -8 -6 -8 -8 -3 -6 -3 -5 -6 -1 -7 -2 -6 12 -2 -8 -6 -7 -5 Y -4 -5 -2 -5 -1 -5 -5 -6 -1 -2 -2 -5 -4 4 -6 -3 -3 -2 8 -3 -3 -5 -3 V 0 -3 -3 -3 -3 -3 -3 -2 -3 3 1 -4 1 -3 -2 -2 0 -8 -3 5 -3 -3 -1 B 0 -2 3 4 -6 0 3 0 1 -3 -4 0 -4 -5 -2 0 0 -6 -3 -3 4 2 -1 Z -1 -1 0 3 -7 4 4 -2 1 -3 -3 -1 -2 -6 -1 -1 -2 -7 -5 -3 2 4 -1 X -1 -2 -1 -2 -4 -1 -1 -2 -2 -1 -2 -2 -2 -3 -2 -1 -1 -5 -3 -1 -1 -1 -2 ./fasta-36.3.8h/data/pam250.mat 0000644 0001054 0001054 00000003603 13525131122 013276 0 ustar wrp # # This matrix was produced by "pam" Version 1.0.6 [28-Jul-93] # # PAM 250 substitution matrix, scale = ln(2)/3 = 0.231049 # # Expected score = 
-0.844, Entropy = 0.354 bits # # Lowest score = -8, Highest score = 17 # A R N D C Q E G H I L K M F P S T W Y V B Z X A 2 -2 0 0 -2 0 0 1 -1 -1 -2 -1 -1 -3 1 1 1 -6 -3 0 0 0 0 R -2 6 0 -1 -4 1 -1 -3 2 -2 -3 3 0 -4 0 0 -1 2 -4 -2 -1 0 -1 N 0 0 2 2 -4 1 1 0 2 -2 -3 1 -2 -3 0 1 0 -4 -2 -2 2 1 0 D 0 -1 2 4 -5 2 3 1 1 -2 -4 0 -3 -6 -1 0 0 -7 -4 -2 3 3 -1 C -2 -4 -4 -5 12 -5 -5 -3 -3 -2 -6 -5 -5 -4 -3 0 -2 -8 0 -2 -4 -5 -3 Q 0 1 1 2 -5 4 2 -1 3 -2 -2 1 -1 -5 0 -1 -1 -5 -4 -2 1 3 -1 E 0 -1 1 3 -5 2 4 0 1 -2 -3 0 -2 -5 -1 0 0 -7 -4 -2 3 3 -1 G 1 -3 0 1 -3 -1 0 5 -2 -3 -4 -2 -3 -5 0 1 0 -7 -5 -1 0 0 -1 H -1 2 2 1 -3 3 1 -2 6 -2 -2 0 -2 -2 0 -1 -1 -3 0 -2 1 2 -1 I -1 -2 -2 -2 -2 -2 -2 -3 -2 5 2 -2 2 1 -2 -1 0 -5 -1 4 -2 -2 -1 L -2 -3 -3 -4 -6 -2 -3 -4 -2 2 6 -3 4 2 -3 -3 -2 -2 -1 2 -3 -3 -1 K -1 3 1 0 -5 1 0 -2 0 -2 -3 5 0 -5 -1 0 0 -3 -4 -2 1 0 -1 M -1 0 -2 -3 -5 -1 -2 -3 -2 2 4 0 6 0 -2 -2 -1 -4 -2 2 -2 -2 -1 F -3 -4 -3 -6 -4 -5 -5 -5 -2 1 2 -5 0 9 -5 -3 -3 0 7 -1 -4 -5 -2 P 1 0 0 -1 -3 0 -1 0 0 -2 -3 -1 -2 -5 6 1 0 -6 -5 -1 -1 0 -1 S 1 0 1 0 0 -1 0 1 -1 -1 -3 0 -2 -3 1 2 1 -2 -3 -1 0 0 0 T 1 -1 0 0 -2 -1 0 0 -1 0 -2 0 -1 -3 0 1 3 -5 -3 0 0 -1 0 W -6 2 -4 -7 -8 -5 -7 -7 -3 -5 -2 -3 -4 0 -6 -2 -5 17 0 -6 -5 -6 -4 Y -3 -4 -2 -4 0 -4 -4 -5 0 -1 -1 -4 -2 7 -5 -3 -3 0 10 -2 -3 -4 -2 V 0 -2 -2 -2 -2 -2 -2 -1 -2 4 2 -2 2 -1 -1 -1 0 -6 -2 4 -2 -2 -1 B 0 -1 2 3 -4 1 3 0 1 -2 -3 1 -2 -4 -1 0 0 -5 -3 -2 3 2 -1 Z 0 0 1 3 -5 3 3 0 2 -2 -3 0 -2 -5 0 0 -1 -6 -4 -2 2 3 -1 X 0 -1 0 -1 -3 -1 -1 -1 -1 -1 -1 -1 -1 -2 -1 0 0 -4 -2 -1 -1 -1 -1 ./fasta-36.3.8h/data/rna.mat 0000644 0001054 0001054 00000001746 13525131122 013060 0 ustar wrp # Sample rna matrix with +2 for G:A, TU:C A C G T U R Y M W S K D H V B N X A 5 -4 2 -4 -4 2 -1 1 1 -1 -1 1 1 1 -2 -1 -1 C -4 5 -4 2 2 -1 1 1 -1 1 -1 -2 1 1 1 -1 -1 G -4 -4 5 -4 -4 1 -1 -1 -1 1 1 1 -2 1 1 -1 -1 T -4 -4 -4 5 5 -1 2 -1 1 -1 1 1 1 -2 1 -1 -1 U -4 -4 -4 5 5 -1 2 -1 1 -1 1 1 1 -2 1 -1 -1 R 2 -1 2 -1 -1 2 -2 -1 1 1 1 1 -1 1 -1 -1 -1 Y -1 2 -1 2 2 -2 
2 -1 1 1 1 -1 1 -1 1 -1 -1 M 1 1 -1 -1 -1 -1 -1 2 1 1 -1 -1 1 1 -1 -1 -1 W 1 -1 -1 1 1 1 1 1 1 -1 1 1 1 -1 -1 -1 -1 S -1 1 1 -1 -1 1 1 1 -1 2 1 -1 -1 1 1 -1 -1 K -1 -1 1 1 1 1 1 -1 1 1 2 1 -1 -1 1 -1 -1 D 1 -2 1 1 1 1 -1 -1 1 -1 1 1 -1 -1 -1 -1 -1 H 1 1 -2 1 1 -1 1 1 1 -1 -1 -1 1 -1 -1 -1 -1 V 1 1 1 -2 -2 1 -1 1 -1 1 -1 -1 -1 1 -1 -1 -1 B -2 1 1 1 1 -1 1 -1 -1 1 1 -1 -1 -1 1 -1 -1 N -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ./fasta-36.3.8h/data/vtml160.mat 0000644 0001054 0001054 00000005323 13525131122 013504 0 ustar wrp # # VTML160 # # This matrix was produced with scripts written by # Tobias Mueller and Sven Rahmann [June-2001]. # # VTML160 substitution matrix, Units = Third-Bits # Expected Score = -1.297840 Third-Bits # Lowest Score = -7, Highest Score = 16 # # Entropy H = 0.562489 Bits # # 30-Jun-2001 A R N D C Q E G H I L K M F P S T W Y V B Z X * A 5 -2 -1 -1 1 -1 -1 0 -2 -1 -2 -1 -1 -3 0 1 1 -5 -3 0 -1 -1 0 -7 R -2 7 0 -3 -3 2 -1 -3 1 -4 -3 4 -2 -5 -2 -1 -1 -4 -3 -4 -2 0 0 -7 N -1 0 7 3 -3 0 0 0 1 -4 -4 0 -3 -5 -2 1 0 -5 -2 -4 5 0 0 -7 D -1 -3 3 7 -5 1 3 -1 0 -6 -6 0 -5 -7 -1 0 -1 -7 -5 -4 6 3 0 -7 C 1 -3 -3 -5 13 -4 -5 -2 -2 -1 -4 -4 -1 -4 -3 1 0 -7 -1 1 -4 -5 0 -7 Q -1 2 0 1 -4 6 2 -3 2 -4 -2 2 -1 -4 -1 0 -1 -6 -4 -3 0 4 0 -7 E -1 -1 0 3 -5 2 6 -2 -1 -5 -4 1 -3 -6 -1 0 -1 -7 -3 -3 2 5 0 -7 G 0 -3 0 -1 -2 -3 -2 8 -3 -7 -6 -2 -5 -6 -3 0 -2 -5 -5 -5 -1 -2 0 -7 H -2 1 1 0 -2 2 -1 -3 9 -4 -3 0 -3 0 -2 -1 -1 -1 3 -3 0 0 0 -7 I -1 -4 -4 -6 -1 -4 -5 -7 -4 6 3 -4 2 0 -4 -3 -1 -2 -2 4 -5 -4 0 -7 L -2 -3 -4 -6 -4 -2 -4 -6 -3 3 6 -3 4 2 -3 -3 -2 -1 -1 2 -5 -3 0 -7 K -1 4 0 0 -4 2 1 -2 0 -4 -3 5 -2 -5 -1 -1 -1 -5 -3 -3 0 2 0 -7 M -1 -2 -3 -5 -1 -1 -3 -5 -3 2 4 -2 8 1 -4 -3 -1 -4 -2 1 -4 -3 0 -7 F -3 -5 -5 -7 -4 -4 -6 -6 0 0 2 -5 1 9 -5 -3 -3 3 6 -1 -6 -5 0 -7 P 0 -2 -2 -1 -3 -1 -1 -3 -2 -4 -3 -1 -4 -5 9 0 -1 -5 -6 -3 -2 -1 0 -7 S 1 -1 1 0 1 0 0 0 -1 -3 -3 -1 -3 -3 0 4 2 -4 -2 -2 1 0 0 -7 T 1 -1 0 -1 0 -1 -1 
-2 -1 -1 -2 -1 -1 -3 -1 2 5 -6 -3 0 0 -1 0 -7 W -5 -4 -5 -7 -7 -6 -7 -5 -1 -2 -1 -5 -4 3 -5 -4 -6 16 4 -5 -6 -7 0 -7 Y -3 -3 -2 -5 -1 -4 -3 -5 3 -2 -1 -3 -2 6 -6 -2 -3 4 10 -3 -3 -4 0 -7 V 0 -4 -4 -4 1 -3 -3 -5 -3 4 2 -3 1 -1 -3 -2 0 -5 -3 5 -4 -3 0 -7 B -1 -2 5 6 -4 0 2 -1 0 -5 -5 0 -4 -6 -2 1 0 -6 -3 -4 5 2 0 -7 Z -1 0 0 3 -5 4 5 -2 0 -4 -3 2 -3 -5 -1 0 -1 -7 -4 -3 2 5 0 -7 X 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -7 * -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 -7 1 ./fasta-36.3.8h/doc/ 0000755 0001054 0001054 00000000000 13525131122 011421 5 ustar wrp ./fasta-36.3.8h/doc/INSTALL 0000644 0001054 0001054 00000000563 13525131122 012456 0 ustar wrp 22-Jan-2014 fasta36/doc/INSTALL To compile the FASTA programs, change to the fasta36/src directory: cd ~/fasta36/src and type: make -f ../make/Makefile.linux64_sse2 all The file fasta36/make/README gives more information about the different Makefiles available. The code is routinely compiled and tested under Linux (64-bit, sse2) and MacOS X (64-bit, sse2). ./fasta-36.3.8h/doc/README.versions 0000644 0001054 0001054 00000004203 13525131122 014147 0 ustar wrp $Id: README.versions 120 2010-01-31 19:42:09Z wrp $ $Revision: 210 $ January, 2010 This directory contains the newest version of FASTA, version 36. FASTA36 is a major update to FASTA35 that provides the ability to display multiple significant alignments to a query sequence. Previous versions of FASTA displayed only the best alignment between the query and library sequence; if the library sequence was long, with multiple similar regions, only the best was shown. This contrasts with BLAST, which has always displayed multiple "HSPs" when they are present. 
FASTA36 provides some additional improvements; like BLAST, it now uses statistical estimates to set thresholds for band optimization, which can increase search speed as much as 2-fold, and it provides much more flexibility in specifying the files that are searched (indirect files of filenames can include additional indirection). But the main improvement is the display of multiple HSPs.

All of the traditional alignment programs: ssearch36, fasta36, [t]fast[xy]36 and glsearch36 display multiple HSPs. The peptide and mixed peptide alignment programs ([t]fasts36, fastf36, fastm36) still show a single HSP. Currently, the PVM/MPI parallel versions of the programs still display a single HSP.

As of late 2007, there is almost no reason to use the fasta2 programs; the major programs present in fasta2 that were not present in fasta3 (version 34) -- align (global alignments) and lalign (non-overlapping local alignments) -- are now available in fasta version 36.

For more information about the programs in the current FASTA v36 package, see the "changes_v36.html" and "readme.v36" files.

There are still a very few programs in the fasta2 package that are not available in the fasta3 package - programs for global alignments without end-gap penalties, the "grease" Kyte-Doolittle plot, and "garnier" and "chofas" for classic (but inaccurate) secondary structure prediction. You should not use the fasta2 programs for library searching; the fasta3 programs are more sensitive and have better statistics.

Precompiled versions of the programs for Windows and MacOS are available in the executables directory.

./fasta-36.3.8h/doc/README_v36.3.8d.md 0000644 0001054 0001054 00000002766 13525131122 014064 0 ustar wrp

## The FASTA package - protein and DNA sequence similarity searching and alignment programs

Changes in **fasta-36.3.8d** released 13-April-2016:

1. Various bug fixes to `pssm_asn_subs.c` that avoid coredumps when reading NCBI PSSM ASN.1 binary files.
`pssm_asn_subs.c` can now read IUPACAA sequences. 2. default gap penalties for VT40 (from -14/-2 to -13/-1), VT80 (from -14/-2 to -11/-1), and VT120 (from -10/-1 to 11/-1) have changed slightly. 3. Introduction of `scripts/m9B_btop_msa.pl` and `scripts/m8_btop_msa.pl`, which uses the BTOP (`-m 9B` or `-m 8CB`) encoded alignment strings to produce a query driving multiple sequence alignment (MSA) in ClustalW format. This MSA can be used as input to `psiblast` to produce an ASN.1 PSSM. 4. The `scripts/annot_blast_btop2.pl` script replaces `scripts/annot_blast_btop.pl` and allows annotation of both the query and subject sequences. 5. Various domain annotation scripts have been renamed for clarity. For example, `ann_feats_up_sql.pl` uses an SQL implementation of Uniprot features tables to annotate domains. Likewise, `ann_pfam_www.pl` gets domain information from the Pfam web site, while `ann_pfam27.pl` gets the information from the downloaded Pfam27 mySQL tables, and `ann_pfam28.pl` uses the Pfam28 mySQL tables. 6. percent identity in sub-alignment scores is calculated like a BLAST percent identity -- gaps are not included in the denominator. For more detailed information, see `doc/readme.v36`. ./fasta-36.3.8h/doc/README_v36.3.8h.md 0000644 0001054 0001054 00000016022 13525131122 014056 0 ustar wrp ## The FASTA package - protein and DNA sequence similarity searching and alignment programs Changes in **fasta-36.3.8h** August, 2019 1. Modifications to support makeblastdb format v5 databases. Currently, only simple database reads have been tested. Changes in **fasta-36.3.8h** March, 2019 1. Translation table 1 (`-t 1`) now translates 'TGA'->'U' (selenocysteine). 2. New script for extracting DNA sequences from genomes (`scripts/get_genome_seq.py`). Currently works with human (hg38), mouse (mm10), and rat (rn6). Changes in **fasta-36.3.8h** January, 2019 1. 
Bug fixes: `fastx`/`tfastx` searches done with the `-t t` option (which adds a `*` to protein sequences so that termination codons can be matched), did not work properly with the `VT` series of matrices, particularly `VT10`. This has been fixed. 2. New features: Both query and library/subject sequences can be generated by specifying a program script, either by putting a `!` at the start of the query/subject file name, or by specifying library type `9`. Thus, `fasta36 \\!../scripts/get_protein.py+P09488+P30711 /seqlib/swissprot.fa` or `fasta36 "../scripts/get_protein.py+P09488+P30711 9" /seqlib/swissprot.fa` will compare two query sequences, `P09488` and `P30711`, to SwissProt, by downloading them from Uniprot using the `get_protein.py` script (which can download sequences using either Uniprot or RefSeq protein accessions). Often, the leading `!` must be escaped from shell interpretation with `\\!`. New scripts that return FASTA sequences using accessions or genome coordinates are available in `scripts/`. `get_protein.py`, `get_uniprot.py`, `get_up_prot_iso_sql.py` and `get_refseq.py`. `get_refseq.py` can download either protein or mRNA RefSeq entries. `get_up_prot_iso_sql.py` retrieves a protein and its isoforms from a MySQL database. `get_genome_seq.py` extracts genome sequences using coordinates from local reference genomes (`hg38` and `mm10` included by default). Changes in **fasta-36.3.8h** December, 2018 The `scripts/ann_exons_up_www.pl` and `ann_exons_up_sql.pl` now include the option `--gen_coord` which provides the associated genome coordinate (including chromosome) as a feature, indicated by `'<'` (start of exon) and `'>'` (end of exon). Changes in **fasta-36.3.8h** released November, 2018 **fasta-36.3.8h** provides new scripts and modifications to the `fasta` programs that normalize the process of merging sub-alignment scores and region information into both FASTA and BLAST results. 
To move BLASTP towards FASTA with respect to alignment annotation and sub-alignment scoring: 1. The `blastp_annot_cmd.sh` script runs a blast search, finds and scores domain information for the alignments, and merges this information back into the blast output `.html` file. This script uses: 1. `annot_blast_btab2.pl --query query.file --ann_script annot_script.pl --q_ann_script annot_script.pl blast.btab_file > blast.btab_file_ann` (a blast tabular file with one or two new fields, an annotation field and (optionally with --dom_info) a raw domain content field). 2. `merge_blast_btab.pl --btab blast.btab_file_ann blast.html > blast_ann.html` (merge the annotations and domain content information in the `blast.btab_file_ann` file together with the standard blast output file to produce annotated alignments). 3. In addition, `rename_exons.py` is available to rename exons (later other domains) in the subject sequences to match the exon labeling in the aligned query sequence. 4. `relabel_domains.py` can be used to adjust color sets for homologous domains. 2. There is also an equivalent `fasta_annot_cmd.sh` script that provides similar functionality for the FASTA programs. This script does not need to use `annot_blast_btab2.pl` to produce domain subalignment scores (that functionality is provided in FASTA), but it can also use `merge_fasta_btab.pl` and `rename_exons.py` to modify the names of the aligned exons/domains in the subject sequences. 3. To support the independence of the `blastp`/`fasta` output from html annotation, the FASTA package includes some new options: 1. The `-m 8CBL` option includes query sequence length and subject sequence length in the blast tabular output. In addition, if domain annotations are available, the raw domain coordinates are provided in an additional field after the annotation/subalignment scoring field. `-m 8CBl` provides the sequence lengths, but does not add the raw domain coordinates. 2.
The `-Xa` option prevents annotation information from being included in the html output -- it is only available in the `-m 8CB` (or `-m 8CBL/l`) output. 3. To reduce problems with spaces in script arguments, annotation scripts with spaces separating arguments can use '+' instead of ' '. 4. The `fasta_annot_cmd.sh` script produces both a conventional alignment on `stdout` and a `-m 8CBL` alignment, which is sent to a separate file whose name is separated from the `-m F8CBL` option by a `=`, thus `-m F8CBL=tmp_output.blast_tab`. Changes in **fasta-36.3.8g** released 23-Oct-2018 1. (Oct. 2018) Improvements to scripts in the `psisearch2/` directory: 1. `psisearch2/m89_btop_msa2.pl` 1. the `--clustal` option produces a "CLUSTALW (1.8)", which is required for some downstream programs 2. the `--trunc_acc` option removes the database and accession from identifiers of the form: `sp|P09488|GSTM1_HUMAN` to produce `GSTM1_HUMAN`. 3. the `--min_align` option specifies the fraction of the query sequence that must be aligned `(q_end-q_start+1)/q_length` Together, these changes make it possible for the output of `m89_btop_msa2.pl` to be used by the EMBOSS program `fprotdist`. 2. A more general implementation of `psisearch2_msa_iter.sh`, which does `psisearch2` one iteration at a time, and a new equivalent `psisearch2_msa_iter_bl.sh`, which uses `psiblast` to do the search. * (Oct. 2018) A small restructuring of the `make/Makefiles` to remove the `-lz` dependence for non-debugging scripts (and add it back when -DDEBUG is used). Changes in **fasta-36.3.8g** released 5-Aug-2018 1. (Apr 2018) incorporation of `-t t1` termination codes ("*") in `-m 8CB`, `-m 8CC`, and `-m9C` so that aligned termination codons are indicated as `**` (`-m8CB`) or `*1` (`-m8CC`, `-m9C`). 2. (Mar 2018) Updates to scripts/annot_blast_btop2.pl to provide subalignment scoring for blastp searches (BLOSUM62 only). (see doc/readme.v36) 3. (Feb.
2018) a new extended option, `-XB`, which causes percent identity, percent similarity, and alignment length to be calculated using the BLAST model, which does not count gaps in the alignment length. see readme.v36 for other bug fixes. Changes in **fasta-36.3.8g** released 31-Dec-2017 1. (December, 2017) -- Make statistical thresholds more robust for small E()-values with normally distributed scores (`ggsearch36`,`glsearch36`). 2. (September, 2017) Treat lower-case queries with no upper-case residues as uppercase with `-S` option. 3. (May, 2017) Improvements/fixes to sub-alignment scoring strategies. 4. Improvements/fixes to psisearch2 scripts. For more detailed information, see `doc/readme.v36`. ./fasta-36.3.8h/doc/changes_v34.html 0000644 0001054 0001054 00000033633 13525131122 014423 0 ustar wrp
$Id: changes_v34.html 120 2010-01-31 19:42:09Z wrp $ $Revision: 210 $
    fasta34 -q query.aa '${SLIB2}/swissprot.fa'
expands as expected. While this is not important for command lines, where the Unix shell would expand things anyway, it is very helpful for various configuration files, such as files of file names, where:
    <${SLIB2}/blast
    swissprot.fa
now expands properly, and in FASTLIBS files the line:
    NCBI/Blast Swissprot$0S${SLIB2}/blast/swissprot.fa
expands properly. Currently, environment variable expansion only takes place for library file names and for the <directory line in a file of file names.
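The expansion described above can be illustrated with a short Python sketch. This is not the FASTA implementation (which is written in C); the function name `expand_library_name` and the choice to leave unset variables untouched are assumptions made for illustration only.

```python
import os
import re

# Matches ${NAME}; a bare "$0S" (as in FASTLIBS lines) is deliberately not matched.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_library_name(name, env=None):
    """Replace each ${VAR} in a library file name with its value from env.

    Hypothetical helper: unset variables are left as-is, which is an
    assumption, not documented FASTA behavior.
    """
    env = os.environ if env is None else env
    return _VAR.sub(lambda m: env.get(m.group(1), m.group(0)), name)


if __name__ == "__main__":
    env = {"SLIB2": "/slib2"}  # example value, not from the source
    print(expand_library_name("${SLIB2}/blast/swissprot.fa", env))
    print(expand_library_name("<${SLIB2}/blast", env))
    print(expand_library_name("NCBI/Blast Swissprot$0S${SLIB2}/blast/swissprot.fa", env))
```

Restricting the pattern to the `${NAME}` form keeps the `$0S` field separator of a FASTLIBS line intact.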
ssearch34 -P "pssm.asn1 2" .....
    >GTM1_HUMAN ...
    PMILGYWDIRGLAHAIRLLLEYTDS@S?YEEKKYT@MG
    DAPDYDRS@QWLNEKFKLGLDFPNLPYLIDGAHKIT
might mark known and expected (S,T) phosphorylation sites. These symbols are then displayed on the query coordinate line:
                   10        20    @?  30  @     40  @     50        60
    GTM1_H PMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNLP
           ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
    gtm1_h PMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYTMGDAPDYDRSQWLNEKFKLGLDFPNLP
                   10        20        30        40        50        60
This annotation is mostly designed to display post-translational modifications detected by MassSpec with FASTS, but is also available with FASTA and SSEARCH.
    ssearch34 -P blast.ckpt -f -11 -g -1 -s BL62 query.aa library
This command uses the frequency information in the blast.ckpt file to do a position specific scoring matrix (PSSM) search using the Smith-Waterman algorithm. Because ssearch34 calculates scores for each of the sequences in the database, we anticipate that PSSM ssearch34 statistics will be more reliable than PSI-Blast statistics. The Blast checkpoint file is mostly double precision frequency numbers, which are represented in a machine specific way. Thus, you must generate the checkpoint file on the same machine on which you run ssearch34 or prss34 -P query.ckpt. To generate a checkpoint file, run:
    blastpgp -j 2 -h 1e-6 -i query.fa -d swissprot -C query.ckpt -o /dev/null
(This searches swissprot for 2 iterations ("-j 2") using an E() threshold of 1e-6 ("-h 1e-6"), saving the resulting position specific frequencies in query.ckpt. Note that the original query.fa and query.ckpt must match.)
A new option, "-t t", is available to specify that all the protein sequences have implicit termination codons "*" at the end. Thus, all protein sequences are one residue longer, and full length matches are extended one extra residue and get a higher score. For fastx34/tfastx34, this helps extend alignments to the very end in cases where there may be a mismatch at the C-terminal residues.
-m 9c has also been modified to indicate locations of termination codons (*1).
    fasta34_t -q -S gtt1_drome.aa /slib/swissprot 12
the first lines of output from FASTA will be:
    # fasta34_t -q gtt1_drome.aa /slib/swissprot
    FASTA searches a protein or DNA sequence data bank
     version 3.4t20 Nov 10, 2002
    Please cite:
     W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448
This has been turned on by default in most FASTA Makefiles.
           10 20 30 40 50 60 70
    GT8.7  NVRGLTHPIRMLLEYTDSSYDEKRYTMGDAPDFDRSQWLNEKFKL--GLDFPNLPYL-IDGSHKITQ
           :.::  . :: ::  .   .:::         : .:    ::.:   .: : ..:.. :::  :..:
    XURTG  NARGRMECIRWLLAAAGVEFDEK---------FIQSPEDLEKLKKDGNLMFDQVPMVEIDG-MKLAQ
           20 30 40 50 60
would be encoded: "=23+9=13-2=10-1=3+1=5". The alignment encoding is with respect to the beginning of the alignment, not the beginning of either sequence. The beginning of the alignment in either sequence is given by the an0/an1 values. This capability is particularly useful for [t]fast[xy], where it can be used to indicate frameshift positions "/#\#" compactly. If "-m 9c" is used, the "The best scores" title line includes "aln_code".
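As an illustration of the encoding, the following Python sketch parses an alignment code and reports how many alignment columns, query residues, and library residues it covers. The function names are hypothetical, and the reading of the symbols ("=" for aligned residues, "+" for query residues opposite a gap in the library, "-" for library residues opposite a gap in the query) is inferred from the example above. The frameshift symbols used by [t]fast[xy] are not handled.

```python
import re

# One operation: a symbol (=, + or -) followed by a run length.
_OP = re.compile(r"([=+-])(\d+)")


def parse_aln_code(code):
    """Split an alignment code such as '=23+9=13' into (symbol, length) pairs."""
    ops = [(sym, int(n)) for sym, n in _OP.findall(code)]
    # Reject anything the pattern skipped (e.g. frameshift symbols).
    if "".join(f"{sym}{n}" for sym, n in ops) != code:
        raise ValueError(f"unsupported alignment code: {code!r}")
    return ops


def summarize(code):
    """Return (alignment columns, query residues, library residues)."""
    columns = query = library = 0
    for sym, n in parse_aln_code(code):
        columns += n
        if sym in "=+":  # query residue present in these columns
            query += n
        if sym in "=-":  # library residue present in these columns
            library += n
    return columns, query, library


if __name__ == "__main__":
    print(summarize("=23+9=13-2=10-1=3+1=5"))
```

For the alignment shown above, the summary accounts for 67 alignment columns, 64 query residues, and 57 library residues, which matches the displayed sequences.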
    cat prot_test.lseg | fasta34 -q -S @ /seqlib/swissprot
produces 11 searches. If you use the multiple query functions, the query subset applies only to the first sequence. Unfortunately, it is not possible to search against a STDIN library: the FASTA programs do not keep the entire library in memory and need to re-read high-scoring library sequences, and it is not possible to fseek() against STDIN.
    fasta34 -q mgstm1.aa "seq_demo.sql 16"
mysql_lib.c has been modified to remove the restriction that mySQL protein sequence unique identifiers be integers. This allows the program to be used with the PIRPSD database. The RANLIB() function call has been changed to include "libstr", to support SQL text keys. Due to the size of libstr[], unique IDs must be < MAX_UID (20) characters.
A "pirpsd.sql" file is available for searching the mySQL distribution of the PIRPSD database. PIRPSD is available from ftp://nbrfa.georgetown.edu/pir_databases/psd/mysql.
$Id: changes_v35.html 120 2010-01-31 19:42:09Z wrp $ $Revision: 210 $
In translated sequence comparisons, annotations are only available for the protein sequence.
Add ability to search a subset of a library using a file name and a list of accession/gi numbers. This version introduces a new filetype, 10, which consists of a first line with a target filename, format, and accession number format-type, and optionally the accession number format in the database, followed by a list of accession numbers. For example:
    </slib2/blast/swissprot.lseg 0:2 4|
    3121763
    51701705
    7404340
    74735515
    ...
This tells the program that the target database is swissprot.lseg, which is in FASTA (library type 0) format.
The accession format comes after the ":". Currently, there are four accession formats, two that require ordered accessions (:1, :2), and two that hash the accessions (:3, :4) so they do not need to be ordered. The number and character after the accession format (e.g. "4|") indicate the offset of the beginning of the accession and the character that terminates the accession. Thus, in the typical NCBI Fasta definition line:
    >gi|1170095|sp|P46419|GSTM1_DERPT Glutathione S-transferase (GST class-mu)
the offset is 4 and the termination character is '|'. For databases distributed in FASTA format from the European Bioinformatics Institute, the offset depends on the name of the database, e.g.
    >SW:104K_THEAN Q4U9M9 104 kDa microneme/rhoptry antigen precursor (p104).
and the delimiter is ' ' (space, the default).
Accession formats 1 and 3 expect strings; accession formats 2 and 4 work with integers (e.g. gi numbers).
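The offset and terminator rule can be sketched in Python as follows. The helper `extract_accession` is hypothetical, and it assumes the offset is counted from the start of the definition line including the leading ">", which is consistent with both examples above.

```python
def extract_accession(defline, offset, terminator=" ", as_integer=False):
    """Return the accession that starts at `offset` and ends before `terminator`.

    Assumption: `offset` counts characters from the beginning of the
    definition line, including the leading '>'.
    """
    end = defline.find(terminator, offset)
    if end == -1:  # no terminator found: take the rest of the line
        end = len(defline)
    accession = defline[offset:end]
    # Accession formats 2 and 4 work with integers (e.g. gi numbers).
    return int(accession) if as_integer else accession


if __name__ == "__main__":
    ncbi = ">gi|1170095|sp|P46419|GSTM1_DERPT Glutathione S-transferase (GST class-mu)"
    ebi = ">SW:104K_THEAN Q4U9M9 104 kDa microneme/rhoptry antigen precursor (p104)."
    print(extract_accession(ncbi, 4, "|", as_integer=True))
    print(extract_accession(ebi, 4))
```

With offset 4, the NCBI line yields the gi number and the EBI line yields the entry name up to the first space.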
    lalign35 -q mchu.aa:1-74 mchu.aa:75-148
Note, however, that the subset range applied to the library will be applied to every sequence in the library - not just the first - and that the same subset range is applied to each sequence. This probably makes sense only if the library contains a single sequence (this is also true for the query sequence file).
Add Mueller and Vingron (2000) J. Comp. Biol. 7:761-776 VT160 matrix, "-s VT160", and OPTIMA_5 (Kann et al. (2000) Proteins 41:498-503).
    lalign35 -m 11 | lav2ps
replaces plalign (from FASTA2).
    >>gi|121716|sp|P10649|GSTM1_MOUSE Glutathione S-transfer (218 aa)
     s-w opt: 1497 Z-score: 1857.5 bits: 350.8 E(): 8.3e-97
    Smith-Waterman score: 1497; 100.0% identity (100.0% similar) in 218 aa overlap (1-218:1-218)
    ^^^^^^^^^^^^^^
where the highlighted text was either "Smith-Waterman" or "banded Smith-Waterman". In fact, scores were calculated in other ways, including global/local for fasts and fastf. With the addition of ggsearch35, glsearch35, and lalign35, there are many more ways to calculate alignments: "Smith-Waterman" (ssearch and protein fasta), "banded Smith-Waterman" (DNA fasta), "Waterman-Eggert", "trans. Smith-Waterman", "global/local", "trans. global/local", "global/global (N-W)". The last option is a global/global alignment, but with the affine gap penalties used in the Smith-Waterman algorithm.
$Id: changes_v36.html $
    1 [ - GST_N
    88 ] -
    90 [ - GST_C
    208 ] -
Since the closing "]" was associated with the previous "[", domains could not overlap.
The new format is:
    1 - 88 GST_N
    90 - 208 GST_C
which allows annotations of the form:
    1 - 88 GST_N
    75 - 123 GST-middle
    90 - 208 GST_C
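A short Python sketch shows why the new format permits overlapping domains: each line carries its own start and end, so no bracket pairing is needed. The parser below is illustrative only. It assumes whitespace-separated fields in the order start, "-", end, name, and the function names are hypothetical.

```python
from itertools import combinations


def parse_domains(text):
    """Parse 'start - end name' lines into (start, end, name) tuples."""
    domains = []
    for line in text.splitlines():
        fields = line.split(None, 3)
        if len(fields) != 4 or fields[1] != "-":
            continue  # skip blank or unrecognized lines
        domains.append((int(fields[0]), int(fields[2]), fields[3].strip()))
    return domains


def overlapping_pairs(domains):
    """Return the name pairs of domains whose coordinate ranges intersect."""
    return [
        (a[2], b[2])
        for a, b in combinations(domains, 2)
        if a[0] <= b[1] and b[0] <= a[1]
    ]


if __name__ == "__main__":
    annotation = "1 - 88 GST_N\n75 - 123 GST-middle\n90 - 208 GST_C\n"
    print(overlapping_pairs(parse_domains(annotation)))
```

In the three-domain example, GST-middle overlaps both of its neighbors, while GST_N and GST_C remain disjoint.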
FASTA version 36.3.6f extends previous versions in several ways:
Additional bug fixes are documented in fasta-36.3.6f/doc/readme.v36
FASTA version 36.3.6 provides two new features:
(fasta-36.3.5 January 2013) The NCBI's transition from BLAST to BLAST+ several years ago broke the ability of ssearch36 to use PSSMs, because psiblast did not produce the binary ASN.1 PSSMs that ssearch36 could parse. With the January 2013 fasta-36.3.5f release, ssearch36 can read binary ASN.1 PSSM files produced by the NCBI datatool utility. See fasta_guide.pdf for more information (look for the -P option).
Likewise, the score histogram is no longer shown by default; use the -H option to show the histogram (or compile with -DSHOW_HIST for previous behavior).
The _t (fasta36_t) versions of the programs are built automatically on Linux/MacOSX machines and named fasta36, etc. (the programs are threaded by default, and only one program version is built).
Documentation has been significantly revised and updated. See doc/fasta_guide.pdf for a description of the programs and options.
By default, the statistical threshold for alternate alignments (HSPs) is the E()-threshold / 10.0. For proteins, the default expect threshold is E() < 10.0, so the secondary threshold for showing alternate alignments is E() < 1.0. For translated comparisons, the E()-thresholds are 5.0/0.5; for DNA:DNA, 2.0/0.2.
Both the primary and secondary E()-thresholds are set with the -E "prim sec" command line option. If the secondary value is between zero and 1.0, it is taken as the actual threshold. If it is > 1.0, it is taken as a divisor for the primary threshold. If it is negative, alternative alignments are disabled and only the best alignment is shown.
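The rules for the secondary threshold can be summarized in a small Python sketch. This is not the FASTA source code. The function name is hypothetical, and the treatment of the boundary values (exactly 0 or exactly 1.0 taken as an actual threshold) is an assumption, since the text above does not specify them.

```python
def secondary_threshold(primary, secondary=None):
    """Return the E() threshold for alternate alignments, or None if disabled.

    primary   -- the primary E() threshold (first value given to -E)
    secondary -- the optional second value given to -E
    """
    if secondary is None:
        return primary / 10.0       # default: primary threshold / 10.0
    if secondary < 0:
        return None                 # negative: only the best alignment is shown
    if secondary <= 1.0:            # assumption: boundaries count as actual thresholds
        return secondary
    return primary / secondary      # > 1.0: used as a divisor


if __name__ == "__main__":
    print(secondary_threshold(10.0))        # protein default
    print(secondary_threshold(10.0, 0.5))   # explicit threshold
    print(secondary_threshold(10.0, 5.0))   # divisor
    print(secondary_threshold(10.0, -1.0))  # alternate alignments disabled
```

With no secondary value, the function reproduces the defaults listed above: 10.0 gives 1.0, 5.0 gives 0.5, and 2.0 gives 0.2.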
(fasta-36.3.4) Alignment option -m B provides BLAST-like alignments (no context, coordinates at the beginning and end of the alignment line, Query/Sbjct).
Statistical thresholds can dramatically reduce the number of "optimized" scores, from which statistical estimates are calculated. To address this problem, the statistical estimation procedure has been adjusted to correct for the fraction of scores that were optimized. This process can dramatically improve statistical accuracy for some matrices and gap penalties, e.g. BLOSUM62 -11/-1.
With the new joining thresholds, the -c "E-opt E-join" options have expanded meanings. -c "E-opt E-join" calculates a threshold designed (but not guaranteed) to do band optimization and joining for that fraction of sequences. Thus, -c "0.02 0.1" seeks to do band optimization (E-opt) on 2% of alignments, and joining on 10% of alignments. -c "40 10" sets the gap threshold as in earlier versions.
By default, the program will read up to 2 GB (32-bit systems) or 12 GB (64-bit systems) of the database into memory for multi-query searches. The amount of memory available for databases can be set with the -XM4G option.