exonerate-2.4.0/README

README file for exonerate package.

FAQs

-----------------------------------------------------------------

General
=======

What does this package contain ?
------------------------------

This package contains exonerate and its libraries.
Exonerate is a general purpose tool for biological
sequence comparison.  The libraries are provided as
general utilities for the development of various
sequence comparison programs.

This package was previously called ensembl-nci to reflect the
Nucleotide Comparison Interface that was behind much of the code.
This interface has now been dropped, along with the name.

How do I get the latest version ?
-------------------------------

The latest version is in the exonerate CVS repository
available from http://www.ensembl.org/

To do this with anonymous CVS, you need to do:

(with bash)
    export CVSROOT=":pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl"
(with tcsh)
    setenv CVSROOT ":pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl"

Then:
    cvs login # (password CVSUSER)
    cvs co exonerate

This will check out the CVS head, which is unstable.
To get the latest stable version, use:

    cvs co -r branch-0-9-0 exonerate

or download the latest tarball from:
http://www.ebi.ac.uk/~guy/exonerate/

How do I build exonerate ?
------------------------

If you have a version from CVS, it is a development version,
so you need a machine with the GNU autoconf/automake tools.
You need to do:

    ./maintain.sh ready

Then proceed as for a tarball release.

For a tarball release, you just need to do:

    ./configure
    make

Use "./configure --help" to see the configure options.

If you are using a stable (tarball) version,
then just use ./configure then make.

It is a good idea to also run "make check" to ensure
that everything is working OK on your machine.

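The two build routes above (CVS checkout versus tarball) can be sketched
as a small shell helper that only prints the steps to run. This is a
rough sketch, not part of the package: the function name
exonerate_build_plan is made up for illustration, and treating a tree
without a ./configure script as a CVS checkout is an assumption.

```shell
#!/bin/sh
# Print the build steps for the exonerate source tree in directory $1.
# Assumption (not stated in this README): a tree that has no ./configure
# script is a CVS checkout that still needs "./maintain.sh ready".
exonerate_build_plan() {
    src_dir=${1:-.}
    if [ ! -f "$src_dir/configure" ]; then
        echo "./maintain.sh ready"
    fi
    echo "./configure"
    echo "make"
    echo "make check"
}

# Show the plan for the current directory; nothing is executed.
exonerate_build_plan .
```

The helper only prints commands, so it is safe to run anywhere. To act
on the plan, save the output to a file from the top of the source tree
and run it with "sh -e" so that the first failing step stops the build.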
How do I build exonerate for profiling with gprof ?
-------------------------------------------------

    make clean
    ./configure --disable-assert --enable-gprof
    make

What is glib ?
------------

glib is the C library which gtk is based upon.  It is highly
portable, and provides useful data structures, and convenient
functions for handling things like dynamic libraries.

glib needs to be installed with at least version 1.2.0
to use with exonerate.  You can check this with the command
`glib-config --version`

glib is probably already installed on your system,
but is available from http://www.gtk.org/

Under which license is the code made available ?
----------------------------------------------

GPL, version 3.  See: http://www.gnu.org/licenses/gpl.txt

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

(included in the file COPYING distributed with this package)

Which platforms will this run on ?
--------------------------------

The code has mostly been developed on GNU/Linux systems,
and is used mostly on Linux and alpha/OSF1 machines,
so it works well with those, but should be portable
to other architectures.

Is there a paper describing this ?
--------------------------------

See:
BMC Bioinformatics 2005, 6:31
"Automated generation of heuristics for biological sequence comparison"
http://www.biomedcentral.com/1471-2105/6/31

More papers to follow.

How do I report a bug ?
---------------------

Run the supplied script buginfo.sh in the top level directory
to report info about the system on which you have found a bug.
You can also try running "make check" to ensure that all the
tests pass on your platform.

Please supply enough information to make the bug reproducible,
including a command line and sample sequences where appropriate.
Then send all this info to guy@ebi.ac.uk or ensembl-dev@ebi.ac.uk

How do I contribute to the project ?
----------------------------------

PLEASE DO NOT TRY TO CHECK CODE DIRECTLY INTO CVS.

If you have a fix for a bug, or would like to contribute
a new feature or optimisation to the package, then
please prepare a patch for review.  The code in the patch
must be available under the LGPL.

Unified diff output with 5 lines of context is the preferred
format for patches.  The easiest way to do this is to use CVS
to produce a diff against the version currently in the
repository.  From the exonerate directory in the modified
version, do:

    cvs diff -u -5

What else is there in this package ?
----------------------------------

This code base also includes the program iPCRess
(In-silico PCR Experiment Simulation System),
and some utilities for manipulation of large fasta files.
The utilities are not installed by default, but can be found
in the src/util directory.  I hope the names are fairly
self explanatory.  Use ./fasta -h for more info.

The utilities are:

    esd2esi            fasta2esd          fastaannotatecdna
    fastachecksum      fastaclean         fastaclip
    fastacomposition   fastadiff          fastaexplode
    fastafetch         fastahardmask      fastaindex
    fastalength        fastanrdb          fastaoverlap
    fastareformat      fastaremove        fastarevcomp
    fastasoftmask      fastasort          fastasplit
    fastasubseq        fastatranslate     fastavalidcds

How can I get more help ?
-----------------------

Using any of the programs with -h or --help will give you
more information about the options available.

There are man pages available for exonerate,
exonerate-server and ipcress.
If they have not yet been installed, you can view them
from the source distribution using:

    nroff -man doc/man/man1/exonerate.1 | less
    nroff -man doc/man/man1/exonerate-server.1 | less
    nroff -man doc/man/man1/ipcress.1 | less

Failing that, you can email me: guy@ebi.ac.uk

-----------------------------------------------------------------

Exonerate specific questions
============================

How do I make exonerate run faster ?
----------------------------------

Checklist:

    o Did you ./configure --disable-assert ?
    o Are the sequences being read from local disk ?
    o Are the sequences repeat masked with RepeatMasker and dust ?
    o Are you using the default parameters ?
    o Are you multiplexing the query and using the -M flag ?

Quick and dirty searching parameters:

    -w 16 -t 80 -H 80 -D 5 -a 1000 -m 1024

Why is it taking ages to build ?
------------------------------

Exonerate uses the C4 dynamic programming library,
which optimises the Viterbi algorithm by using code generation.
About 1,000,000 lines of code are generated by default for
efficient Viterbi calculations of the various models.

All the generated code for this is written into the
codegen directory.

The environment variables $C4_CC, $C4_LD, $C4_CCFLAGS and
$C4_LDFLAGS can be used to change the behaviour during
the code generation.  The linking can be modified by using
$C4_AR, $C4_AR_INIT and $C4_AR_APPEND (see the Farm Notes
examples).

If you only want to use exonerate for a couple of models,
you can speed up the compilation (and reduce the executable size)
by using $C4_COMPILED_MODELS to specify which models
to generate code for.

eg. export C4_COMPILED_MODELS="est2genome protein2genome"

If you try to run exonerate on a model for which no code
has been generated, a warning is issued, and the Viterbi
algorithm will run much more slowly.

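As a rough sketch of how $C4_COMPILED_MODELS might be used for a
restricted rebuild, the helper below prints a rebuild plan for the
models given as arguments.  The function name c4_rebuild_plan is
hypothetical, and running "make clean" and removing the codegen
directory before rebuilding is an assumption borrowed from the Farm
Notes, not a documented requirement.

```shell
#!/bin/sh
# Print the commands for rebuilding exonerate with code generation
# limited to the models named in the arguments.
# Assumption: the previously generated code should be cleared first
# ("make clean" and "rm -rf codegen", as done in the Farm Notes).
c4_rebuild_plan() {
    echo "export C4_COMPILED_MODELS=\"$*\""
    echo "make clean"
    echo "rm -rf codegen"
    echo "./configure"
    echo "make"
}

# Show the plan for the two models used as the example in this README;
# nothing is executed.
c4_rebuild_plan est2genome protein2genome
```

The helper only prints the steps, so the list can be reviewed before
anything is removed or rebuilt.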
-----------------------------------------------------------------

--

Farm Notes:
----------

Linux:

    # For Intel CC:
    # setenv PATH /opt/intel_cc_80/bin/:${PATH}
    # setenv LD_LIBRARY_PATH /opt/intel_cc_80/lib
    # setenv CC "icc"
    # setenv CFLAGS "-O3 -axWK"
    # setenv C4_CC $CC
    # setenv C4_CFLAGS "$CFLAGS"
    ./maintain.sh ready
    make clean
    rm config.cache
    rm -rf codegen
    ./configure --enable-largefile --prefix=/acari/work2/gs2/gs2/local/archive/Linux/1.0.0
    make
    make install-strip

TRU64:

    rm -f config.cache
    make clean
    # setenv C4_AR "/bin/ar"
    # setenv C4_AR_INIT "czr"
    # setenv C4_AR_APPEND "zr"
    # setenv CC "cc"
    # setenv CFLAGS "-O3 -arch ev6 -fast"
    # setenv CFLAGS "-O3 -arch ev67 -fast"
    setenv C4_CC $CC
    setenv C4_CFLAGS "$CFLAGS"
    ./configure \
        --prefix=/acari/work2/gs2/gs2/local/archive/OSF1/1.0.0
    make
    make install-strip

Darwin:

    # export C4_AR_INIT="csq"
    # export C4_AR_APPEND="sq"
    ./configure --enable-largefile --prefix=/wherever/
    make
    make install-strip

ia64:

    ./configure --enable-largefile --prefix=/acari/work2/gs2/gs2/local/archive/ia64/1.0.0

Making .deb

    dh_make -e guy@ebi.ac.uk -f ../exonerate-1.0.0.tar.gz
    vi debian/control
        Section: science
        Build-Depends: libglib1.2-dev
    as root: dpkg-buildpackage

--

exonerate-2.4.0/AUTHORS

Guy St.C. Slater

The following have contributed code to this project:
(in chronological order)

    Ewan Birney
    Ian Holmes
    Steve Searle
    Tim Cutts
    Don Gilbert
    Michael Schuster

Please let me know if I have forgotten to add you to this list.

--
Guy.

exonerate-2.4.0/COPYING

                    GNU GENERAL PUBLIC LICENSE
                       Version 3, 29 June 2007

 Copyright (C) 2007 Free Software Foundation, Inc.
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.

                            Preamble

  The GNU General Public License is a free, copyleft license for
software and other kinds of works.

The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users. We, the Free Software Foundation, use the GNU General Public License for most of our software; it applies also to any other work released this way by its authors. You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things. To protect your rights, we need to prevent others from denying you these rights or asking you to surrender the rights. Therefore, you have certain responsibilities if you distribute copies of the software, or if you modify it: responsibilities to respect the freedom of others. For example, if you distribute copies of such a program, whether gratis or for a fee, you must pass on to the recipients the same freedoms that you received. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. Developers that use the GNU GPL protect your rights with two steps: (1) assert copyright on the software, and (2) offer you this License giving you legal permission to copy, distribute and/or modify it. For the developers' and authors' protection, the GPL clearly explains that there is no warranty for this free software. For both users' and authors' sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions. 
Some devices are designed to deny users access to install or run modified versions of the software inside them, although the manufacturer can do so. This is fundamentally incompatible with the aim of protecting users' freedom to change the software. The systematic pattern of such abuse occurs in the area of products for individuals to use, which is precisely where it is most unacceptable. Therefore, we have designed this version of the GPL to prohibit the practice for those products. If such problems arise substantially in other domains, we stand ready to extend this provision to those domains in future versions of the GPL, as needed to protect the freedom of users. Finally, every program is threatened constantly by software patents. States should not allow patents to restrict development and use of software on general-purpose computers, but in those that do, we wish to avoid the special danger that patents applied to a free program could make it effectively proprietary. To prevent this, the GPL assures that patents cannot be used to render the program non-free. The precise terms and conditions for copying, distribution and modification follow. TERMS AND CONDITIONS 0. Definitions. "This License" refers to version 3 of the GNU General Public License. "Copyright" also means copyright-like laws that apply to other kinds of works, such as semiconductor masks. "The Program" refers to any copyrightable work licensed under this License. Each licensee is addressed as "you". "Licensees" and "recipients" may be individuals or organizations. To "modify" a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a "modified version" of the earlier work or a work "based on" the earlier work. A "covered work" means either the unmodified Program or a work based on the Program. 
To "propagate" a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well. To "convey" a work means any kind of propagation that enables other parties to make or receive copies. Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying. An interactive user interface displays "Appropriate Legal Notices" to the extent that it includes a convenient and prominently visible feature that (1) displays an appropriate copyright notice, and (2) tells the user that there is no warranty for the work (except to the extent that warranties are provided), that licensees may convey the work under this License, and how to view a copy of this License. If the interface presents a list of user commands or options, such as a menu, a prominent item in the list meets this criterion. 1. Source Code. The "source code" for a work means the preferred form of the work for making modifications to it. "Object code" means any non-source form of a work. A "Standard Interface" means an interface that either is an official standard defined by a recognized standards body, or, in the case of interfaces specified for a particular programming language, one that is widely used among developers working in that language. The "System Libraries" of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. 
A "Major Component", in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it. The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work. For example, Corresponding Source includes interface definition files associated with source files for the work, and the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work. The Corresponding Source need not include anything that users can regenerate automatically from other parts of the Corresponding Source. The Corresponding Source for a work in source code form is that same work. 2. Basic Permissions. All rights granted under this License are granted for the term of copyright on the Program, and are irrevocable provided the stated conditions are met. This License explicitly affirms your unlimited permission to run the unmodified Program. The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work. This License acknowledges your rights of fair use or other equivalent, as provided by copyright law. You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. 
You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you. Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary. 3. Protecting Users' Legal Rights From Anti-Circumvention Law. No covered work shall be deemed part of an effective technological measure under any applicable law fulfilling obligations under article 11 of the WIPO copyright treaty adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention of such measures. When you convey a covered work, you waive any legal power to forbid circumvention of technological measures to the extent such circumvention is effected by exercising rights under this License with respect to the covered work, and you disclaim any intention to limit operation or modification of the work as a means of enforcing, against the work's users, your or third parties' legal rights to forbid circumvention of technological measures. 4. Conveying Verbatim Copies. You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program. 
You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee. 5. Conveying Modified Source Versions. You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions: a) The work must carry prominent notices stating that you modified it, and giving a relevant date. b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to "keep intact all notices". c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it. d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so. A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an "aggregate" if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate. 6. Conveying Non-Source Forms. 
You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways: a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange. b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge. c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b. d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. 
Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements. e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d. A separable portion of the object code, whose source code is excluded from the Corresponding Source as a System Library, need not be included in conveying the object code work. A "User Product" is either (1) a "consumer product", which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling. In determining whether a product is a consumer product, doubtful cases shall be resolved in favor of coverage. For a particular product received by a particular user, "normally used" refers to a typical or common use of that class of product, regardless of the status of the particular user or of the way in which the particular user actually uses, or expects or is expected to use, the product. A product is a consumer product regardless of whether the product has substantial commercial, industrial or non-consumer uses, unless such uses represent the only significant mode of use of the product. "Installation Information" for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made. 
If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information. But this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product (for example, the work has been installed in ROM). The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network. Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying. 7. Additional Terms. "Additional permissions" are terms that supplement the terms of this License by making exceptions from one or more of its conditions. Additional permissions that are applicable to the entire Program shall be treated as though they were included in this License, to the extent that they are valid under applicable law. If additional permissions apply only to part of the Program, that part may be used separately under those permissions, but the entire Program remains governed by this License without regard to the additional permissions. 
When you convey a copy of a covered work, you may at your option remove any additional permissions from that copy, or from any part of it. (Additional permissions may be written to require their own removal in certain cases when you modify the work.) You may place additional permissions on material, added by you to a covered work, for which you have or can give appropriate copyright permission. Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms: a) Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or c) Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or d) Limiting the use for publicity purposes of names of licensors or authors of the material; or e) Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or f) Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors. All other non-permissive additional terms are considered "further restrictions" within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. 
If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying. If you add terms to a covered work in accord with this section, you must place, in the relevant source files, a statement of the additional terms that apply to those files, or a notice indicating where to find the applicable terms. Additional terms, permissive or non-permissive, may be stated in the form of a separately written license, or stated as exceptions; the above requirements apply either way. 8. Termination. You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License (including any patent licenses granted under the third paragraph of section 11). However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation. Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice. Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. 
If your rights have been terminated and not permanently reinstated, you do not qualify to receive new licenses for the same material under section 10. 9. Acceptance Not Required for Having Copies. You are not required to accept this License in order to receive or run a copy of the Program. Ancillary propagation of a covered work occurring solely as a consequence of using peer-to-peer transmission to receive a copy likewise does not require acceptance. However, nothing other than this License grants you permission to propagate or modify any covered work. These actions infringe copyright if you do not accept this License. Therefore, by modifying or propagating a covered work, you indicate your acceptance of this License to do so. 10. Automatic Licensing of Downstream Recipients. Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License. An "entity transaction" is a transaction transferring control of an organization, or substantially all assets of one, or subdividing an organization, or merging organizations. If propagation of a covered work results from an entity transaction, each party to that transaction who receives a copy of the work also receives whatever licenses to the work the party's predecessor in interest had or could give under the previous paragraph, plus a right to possession of the Corresponding Source of the work from the predecessor in interest, if the predecessor has it or can get it with reasonable efforts. You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License. 
For example, you may not impose a license fee, royalty, or other charge for exercise of rights granted under this License, and you may not initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging that any patent claim is infringed by making, using, selling, offering for sale, or importing the Program or any portion of it. 11. Patents. A "contributor" is a copyright holder who authorizes use under this License of the Program or a work on which the Program is based. The work thus licensed is called the contributor's "contributor version". A contributor's "essential patent claims" are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License, of making, using, or selling its contributor version, but do not include claims that would be infringed only as a consequence of further modification of the contributor version. For purposes of this definition, "control" includes the right to grant patent sublicenses in a manner consistent with the requirements of this License. Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor's essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version. In the following three paragraphs, a "patent license" is any express agreement or commitment, however denominated, not to enforce a patent (such as an express permission to practice a patent or covenant not to sue for patent infringement). To "grant" such a patent license to a party means to make such an agreement or commitment not to enforce a patent against the party. 
If you convey a covered work, knowingly relying on a patent license, and the Corresponding Source of the work is not available for anyone to copy, free of charge and under the terms of this License, through a publicly available network server or other readily accessible means, then you must either (1) cause the Corresponding Source to be so available, or (2) arrange to deprive yourself of the benefit of the patent license for this particular work, or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients. "Knowingly relying" means you have actual knowledge that, but for the patent license, your conveying the covered work in a country, or your recipient's use of the covered work in a country, would infringe one or more identifiable patents in that country that you have reason to believe are valid. If, pursuant to or in connection with a single transaction or arrangement, you convey, or propagate by procuring conveyance of, a covered work, and grant a patent license to some of the parties receiving the covered work authorizing them to use, propagate, modify or convey a specific copy of the covered work, then the patent license you grant is automatically extended to all recipients of the covered work and works based on it. A patent license is "discriminatory" if it does not include within the scope of its coverage, prohibits the exercise of, or is conditioned on the non-exercise of one or more of the rights that are specifically granted under this License. 
You may not convey a covered work if you are a party to an arrangement with a third party that is in the business of distributing software, under which you make payment to the third party based on the extent of your activity of conveying the work, and under which the third party grants, to any of the parties who would receive the covered work from you, a discriminatory patent license (a) in connection with copies of the covered work conveyed by you (or copies made from those copies), or (b) primarily for and in connection with specific products or compilations that contain the covered work, unless you entered into that arrangement, or that patent license was granted, prior to 28 March 2007. Nothing in this License shall be construed as excluding or limiting any implied license or other defenses to infringement that may otherwise be available to you under applicable patent law. 12. No Surrender of Others' Freedom. If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program. 13. Use with the GNU Affero General Public License. Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU Affero General Public License into a single combined work, and to convey the resulting work. 
The terms of this License will continue to apply to the part which is the covered work, but the special requirements of the GNU Affero General Public License, section 13, concerning interaction through a network will apply to the combination as such. 14. Revised Versions of this License. The Free Software Foundation may publish revised and/or new versions of the GNU General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies that a certain numbered version of the GNU General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that numbered version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the GNU General Public License, you may choose any version ever published by the Free Software Foundation. If the Program specifies that a proxy can decide which future versions of the GNU General Public License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Program. Later license versions may give you additional or different permissions. However, no additional obligations are imposed on any author or copyright holder as a result of your choosing to follow a later version. 15. Disclaimer of Warranty. THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. 
SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. Limitation of Liability. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 17. Interpretation of Sections 15 and 16. If the disclaimer of warranty and limitation of liability provided above cannot be given local legal effect according to their terms, reviewing courts shall apply local law that most closely approximates an absolute waiver of all civil liability in connection with the Program, unless a warranty or assumption of liability accompanies a copy of the Program in return for a fee. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively state the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. 
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . Also add information on how to contact you by electronic and paper mail. If the program does terminal interaction, make it output a short notice like this when it starts in an interactive mode: Copyright (C) This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, your program's commands might be different; for a GUI interface, you would use an "about box". You should also get your employer (if you work as a programmer) or school, if any, to sign a "copyright disclaimer" for the program, if necessary. For more information on this, and how to apply and follow the GNU GPL, see . The GNU General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. But first, please read . 
exonerate-2.4.0/ChangeLog

2.3.0 -> 2.4.0

    o Fixed many thread safety bugs
    o Fixed bugs with large files over 4GB
    o Modified client to work with multiple servers
    o Fixed server to handle SIGPIPE
    o Added support for FOSN input (can be files, directories or servers)
    o Fixed bug with --wordambiguity and --wordjump (with normal FSM)
    o Added %pS to --ryo (percent self score over equivalenced regions)

2.2.0 -> 2.3.0

    o Fixed several bugs with thread safety for exonerate-server
    o Fixed bug with custom genetic codes not working
    o Fixed bug with overflow of seedrepeat horizon in hspset
    o Added passing of wordlimit params to server
    o Implemented --cores option for multi-threaded alignment
    o Added identity and similarity score for GFF genes and exons
    o Increased default FSM memory limit to 256Mb
    o Added --revcomp option to disable searching of reverse complement
    o Fixed bug with compact-FSM (VFSM) based database searching
    o Added --wordambiguity option for searching ambiguous reference seqs
    o Fixed bug with esd2esi and .esd files >2Gb

2.1.0 -> 2.2.0

    o Fixed bug with esd2esi when softmasking is off
    o Removed all GMemChunk usage due to caching problems with glib-2
    o Fixed various minor memory leaks

2.0.0 -> 2.1.0

    o Fixed portability bugs for x86_64
    o Added linecount: tag for long client:server messages
    o Added setsockopt(TCP_NODELAY) to improve socket performance
    o Changed server to use pthreads by default
      to avoid LSF memlimit problems
    o Added faster implementation of Sequence_strncpy()
    o Fixed bug with sequence caching
    o Added -V 3 client-server debug code
    o Prevented sub-optimal alignments below geneseed threshold
      from appearing
    o Fixed sorting of translated indices
    o Completed implementation of server-side geneseed
    o Fixed bug with .esd file parsing
    o Made compilation use glib2 by default
    o Fixed SDP memory management bug
    o Fixed segfault bug with --refine (Thanks to Don Gilbert)
    o Improved splice site scoring
    o Reverted to protein substitution scoring for cd2g model
    o Fixed bug with fasta parser and \r\n type newlines

1.4.0 -> 2.0.0

    o Modified exonerate to work in Client:Server mode
    o Fixed --refine to work with SDP alignments
    o Disabled codegen warnings from bootstrapper during compilation
    o Added --geneticcode option for using alternative genetic codes
    o Added --splice5 and --splice3 to allow alternative splice site PSSMs
    o Fixed several bugs when scanning query sequences
      (e.g. --forcescan query)
    o Fixed to report query on forward strand whenever possible.
    o Fixed --ryo sequence dumping with --bestn
    o Fixed a bug with missing seeds when using --annotation
    o Changed license to GPLv3
    o Added protein2dna:bestfit and protein2genome:bestfit models
      (works with exhaustive alignment only)

1.3.0 -> 1.4.0

    o Improved splice site cache memory management
    o Added --geneseed option to speed up large-scale analysis
    o Fixed a bug with 3'utr coordinates in GFF output
    o Fixed a crash during GFF dumping when using glib2
    o Added error message for failed seeks when input is stdin

1.2.0 -> 1.3.0

    o Fixed a bad memory leak in boundary.c
    o Fixed bug with selenocysteines in query sequences
    o Fixed bug with subsequence creation (and --ryo problems)
    o Fixed a bug with transition mapping in reduced space alignments
    o Removed variable_submat to fix bugs with codegen and allow speed ups
    o Fixed bug with --usetlaaa and translated models
    o Fixed emit calculation in HPair for BSDP
    o Added C field to vulgar for disambiguation of codon matches
    o Fixed bug with region finding in exhaustive DP codegen
    o Fixed GFF bugs (start always < end) and off-by-one bug on rev strand
    o Fixed bugs with codegen for cd2g
    o Fixed bug with chained silent transitions
      in reduced space traceback
    o Fixed bug with broken bigseq comparisons
    o Fixed --percent and --bestn ungapped comparisons

1.1.0 -> 1.2.0

    o Fixed bugs with SDP and short exons in spanned models
    o Fixed bug with overwriting of some valid SDP span scores
    o Fixed forward coordinates in GFF
    o Added utr3, utr5 and cds fields to GFF output
    o Fixed rounding bug in codonsubmat
      (this prevented codon models working on some machines)
    o Updated SDP codegen
    o Fixed compilation to work with either glib-1 or glib-2
    o Fixed phase macros
    o Fixed build problems on OSF
    o Added initial client/server code (nothing useful yet)

1.0.0 -> 1.1.0

    o Replaced C4_Model_find_*() functions using names
    o Fixed DP bug with non-local models
    o Added Seeder and Match modules to support multi-hspset models
    o Added %tcs etc to --ryo for dumping coding sequences
    o Added CodonSubmat module
    o Changed WordHood to work in codon space
    o Changed Intron and Phase models to support joint introns for g2g
    o Fixed bug with display of selenocysteine-containing alignments
    o Added --annotation option for cd2g model
    o Fixed bug with introns in cigar strings
    o Modified configure.in to compile cleanly on Solaris
    o Changed to use abstract sequences (requires optimisations)
    o Fixed bug with --bestn tiebreaking
    o Fixed bugs with suboptimal SDP alignments
    o Simplified tracebacks for SDP
    o Fixed bug with adjacent phased introns
    o Added --codongapopen and --codongapextend
    o Added new softmasking implementation

0.9.1 -> 1.0.0

    o Added code for sdp over spanned models
    o Added initial code for UTR model
    o Fixed various minor bugs
    o Improved error reporting
    o Added --fastasuffix option for filtering directory contents
    o Made installation of utilities optional
    o Fixed memory bug with --percent option
    o Improved build system

0.9.0 -> 0.9.1

    o Fixed memory leak with SDP
    o Bug fix to handle selenocysteines
    o Fixed bug with words missing from word neighbourhood
    o Fixed minor bugs and altered alignment display

0.8.3 -> 0.9.0

    o Added seeded dynamic programming modules
      (does not yet work with models containing spans)
    o Fixed --bestn to allow tiebreaking of identically scoring hits
    o Fixed bug with lost error messages with --bestn
    o Fixed bug with alignment display and --forwardcoordinates

0.8.2 -> 0.8.3

    o Fixed memory leak with BSDP
    o Added BSAM module for whole chromosome alignment
    o Added VFSM method of alignment seeding
    o Rewrote word neighbourhood code for larger neighbourhoods
    o Added --silent to fastafetch
    o Added --alignmentwidth for alignment formatting
    o Added --quality option for BSDP filtering
    o Fixed tricky bug with suboptimal alignments in BSDP
    o Fixed bug with reporting revcomp --bestn hits
    o Marked non-consensus splice sites in alignment display
    o Fixed bug with coordinates in alignment display
    o Added frameshifts and split codons to gff output

0.8.1 -> 0.8.2

    o Optimised BSDP for repeat-rich genomic comparisons
    o Fixed bug with intron phase bounds
    o Added experimental BSDP repeat pruning algorithm
    o Allowed input to be directories of fasta files
    o Added example to the exonerate usage message
    o Fixed bug in code generation
    o Tightened span bounds
    o Fixed a bug with Waterman-Eggert
    o Removed old node_is_used code
    o Fixed memory leak in phase model
    o Added PCR product dumping to ipcress
    o Fixed bug with softmasking translated sequences
    o Sorted multiple input files on file size for --bestn performance

0.8.0 -> 0.8.1

    o Added module for codon usage calculations
    o Fixed bug with phase model shadows

0.7.2 -> 0.8.0

    o Added support for Waterman-Eggert style suboptimal alignments
    o Fixed bugs with Hughey-style DP traceback
    o Fixed bugs with phase model
    o Added option for alignment refinement
    o Stopped glib error messages from dumping core
    o Added --{query,target}chunk{id,total} for splitting up jobs
    o Added start of test suite
    o Fixed a bug with the shadow designation algorithm
    o Added %e[ti] for equivalenced symbol counting
    o Turned off --forcegtag by default
    o Added query-specific score thresholds
    o Added RangeTree-based HSP pairing algorithm
    o Made Shadows go state->transition to fix span model problems
    o Fixed BSDP to work properly with suboptimal alignments
    o Fixed shadow propagation algorithm for derived models.
    o Simplified phase model for new shadows

0.7.1 -> 0.7.2

    o Moved transition shields to calc
    o Fixed a bug caused by shadow designation clashing
    o Added shadow to intron model to apply min/max intron limits
    o Fixed bugs with 3:3 HSP filtering
    o Added general speedups to HSP code
    o Bug fix for guessing database type of large files
    o Added coding2genome model
    o Simplified intron model (fewer states)
    o Fixed bugs with shadow assignment ordering
    o Fixed compilation when G_GNUC_EXTENSION not defined
    o Added option for aggressive HSP filtering
    o Removed old terminal_data/secondary_data system for C4_Calc
    o Added Alphabet type to replace Sequence_Type and handle masking
    o Improved softmasking implementation
    o Removed all use of gmodule and renamed Plugin as Codegen
    o Changed bootstrapper for faster linking
    o Separated Viterbi code from Optimal module
    o Made bootstrapper use C4_COMPILED_MODEL_LIST
    o Added Model_Type module
    o Fixed sorting of results (on score) with --bestn

0.7.0 -> 0.7.1

    o Modularised code for frameshifts and intron phases
    o Added support for DNA:DNA alignments at protein level
    o Added Coding2Coding and started Genome2Genome model
    o Added support for per-transition output with --ryo
    o Made changes for compilation on OS-X
    o Separated parameters for dna and protein level alignments
    o Added dejavu exact repeat finding library (for reprobate)
    o Fixed query softmasking and various other bugs
    o Added speedups to approximate matching for ipcress
    o Updated man pages
    o Added transition shields to fix phase model problems
    o Separated DP layout code from Optimal module

0.6.7 -> 0.7.0

    o Imported various fasta format file processing utilities
    o Added --intronpenalty
    o Added shadow transitions to C4
    o Added protein2genomic model
    o Refactored DP core by addition of Optimal_Data and replacement
      of various hacks with proper 4D row data.
    o Refactored codegen to use a single viterbi function per plugin.
    o Modified Optimal_PatternMatrix system to reduce length
      of generated code and decrease compile time.
    o Fixed bootstrapper to use C4_CC and C4_CFLAGS
    o Changed WordHood generation algorithm to utilise separate
      input and output alphabets.
    o Changed FSM implementation to allow more control when combining
      words which are redundant or subsequences of each other.
    o Imported ipcress and reimplemented most of it to handle
      mismatches, mixed length and redundant primers etc.
    o Changed ./configure --enable-gprof to work with
      profiling version of libc6
    o Changed argument handling code to parse lists from command line
    o Added CompoundFile module
    o Made exonerate work with multiple sequence query and target files
    o Added more info to man page for exonerate
    o Wrote man page for ipcress
    o Added more sequence checking when using assertions.
    o Fixed bug with display of equivalenced Ns
    o Fixed attribute fields and off-by-one coordinate bugs in GFF.
    o Added --bestn option for best-in-genome type searches etc.
    o Changed --scanquery to --forcescan and made default selection
      automatic based on sizes of input files.
    o Fixed to parse non-standard __DATE__ format on OS-X.
    o Simplified HPair code by addition of SAR module.
    o Simplified HSPset code by using HSP horizon
    o Simplified DNA2Protein model

0.6.6 -> 0.6.7

    o Fixed memory leak with SListSet
    o Simplification of Portal/Span system
    o Added short help (-h) command line synopsis
    o Implemented C4 model inheritance
    o Separated Intron model from est2genome code
    o Added --wordhoodjump for large query inputs
    o Added --showquerygff and --showtargetgff for GFF dumping
    o Added --scanquery option and removed symmetric models

0.6.5 -> 0.6.6

    o Fixed bug with nascent HSPs
    o Added --verbose option
    o Added initial support for dna2protein and protein2dna models
    o Renamed GenomicNER -> NER, allowed protein alignments
    o Added --showvulgar --ryo output options
    o Fixed wordhood bug when Xs are present in nucleotide sequence
    o Added Dynamite-style alignment labels
    o Replaced model-specific alignment drawing code
      with a single label-based implementation
    o Fixed bugs with bootstrapper looking for (unnecessary)
      region finding DP implementations for global models
    o Added ungapped models - made ungapped model the default

0.6.4 -> 0.6.5

    o Added support for 2D spans
    o Added support for genomic NER model
    o Added basic speedups to Heuristic_Span_integrate
    o Fixed bug with query multiplexing
    o Added descriptions to alignment output
    o Added softmasking support for query or target
    o Added thresholds to Optimal_find_path (speed up)
    o Added lots more command line options
    o Updated man pages, added examples

0.6.3 -> 0.6.4

    o Added ability to align sequences exhaustively (slow)
    o Major refactoring of all Optimal DP code
    o Added Hughey-style reduced space DP implementation
      (consisting of region finding, checkpoint finding,
       continuation DP and associated tracebacks)
    o Fixed more bugs with splice site prediction and gtag_only
    o Fixed more bugs with non-diagonal HSPs
    o Fixed bug with derived model creation
    o Fixed several bugs with terminal matrix upper bound calculation
      and end HSP component integration
    o Added many more command line options and updated man page

0.6.2 -> 0.6.3

    o Added bootstrapping system for static linking of c4 models
    o Added support for est2genome model
    o Fixed bugs with est2genome model and splice site prediction
    o Started module-localised command-line arguments
    o Added C4_Model_configure_extra to supply C4_Model with
      {init,exit}_{func,macro} like C4_Calc
    o Updated man page (still incomplete)

0.6.1 -> 0.6.2

    o Added hub directory (and moved UGAM to it)
    o Added GAM: Gapped Alignment Manager
    o Added Analysis: High-level analysis object
    o Added program directory and exonerate program
    o Moved sequence_type from FastaDB to Sequence
    o Moved BSDP threshold from Heuristic -> HPair
    o Various other small fixes

0.6.0 -> 0.6.1

    o Added FastaDB_Seq_revcomp()
    o Many minor bugs fixed (mostly found by valgrind)
    o Simplified FSM memory management to use glib memchunks
      (previous implementation was slower and buggy)
    o Added SList: Efficient Single-linked Lists
    o Many fixes for non-diagonal HSPs
    o Added UGAM: UnGapped Alignment Manager

0.5.0 -> 0.6.0

    o Changed from ensembl-nci code base to exonerate
    o Removal of NCI
    o Major refactoring including many minor changes not listed here
    o Build system now builds statically when using gprof
    o Build system now supports gcov coverage analysis
    o Build system supports "make check" to run tests
      (a lot more of the tests do something worthwhile now)
    o Started a man page for exonerate. RTFM!
    o FastaDB object changed to inherit from Sequence
    o Sequence object includes strand information
    o Rewrite of word neighbourhood generation using a new algorithm,
      - now much faster and without memory overhead
    o Allowed word neighbourhood generation by dropoff
    o Added support for non-diagonal HSPsets
    o Changed plugin system to recognise out-of-date modules
    o Changed plugin system to support multi-architecture farms
    o Added underflow/overflow protection to DP implementations
    o Used Region object much more to simplify C4
    o Changes to flow of control around C4<->HSP interfaces
    o Added exit_func and exit_macro to C4_Calc
    o Simplified splice site prediction coordinate system
    o Converted est2genome model to use PSSM splice site prediction
    o Fixed C4_Model_select() to avoid copying redundant calcs

--

exonerate-2.4.0/INSTALL

Basic Installation
==================

   These are generic installation instructions.

   The `configure' shell script attempts to guess correct values for
various system-dependent variables used during compilation.  It uses
those values to create a `Makefile' in each directory of the package.
It may also create one or more `.h' files containing system-dependent
definitions.  Finally, it creates a shell script `config.status' that
you can run in the future to recreate the current configuration, a file
`config.cache' that saves the results of its tests to speed up
reconfiguring, and a file `config.log' containing compiler output
(useful mainly for debugging `configure').

   If you need to do unusual things to compile the package, please try
to figure out how `configure' could check whether to do them, and mail
diffs or instructions to the address given in the `README' so they can
be considered for the next release.  If at some point `config.cache'
contains results you don't want to keep, you may remove or edit it.

   The file `configure.in' is used to create `configure' by a program
called `autoconf'.
You only need `configure.in' if you want to change it or regenerate
`configure' using a newer version of `autoconf'.

The simplest way to compile this package is:

  1. `cd' to the directory containing the package's source code and type
     `./configure' to configure the package for your system.  If you're
     using `csh' on an old version of System V, you might need to type
     `sh ./configure' instead to prevent `csh' from trying to execute
     `configure' itself.

     Running `configure' takes a while.  While running, it prints some
     messages telling which features it is checking for.

  2. Type `make' to compile the package.

  3. Optionally, type `make check' to run any self-tests that come with
     the package.

  4. Type `make install' to install the programs and any data files and
     documentation.

  5. You can remove the program binaries and object files from the
     source code directory by typing `make clean'.  To also remove the
     files that `configure' created (so you can compile the package for
     a different kind of computer), type `make distclean'.  There is
     also a `make maintainer-clean' target, but that is intended mainly
     for the package's developers.  If you use it, you may have to get
     all sorts of other programs in order to regenerate files that came
     with the distribution.

Compilers and Options
=====================

   Some systems require unusual options for compilation or linking that
the `configure' script does not know about.  You can give `configure'
initial values for variables by setting them in the environment.  Using
a Bourne-compatible shell, you can do that on the command line like
this:
     CC=c89 CFLAGS=-O2 LIBS=-lposix ./configure

Or on systems that have the `env' program, you can do it like this:
     env CPPFLAGS=-I/usr/local/include LDFLAGS=-s ./configure

Compiling For Multiple Architectures
====================================

   You can compile the package for more than one kind of computer at the
same time, by placing the object files for each architecture in their
own directory.
To do this, you must use a version of `make' that
supports the `VPATH' variable, such as GNU `make'.  `cd' to the
directory where you want the object files and executables to go and run
the `configure' script.  `configure' automatically checks for the
source code in the directory that `configure' is in and in `..'.

   If you have to use a `make' that does not support the `VPATH'
variable, you have to compile the package for one architecture at a time
in the source code directory.  After you have installed the package for
one architecture, use `make distclean' before reconfiguring for another
architecture.

Installation Names
==================

   By default, `make install' will install the package's files in
`/usr/local/bin', `/usr/local/man', etc.  You can specify an
installation prefix other than `/usr/local' by giving `configure' the
option `--prefix=PATH'.

   You can specify separate installation prefixes for
architecture-specific files and architecture-independent files.  If you
give `configure' the option `--exec-prefix=PATH', the package will use
PATH as the prefix for installing programs and libraries.
Documentation and other data files will still use the regular prefix.

   In addition, if you use an unusual directory layout you can give
options like `--bindir=PATH' to specify different values for particular
kinds of files.  Run `configure --help' for a list of the directories
you can set and what kinds of files go in them.

   If the package supports it, you can cause programs to be installed
with an extra prefix or suffix on their names by giving `configure' the
option `--program-prefix=PREFIX' or `--program-suffix=SUFFIX'.

Optional Features
=================

   Some packages pay attention to `--enable-FEATURE' options to
`configure', where FEATURE indicates an optional part of the package.
They may also pay attention to `--with-PACKAGE' options, where PACKAGE
is something like `gnu-as' or `x' (for the X Window System).
The `README' should mention any `--enable-' and `--with-' options that
the package recognizes.

   For packages that use the X Window System, `configure' can usually
find the X include and library files automatically, but if it doesn't,
you can use the `configure' options `--x-includes=DIR' and
`--x-libraries=DIR' to specify their locations.

Specifying the System Type
==========================

   There may be some features `configure' can not figure out
automatically, but needs to determine by the type of host the package
will run on.  Usually `configure' can figure that out, but if it prints
a message saying it can not guess the host type, give it the
`--host=TYPE' option.  TYPE can either be a short name for the system
type, such as `sun4', or a canonical name with three fields:
     CPU-COMPANY-SYSTEM

See the file `config.sub' for the possible values of each field.  If
`config.sub' isn't included in this package, then this package doesn't
need to know the host type.

   If you are building compiler tools for cross-compiling, you can also
use the `--target=TYPE' option to select the type of system they will
produce code for and the `--build=TYPE' option to select the type of
system on which you are compiling the package.

Sharing Defaults
================

   If you want to set default values for `configure' scripts to share,
you can create a site shell script called `config.site' that gives
default values for variables like `CC', `cache_file', and `prefix'.
`configure' looks for `PREFIX/share/config.site' if it exists, then
`PREFIX/etc/config.site' if it exists.  Or, you can set the
`CONFIG_SITE' environment variable to the location of the site script.
A warning: not all `configure' scripts look for a site script.

Operation Controls
==================

   `configure' recognizes the following options to control how it
operates.

`--cache-file=FILE'
     Use and save the results of the tests in FILE instead of
     `./config.cache'.  Set FILE to `/dev/null' to disable caching, for
     debugging `configure'.
`--help'
     Print a summary of the options to `configure', and exit.

`--quiet'
`--silent'
`-q'
     Do not print messages saying which checks are being made. To
     suppress all normal output, redirect it to `/dev/null' (any error
     messages will still be shown).

`--srcdir=DIR'
     Look for the package's source code in directory DIR. Usually
     `configure' can determine that directory automatically.

`--version'
     Print the version of Autoconf used to generate the `configure'
     script, and exit.

`configure' also accepts some other, not widely useful, options.

exonerate-2.4.0/Makefile.am0000644000175000017500000000022610401032175012455 00000000000000

SUBDIRS = doc src test

MAINTAINERCLEANFILES = Makefile.in aclocal.m4 \
                       config.h.in configure \
                       tags

exonerate-2.4.0/Makefile.in0000644000175000017500000002422511222703634012502 00000000000000
# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am

# Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc.
# This Makefile.in is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY, to the extent permitted by law; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE.

SHELL = @SHELL@

srcdir = @srcdir@
top_srcdir = @top_srcdir@
VPATH = @srcdir@
prefix = @prefix@
exec_prefix = @exec_prefix@

bindir = @bindir@
sbindir = @sbindir@
libexecdir = @libexecdir@
datadir = @datadir@
sysconfdir = @sysconfdir@
sharedstatedir = @sharedstatedir@
localstatedir = @localstatedir@
libdir = @libdir@
infodir = @infodir@
mandir = @mandir@
includedir = @includedir@
oldincludedir = /usr/include

DESTDIR =

pkgdatadir = $(datadir)/@PACKAGE@
pkglibdir = $(libdir)/@PACKAGE@
pkgincludedir = $(includedir)/@PACKAGE@

top_builddir = .
ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ SUBDIRS = doc src test MAINTAINERCLEANFILES = Makefile.in aclocal.m4 config.h.in configure tags ACLOCAL_M4 = $(top_srcdir)/aclocal.m4 mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = DIST_COMMON = README AUTHORS COPYING ChangeLog INSTALL Makefile.am \ Makefile.in NEWS TODO aclocal.m4 config.guess config.sub configure \ configure.in install-sh missing mkinstalldirs DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best all: all-redirect .SUFFIXES: $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$@ CONFIG_HEADERS= $(SHELL) ./config.status $(ACLOCAL_M4): configure.in cd $(srcdir) && $(ACLOCAL) config.status: $(srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES) $(SHELL) ./config.status --recheck $(srcdir)/configure: $(srcdir)/configure.in $(ACLOCAL_M4) $(CONFIGURE_DEPENDENCIES) cd 
$(srcdir) && $(AUTOCONF) # This directory's subdirectories are mostly independent; you can cd # into them and run `make' without going through this Makefile. # To change the values of `make' variables: instead of editing Makefiles, # (1) if the variable is set in `config.status', edit `config.status' # (which will cause the Makefiles to be regenerated when you run `make'); # (2) otherwise, pass the desired values on the `make' command line. @SET_MAKE@ all-recursive install-data-recursive install-exec-recursive \ installdirs-recursive install-recursive uninstall-recursive \ check-recursive installcheck-recursive info-recursive dvi-recursive: @set fnord $(MAKEFLAGS); amf=$$2; \ dot_seen=no; \ target=`echo $@ | sed s/-recursive//`; \ list='$(SUBDIRS)'; for subdir in $$list; do \ echo "Making $$target in $$subdir"; \ if test "$$subdir" = "."; then \ dot_seen=yes; \ local_target="$$target-am"; \ else \ local_target="$$target"; \ fi; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \ || case "$$amf" in *=*) exit 1;; *k*) fail=yes;; *) exit 1;; esac; \ done; \ if test "$$dot_seen" = "no"; then \ $(MAKE) $(AM_MAKEFLAGS) "$$target-am" || exit 1; \ fi; test -z "$$fail" mostlyclean-recursive clean-recursive distclean-recursive \ maintainer-clean-recursive: @set fnord $(MAKEFLAGS); amf=$$2; \ dot_seen=no; \ rev=''; list='$(SUBDIRS)'; for subdir in $$list; do \ rev="$$subdir $$rev"; \ test "$$subdir" != "." || dot_seen=yes; \ done; \ test "$$dot_seen" = "no" && rev=". $$rev"; \ target=`echo $@ | sed s/-recursive//`; \ for subdir in $$rev; do \ echo "Making $$target in $$subdir"; \ if test "$$subdir" = "."; then \ local_target="$$target-am"; \ else \ local_target="$$target"; \ fi; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \ || case "$$amf" in *=*) exit 1;; *k*) fail=yes;; *) exit 1;; esac; \ done && test -z "$$fail" tags-recursive: list='$(SUBDIRS)'; for subdir in $$list; do \ test "$$subdir" = . 
|| (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) tags); \ done tags: TAGS ID: $(HEADERS) $(SOURCES) $(LISP) list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ here=`pwd` && cd $(srcdir) \ && mkid -f$$here/ID $$unique $(LISP) TAGS: tags-recursive $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) $(LISP) tags=; \ here=`pwd`; \ list='$(SUBDIRS)'; for subdir in $$list; do \ if test "$$subdir" = .; then :; else \ test -f $$subdir/TAGS && tags="$$tags -i $$here/$$subdir/TAGS"; \ fi; \ done; \ list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(ETAGS_ARGS)$$unique$(LISP)$$tags" \ || (cd $(srcdir) && etags -o $$here/TAGS $(ETAGS_ARGS) $$tags $$unique $(LISP)) mostlyclean-tags: clean-tags: distclean-tags: -rm -f TAGS ID maintainer-clean-tags: distdir = $(PACKAGE)-$(VERSION) top_distdir = $(distdir) # This target untars the dist file and tries a VPATH configuration. Then # it guarantees that the distribution is self-contained by making another # tarfile. distcheck: dist -rm -rf $(distdir) GZIP=$(GZIP_ENV) $(TAR) zxf $(distdir).tar.gz mkdir $(distdir)/=build mkdir $(distdir)/=inst dc_install_base=`cd $(distdir)/=inst && pwd`; \ cd $(distdir)/=build \ && ../configure --srcdir=.. 
--prefix=$$dc_install_base \ && $(MAKE) $(AM_MAKEFLAGS) \ && $(MAKE) $(AM_MAKEFLAGS) dvi \ && $(MAKE) $(AM_MAKEFLAGS) check \ && $(MAKE) $(AM_MAKEFLAGS) install \ && $(MAKE) $(AM_MAKEFLAGS) installcheck \ && $(MAKE) $(AM_MAKEFLAGS) dist -rm -rf $(distdir) @banner="$(distdir).tar.gz is ready for distribution"; \ dashes=`echo "$$banner" | sed s/./=/g`; \ echo "$$dashes"; \ echo "$$banner"; \ echo "$$dashes" dist: distdir -chmod -R a+r $(distdir) GZIP=$(GZIP_ENV) $(TAR) chozf $(distdir).tar.gz $(distdir) -rm -rf $(distdir) dist-all: distdir -chmod -R a+r $(distdir) GZIP=$(GZIP_ENV) $(TAR) chozf $(distdir).tar.gz $(distdir) -rm -rf $(distdir) distdir: $(DISTFILES) -rm -rf $(distdir) mkdir $(distdir) -chmod 777 $(distdir) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done for subdir in $(SUBDIRS); do \ if test "$$subdir" = .; then :; else \ test -d $(distdir)/$$subdir \ || mkdir $(distdir)/$$subdir \ || exit 1; \ chmod 777 $(distdir)/$$subdir; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) top_distdir=../$(distdir) distdir=../$(distdir)/$$subdir distdir) \ || exit 1; \ fi; \ done info-am: info: info-recursive dvi-am: dvi: dvi-recursive check-am: all-am check: check-recursive installcheck-am: installcheck: installcheck-recursive install-exec-am: install-exec: install-exec-recursive install-data-am: install-data: install-data-recursive install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-recursive uninstall-am: uninstall: uninstall-recursive all-am: Makefile all-redirect: all-recursive install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: installdirs-recursive installdirs-am: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log 
stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-tags mostlyclean-generic mostlyclean: mostlyclean-recursive clean-am: clean-tags clean-generic mostlyclean-am clean: clean-recursive distclean-am: distclean-tags distclean-generic clean-am distclean: distclean-recursive -rm -f config.status maintainer-clean-am: maintainer-clean-tags maintainer-clean-generic \ distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." maintainer-clean: maintainer-clean-recursive -rm -f config.status .PHONY: install-data-recursive uninstall-data-recursive \ install-exec-recursive uninstall-exec-recursive installdirs-recursive \ uninstalldirs-recursive all-recursive check-recursive \ installcheck-recursive info-recursive dvi-recursive \ mostlyclean-recursive distclean-recursive clean-recursive \ maintainer-clean-recursive tags tags-recursive mostlyclean-tags \ distclean-tags clean-tags maintainer-clean-tags distdir info-am info \ dvi-am dvi check check-am installcheck-am installcheck install-exec-am \ install-exec install-data-am install-data install-am install \ uninstall-am uninstall all-redirect all-am all installdirs-am \ installdirs mostlyclean-generic distclean-generic clean-generic \ maintainer-clean-generic clean mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: exonerate-2.4.0/NEWS0000644000175000017500000000000010371635214011117 00000000000000exonerate-2.4.0/TODO0000644000175000017500000002215610371635214011130 00000000000000-- IN_PROGRESS: ----------- -//- reprobate, agglomerate Add extra stop codon penalty ? Share subopt between fwd/rev strands for translating models ? optimisation: stop codegen from testing match more than once per cell. 
optimisation: prevent subopt for certain optimals (eg. global).

Fix configure_extra clashes from intron models
(remove configure_extra altogether in intron?)

./exonerate -m p2g pep.fa human.genomic -E --forcegtag no
(score check problem)

- add dependency in codegen.c:188
  replace when: codegen/model.o is older than model/model.o
  (Make model expiry depend on parent model expiry)

- Expand man page intro, mention main features.

-------------------------------------------------------------------

--

o Separate code for DP scheduler from c4.[ch]

o Option for --refine (###check with region)
  (needs to work with suboptimal alignments)
  Refine strategies:
    - none: default
    - full: redo full alignment
    - region: redo alignment region + edge boundary
    - cobs: do alignment between all cobs points in the final alignment.
            (or just 'clean' cobs points)
            ( Also terminal cobs->corner,
              terminal cobs->align_end + boundary )
    - hsp : redo alignment hsp_set bound box + boundary
    - align: bad splice sites
             low quality regions
             missing regions at ends.

o Split RedSpace from viterbi ?

o Move region finding, redspace etc outside of viterbi

o Allow hidden states / compound transitions for optimiser
  (store chained transition for each transition,
   also state map, must translate alignment afterwards)

o Fix u:t type models to check revcomp (analysis.c:133)
  (Should translate target for trans2dna models)

o Add c2g model (c2c + tg intron): check working for reversed ?
  (./exonerate c.f e.f -m c2g)

o Add phase 1->2 transition for phase model

o Add joint intron type to intron model (joint cis only)

o Add joint phase type, (9-state model)

o Split alignment: alignment->sequence/gene,ryo,label
  combine Alignment output methods
  add Alignment_traverse
  have query_gene target_gene

o Add --ryo %[qt]f for frame (%[qt]b %3)
  - need gene object ?
    can map all onto gene object ?
    gene = Gene_create(alignment, on_query);
    Gene_write_gff(gene);

o Separate GFF code from alignment

o Should GFF show all coordinates on the +ve strand?
  (jason_p2g eg)

o Report frameshifts (and in-frame stops) in GFF

o GAM_Type_Data

o Rearrange C4/optimal:
    1 RedSpace: subalignment,checkpoint,recursion,traceback
                problem with two different cell sizes
    2 Viterbi: interpreted,codegen,optimal_data,optimal_mode
    3 SubOpt: store,region,macros
    4 OptModel: dp model optimiser eg. [-O->] => [->]
    5 OPair: create,destroy,next_score,next_path
             (make HPair same interface)
    6 Optimal: optimal_type,find_score,find_path

o Refactor Heuristic (and HPair)
  to clarify heuristic model representation
  (use (shared?) simple model object ?)

o Make interpreted and codegen implementations more similar.

o Need Alignment_traverse() to handle shadow setting etc.
  ( Adapt from Alignment_has_valid_alignment() )

o Implement --verbose or --verbose
  (or have module-specific verbose options ?)

o Introduce chain type/chain object ? [dna|protein|other]
  - replace C4 user_data / terminal_data system
  - allow multiple chain types to be used with some models

o Make all Optimal_Mode find_{region,score,path,checkpoints}
  inherit from basic model definitions

o Command line meta-options

o Separation of C4 runtime and compile time requirements
  (eg. upper bounds for calcs not required at compile time
   - fix this with model-specific params ?)

-//-

sequence/gene: create(dna_seq), destroy, add_exon
sequence/alphabet: stuff from Sequence_{Type,Filter} also masking
SubOpt: create(alignment), allow exclusion in viterbi
Alignment: Alignment_build_gene

-//-

BUGS:
-----

o Genome2Genome model:
  - C4_Label_SPLIT_CODON display fixes for dna2dna
  - Get test working with joint Phase and Intron models

o Add checks to catch duplicate C4 names.
  (necessary as duplicates are removed on model copying)
  (also need namespace management for C4 codegen)

o Bug with --saturatethreshold memory on large analyses
  (maybe not taken into account by --fsmmemory ?)

o Increase default SAR ranges to span wordlength
  (check titin example (intron:146) - bug ?).
o Bug with memory allocation on genome exhaustive alignments

o Fix alignment drawing scheme to handle silent dna:dna mutations

o Clean up unnecessary C4 inheritance macros

o Suboptimal exhaustive alignments
  (can still do reuse lookups in constant time,
   for linear space alignment as we know the DP computation order)
  (also required to prevent SARs containing paths from other HSPs
   or higher scoring alignments)
  - requires adjoining HSP component retraction?).

OPTIMISATIONS:
-------------

o Only copy designated shadows in codegen.

o Defer/join SAR/Region memory allocation and/or use memchunks.
  (just return bound initially, as most bounds are not confirmed ?)

o Profiling (and memory profiling)

FEATURES:
--------

o Change --ryo transition per line format to allow
    () : all
    {} : non-silent only (qy_adv || tg_adv)
    <> : match only (qy_adv && tg_adv)
  (or add --ryog for gene output ?)

-//-

o Prevent from generating > --bestn suboptimal alignments
  for any pairwise comparison.
  (ie. update --bestn/BSDP threshold after every new alignment)

o Allow multiple span states in a single span ?
  (ie. do all dp in a single sweep)

o Add support for large bounds and joins for missing exon finding
  and for worst cases when two close HSPs fail to join.
  (or just detect why this occurs ...)

o Write DP optimiser
  - skip single input/output states O->O->O => O->O
  - remove start/end states when single transition
    or simple input/output transitions can be duplicated
    (ie. make multiple start/end states)
  - shadows for single seq-independent loop states
    (eg. simple ins/del)

DOCUMENTATION:
-------------

o Add note about interpretation of exonerate alignments
  (split introns, translating equivalence notation etc.)

o Add note about --score and --hspthreshold for ungapped models
  - or add a warning/error when different)
  - or make hsp_threshold = MAX(score, hsp_threshold)

o Add new examples to the man page. eg. multiple files etc.

o Add info about each type of model.
---------------------------------------------------------------------

--< RELEASE POINT >--

--< NEXT RELEASE POINT >--

o Add macros for Span start/end reporting functions
  (Heuristic_Span_src_report_end_func,
   Heuristic_Span_dst_init_start_func).

o Add option to show single-char AAs in alignments.

o Option for no revcomp of query or target (useful for exhaustive)

o Division of calcs into [independent, qy dep, tg dep, qy/tg dep].
  (this will allow further optimisations)

o Add full alignment dump. One of:
    o each transition
    o each non-silent transition
    o each differently scoring label.
  or just add match:mismatch base-pair level detail to --ryo
  eg. vulgar: M 7 7 (%m) -> 5 5 5 -4 5 5 5

o Alignment dumping/regeneration facility

o Add FastaDB sequence cache ?

--< SUBSEQUENT RELEASE POINT >--

o Add simple model specification (like smile strings ?)
  - describe model structure using predefined states
    and transition types
  - should be able to describe all existing models

o Change default nucleic matrix to +5/-1 ?

o Optimise DP row_matrix access with shadows ?

o Tidy codegen
    o Removal of unnecessary codegen (with Optimal_Type)
    o Check function naming to ensure reuse of global model codegen

o Add --use-config option (auto update implementation)

o Option to dump parameter set used ?

o GFF3 support ?
  http://song.sf.net/
  http://sourceforge.net/mailarchive/forum.php?thread_id=2591765&forum_id=27223

--

o Training of heuristic params using exhaustive alignments,
  training of wordhood params using ungapped suboptimal alignments etc

o Installed headers for libxnr8/libc4 (as required for C4 codegen)
  - requires addition of opaque types

o Add special tests directory including:
    * tests with data
    * all tests with valgrind
    * all tests with MALLOC_CHECK_

o Automation of model:[global,bestfit,local,overlap] system ?

o Test speed of simplified DP implementations:
  - by removing initialisations (on PatternMatrix:normal)
  - or a faster interpreted implementation ?
o Sublinear time / non-word-based HSP-generation o Stats o Heuristics for flips / reversals ? o Heuristics for SCFGs o Blast-compatible output format (sigh) -- exonerate-2.4.0/aclocal.m40000644000175000017500000012141411222703634012273 00000000000000dnl aclocal.m4 generated automatically by aclocal 1.4-p6 dnl Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. dnl This file is free software; the Free Software Foundation dnl gives unlimited permission to copy and/or distribute it, dnl with or without modifications, as long as this notice is preserved. dnl This program is distributed in the hope that it will be useful, dnl but WITHOUT ANY WARRANTY, to the extent permitted by law; without dnl even the implied warranty of MERCHANTABILITY or FITNESS FOR A dnl PARTICULAR PURPOSE. # lib-prefix.m4 serial 5 (gettext-0.15) dnl Copyright (C) 2001-2005 Free Software Foundation, Inc. dnl This file is free software; the Free Software Foundation dnl gives unlimited permission to copy and/or distribute it, dnl with or without modifications, as long as this notice is preserved. dnl From Bruno Haible. dnl AC_LIB_ARG_WITH is synonymous to AC_ARG_WITH in autoconf-2.13, and dnl similar to AC_ARG_WITH in autoconf 2.52...2.57 except that is doesn't dnl require excessive bracketing. ifdef([AC_HELP_STRING], [AC_DEFUN([AC_LIB_ARG_WITH], [AC_ARG_WITH([$1],[[$2]],[$3],[$4])])], [AC_DEFUN([AC_][LIB_ARG_WITH], [AC_ARG_WITH([$1],[$2],[$3],[$4])])]) dnl AC_LIB_PREFIX adds to the CPPFLAGS and LDFLAGS the flags that are needed dnl to access previously installed libraries. The basic assumption is that dnl a user will want packages to use other packages he previously installed dnl with the same --prefix option. dnl This macro is not needed if only AC_LIB_LINKFLAGS is used to locate dnl libraries, but is otherwise very convenient. 
AC_DEFUN([AC_LIB_PREFIX], [ AC_BEFORE([$0], [AC_LIB_LINKFLAGS]) AC_REQUIRE([AC_PROG_CC]) AC_REQUIRE([AC_CANONICAL_HOST]) AC_REQUIRE([AC_LIB_PREPARE_MULTILIB]) AC_REQUIRE([AC_LIB_PREPARE_PREFIX]) dnl By default, look in $includedir and $libdir. use_additional=yes AC_LIB_WITH_FINAL_PREFIX([ eval additional_includedir=\"$includedir\" eval additional_libdir=\"$libdir\" ]) AC_LIB_ARG_WITH([lib-prefix], [ --with-lib-prefix[=DIR] search for libraries in DIR/include and DIR/lib --without-lib-prefix don't search for libraries in includedir and libdir], [ if test "X$withval" = "Xno"; then use_additional=no else if test "X$withval" = "X"; then AC_LIB_WITH_FINAL_PREFIX([ eval additional_includedir=\"$includedir\" eval additional_libdir=\"$libdir\" ]) else additional_includedir="$withval/include" additional_libdir="$withval/$acl_libdirstem" fi fi ]) if test $use_additional = yes; then dnl Potentially add $additional_includedir to $CPPFLAGS. dnl But don't add it dnl 1. if it's the standard /usr/include, dnl 2. if it's already present in $CPPFLAGS, dnl 3. if it's /usr/local/include and we are using GCC on Linux, dnl 4. if it doesn't exist as a directory. if test "X$additional_includedir" != "X/usr/include"; then haveit= for x in $CPPFLAGS; do AC_LIB_WITH_FINAL_PREFIX([eval x=\"$x\"]) if test "X$x" = "X-I$additional_includedir"; then haveit=yes break fi done if test -z "$haveit"; then if test "X$additional_includedir" = "X/usr/local/include"; then if test -n "$GCC"; then case $host_os in linux* | gnu* | k*bsd*-gnu) haveit=yes;; esac fi fi if test -z "$haveit"; then if test -d "$additional_includedir"; then dnl Really add $additional_includedir to $CPPFLAGS. CPPFLAGS="${CPPFLAGS}${CPPFLAGS:+ }-I$additional_includedir" fi fi fi fi dnl Potentially add $additional_libdir to $LDFLAGS. dnl But don't add it dnl 1. if it's the standard /usr/lib, dnl 2. if it's already present in $LDFLAGS, dnl 3. if it's /usr/local/lib and we are using GCC on Linux, dnl 4. 
if it doesn't exist as a directory. if test "X$additional_libdir" != "X/usr/$acl_libdirstem"; then haveit= for x in $LDFLAGS; do AC_LIB_WITH_FINAL_PREFIX([eval x=\"$x\"]) if test "X$x" = "X-L$additional_libdir"; then haveit=yes break fi done if test -z "$haveit"; then if test "X$additional_libdir" = "X/usr/local/$acl_libdirstem"; then if test -n "$GCC"; then case $host_os in linux*) haveit=yes;; esac fi fi if test -z "$haveit"; then if test -d "$additional_libdir"; then dnl Really add $additional_libdir to $LDFLAGS. LDFLAGS="${LDFLAGS}${LDFLAGS:+ }-L$additional_libdir" fi fi fi fi fi ]) dnl AC_LIB_PREPARE_PREFIX creates variables acl_final_prefix, dnl acl_final_exec_prefix, containing the values to which $prefix and dnl $exec_prefix will expand at the end of the configure script. AC_DEFUN([AC_LIB_PREPARE_PREFIX], [ dnl Unfortunately, prefix and exec_prefix get only finally determined dnl at the end of configure. if test "X$prefix" = "XNONE"; then acl_final_prefix="$ac_default_prefix" else acl_final_prefix="$prefix" fi if test "X$exec_prefix" = "XNONE"; then acl_final_exec_prefix='${prefix}' else acl_final_exec_prefix="$exec_prefix" fi acl_save_prefix="$prefix" prefix="$acl_final_prefix" eval acl_final_exec_prefix=\"$acl_final_exec_prefix\" prefix="$acl_save_prefix" ]) dnl AC_LIB_WITH_FINAL_PREFIX([statement]) evaluates statement, with the dnl variables prefix and exec_prefix bound to the values they will have dnl at the end of the configure script. AC_DEFUN([AC_LIB_WITH_FINAL_PREFIX], [ acl_save_prefix="$prefix" prefix="$acl_final_prefix" acl_save_exec_prefix="$exec_prefix" exec_prefix="$acl_final_exec_prefix" $1 exec_prefix="$acl_save_exec_prefix" prefix="$acl_save_prefix" ]) dnl AC_LIB_PREPARE_MULTILIB creates a variable acl_libdirstem, containing dnl the basename of the libdir, either "lib" or "lib64". AC_DEFUN([AC_LIB_PREPARE_MULTILIB], [ dnl There is no formal standard regarding lib and lib64. 
The current dnl practice is that on a system supporting 32-bit and 64-bit instruction dnl sets or ABIs, 64-bit libraries go under $prefix/lib64 and 32-bit dnl libraries go under $prefix/lib. We determine the compiler's default dnl mode by looking at the compiler's library search path. If at least dnl of its elements ends in /lib64 or points to a directory whose absolute dnl pathname ends in /lib64, we assume a 64-bit ABI. Otherwise we use the dnl default, namely "lib". acl_libdirstem=lib searchpath=`(LC_ALL=C $CC -print-search-dirs) 2>/dev/null | sed -n -e 's,^libraries: ,,p' | sed -e 's,^=,,'` if test -n "$searchpath"; then acl_save_IFS="${IFS= }"; IFS=":" for searchdir in $searchpath; do if test -d "$searchdir"; then case "$searchdir" in */lib64/ | */lib64 ) acl_libdirstem=lib64 ;; *) searchdir=`cd "$searchdir" && pwd` case "$searchdir" in */lib64 ) acl_libdirstem=lib64 ;; esac ;; esac fi done IFS="$acl_save_IFS" fi ]) # lib-link.m4 serial 9 (gettext-0.16) dnl Copyright (C) 2001-2006 Free Software Foundation, Inc. dnl This file is free software; the Free Software Foundation dnl gives unlimited permission to copy and/or distribute it, dnl with or without modifications, as long as this notice is preserved. dnl From Bruno Haible. AC_PREREQ(2.50) dnl AC_LIB_LINKFLAGS(name [, dependencies]) searches for libname and dnl the libraries corresponding to explicit and implicit dependencies. dnl Sets and AC_SUBSTs the LIB${NAME} and LTLIB${NAME} variables and dnl augments the CPPFLAGS variable. 
AC_DEFUN([AC_LIB_LINKFLAGS], [ AC_REQUIRE([AC_LIB_PREPARE_PREFIX]) AC_REQUIRE([AC_LIB_RPATH]) define([Name],[translit([$1],[./-], [___])]) define([NAME],[translit([$1],[abcdefghijklmnopqrstuvwxyz./-], [ABCDEFGHIJKLMNOPQRSTUVWXYZ___])]) AC_CACHE_CHECK([how to link with lib[]$1], [ac_cv_lib[]Name[]_libs], [ AC_LIB_LINKFLAGS_BODY([$1], [$2]) ac_cv_lib[]Name[]_libs="$LIB[]NAME" ac_cv_lib[]Name[]_ltlibs="$LTLIB[]NAME" ac_cv_lib[]Name[]_cppflags="$INC[]NAME" ]) LIB[]NAME="$ac_cv_lib[]Name[]_libs" LTLIB[]NAME="$ac_cv_lib[]Name[]_ltlibs" INC[]NAME="$ac_cv_lib[]Name[]_cppflags" AC_LIB_APPENDTOVAR([CPPFLAGS], [$INC]NAME) AC_SUBST([LIB]NAME) AC_SUBST([LTLIB]NAME) dnl Also set HAVE_LIB[]NAME so that AC_LIB_HAVE_LINKFLAGS can reuse the dnl results of this search when this library appears as a dependency. HAVE_LIB[]NAME=yes undefine([Name]) undefine([NAME]) ]) dnl AC_LIB_HAVE_LINKFLAGS(name, dependencies, includes, testcode) dnl searches for libname and the libraries corresponding to explicit and dnl implicit dependencies, together with the specified include files and dnl the ability to compile and link the specified testcode. If found, it dnl sets and AC_SUBSTs HAVE_LIB${NAME}=yes and the LIB${NAME} and dnl LTLIB${NAME} variables and augments the CPPFLAGS variable, and dnl #defines HAVE_LIB${NAME} to 1. Otherwise, it sets and AC_SUBSTs dnl HAVE_LIB${NAME}=no and LIB${NAME} and LTLIB${NAME} to empty. AC_DEFUN([AC_LIB_HAVE_LINKFLAGS], [ AC_REQUIRE([AC_LIB_PREPARE_PREFIX]) AC_REQUIRE([AC_LIB_RPATH]) define([Name],[translit([$1],[./-], [___])]) define([NAME],[translit([$1],[abcdefghijklmnopqrstuvwxyz./-], [ABCDEFGHIJKLMNOPQRSTUVWXYZ___])]) dnl Search for lib[]Name and define LIB[]NAME, LTLIB[]NAME and INC[]NAME dnl accordingly. AC_LIB_LINKFLAGS_BODY([$1], [$2]) dnl Add $INC[]NAME to CPPFLAGS before performing the following checks, dnl because if the user has installed lib[]Name and not disabled its use dnl via --without-lib[]Name-prefix, he wants to use it. 
ac_save_CPPFLAGS="$CPPFLAGS" AC_LIB_APPENDTOVAR([CPPFLAGS], [$INC]NAME) AC_CACHE_CHECK([for lib[]$1], [ac_cv_lib[]Name], [ ac_save_LIBS="$LIBS" LIBS="$LIBS $LIB[]NAME" AC_TRY_LINK([$3], [$4], [ac_cv_lib[]Name=yes], [ac_cv_lib[]Name=no]) LIBS="$ac_save_LIBS" ]) if test "$ac_cv_lib[]Name" = yes; then HAVE_LIB[]NAME=yes AC_DEFINE([HAVE_LIB]NAME, 1, [Define if you have the $1 library.]) AC_MSG_CHECKING([how to link with lib[]$1]) AC_MSG_RESULT([$LIB[]NAME]) else HAVE_LIB[]NAME=no dnl If $LIB[]NAME didn't lead to a usable library, we don't need dnl $INC[]NAME either. CPPFLAGS="$ac_save_CPPFLAGS" LIB[]NAME= LTLIB[]NAME= fi AC_SUBST([HAVE_LIB]NAME) AC_SUBST([LIB]NAME) AC_SUBST([LTLIB]NAME) undefine([Name]) undefine([NAME]) ]) dnl Determine the platform dependent parameters needed to use rpath: dnl libext, shlibext, hardcode_libdir_flag_spec, hardcode_libdir_separator, dnl hardcode_direct, hardcode_minus_L. AC_DEFUN([AC_LIB_RPATH], [ dnl Tell automake >= 1.10 to complain if config.rpath is missing. m4_ifdef([AC_REQUIRE_AUX_FILE], [AC_REQUIRE_AUX_FILE([config.rpath])]) AC_REQUIRE([AC_PROG_CC]) dnl we use $CC, $GCC, $LDFLAGS AC_REQUIRE([AC_LIB_PROG_LD]) dnl we use $LD, $with_gnu_ld AC_REQUIRE([AC_CANONICAL_HOST]) dnl we use $host AC_REQUIRE([AC_CONFIG_AUX_DIR_DEFAULT]) dnl we use $ac_aux_dir AC_CACHE_CHECK([for shared library run path origin], acl_cv_rpath, [ CC="$CC" GCC="$GCC" LDFLAGS="$LDFLAGS" LD="$LD" with_gnu_ld="$with_gnu_ld" \ ${CONFIG_SHELL-/bin/sh} "$ac_aux_dir/config.rpath" "$host" > conftest.sh . ./conftest.sh rm -f ./conftest.sh acl_cv_rpath=done ]) wl="$acl_cv_wl" libext="$acl_cv_libext" shlibext="$acl_cv_shlibext" hardcode_libdir_flag_spec="$acl_cv_hardcode_libdir_flag_spec" hardcode_libdir_separator="$acl_cv_hardcode_libdir_separator" hardcode_direct="$acl_cv_hardcode_direct" hardcode_minus_L="$acl_cv_hardcode_minus_L" dnl Determine whether the user wants rpath handling at all. 
AC_ARG_ENABLE(rpath, [ --disable-rpath do not hardcode runtime library paths], :, enable_rpath=yes) ]) dnl AC_LIB_LINKFLAGS_BODY(name [, dependencies]) searches for libname and dnl the libraries corresponding to explicit and implicit dependencies. dnl Sets the LIB${NAME}, LTLIB${NAME} and INC${NAME} variables. AC_DEFUN([AC_LIB_LINKFLAGS_BODY], [ AC_REQUIRE([AC_LIB_PREPARE_MULTILIB]) define([NAME],[translit([$1],[abcdefghijklmnopqrstuvwxyz./-], [ABCDEFGHIJKLMNOPQRSTUVWXYZ___])]) dnl By default, look in $includedir and $libdir. use_additional=yes AC_LIB_WITH_FINAL_PREFIX([ eval additional_includedir=\"$includedir\" eval additional_libdir=\"$libdir\" ]) AC_LIB_ARG_WITH([lib$1-prefix], [ --with-lib$1-prefix[=DIR] search for lib$1 in DIR/include and DIR/lib --without-lib$1-prefix don't search for lib$1 in includedir and libdir], [ if test "X$withval" = "Xno"; then use_additional=no else if test "X$withval" = "X"; then AC_LIB_WITH_FINAL_PREFIX([ eval additional_includedir=\"$includedir\" eval additional_libdir=\"$libdir\" ]) else additional_includedir="$withval/include" additional_libdir="$withval/$acl_libdirstem" fi fi ]) dnl Search the library and its dependencies in $additional_libdir and dnl $LDFLAGS. Using breadth-first-seach. LIB[]NAME= LTLIB[]NAME= INC[]NAME= rpathdirs= ltrpathdirs= names_already_handled= names_next_round='$1 $2' while test -n "$names_next_round"; do names_this_round="$names_next_round" names_next_round= for name in $names_this_round; do already_handled= for n in $names_already_handled; do if test "$n" = "$name"; then already_handled=yes break fi done if test -z "$already_handled"; then names_already_handled="$names_already_handled $name" dnl See if it was already located by an earlier AC_LIB_LINKFLAGS dnl or AC_LIB_HAVE_LINKFLAGS call. 
uppername=`echo "$name" | sed -e 'y|abcdefghijklmnopqrstuvwxyz./-|ABCDEFGHIJKLMNOPQRSTUVWXYZ___|'` eval value=\"\$HAVE_LIB$uppername\" if test -n "$value"; then if test "$value" = yes; then eval value=\"\$LIB$uppername\" test -z "$value" || LIB[]NAME="${LIB[]NAME}${LIB[]NAME:+ }$value" eval value=\"\$LTLIB$uppername\" test -z "$value" || LTLIB[]NAME="${LTLIB[]NAME}${LTLIB[]NAME:+ }$value" else dnl An earlier call to AC_LIB_HAVE_LINKFLAGS has determined dnl that this library doesn't exist. So just drop it. : fi else dnl Search the library lib$name in $additional_libdir and $LDFLAGS dnl and the already constructed $LIBNAME/$LTLIBNAME. found_dir= found_la= found_so= found_a= if test $use_additional = yes; then if test -n "$shlibext" \ && { test -f "$additional_libdir/lib$name.$shlibext" \ || { test "$shlibext" = dll \ && test -f "$additional_libdir/lib$name.dll.a"; }; }; then found_dir="$additional_libdir" if test -f "$additional_libdir/lib$name.$shlibext"; then found_so="$additional_libdir/lib$name.$shlibext" else found_so="$additional_libdir/lib$name.dll.a" fi if test -f "$additional_libdir/lib$name.la"; then found_la="$additional_libdir/lib$name.la" fi else if test -f "$additional_libdir/lib$name.$libext"; then found_dir="$additional_libdir" found_a="$additional_libdir/lib$name.$libext" if test -f "$additional_libdir/lib$name.la"; then found_la="$additional_libdir/lib$name.la" fi fi fi fi if test "X$found_dir" = "X"; then for x in $LDFLAGS $LTLIB[]NAME; do AC_LIB_WITH_FINAL_PREFIX([eval x=\"$x\"]) case "$x" in -L*) dir=`echo "X$x" | sed -e 's/^X-L//'` if test -n "$shlibext" \ && { test -f "$dir/lib$name.$shlibext" \ || { test "$shlibext" = dll \ && test -f "$dir/lib$name.dll.a"; }; }; then found_dir="$dir" if test -f "$dir/lib$name.$shlibext"; then found_so="$dir/lib$name.$shlibext" else found_so="$dir/lib$name.dll.a" fi if test -f "$dir/lib$name.la"; then found_la="$dir/lib$name.la" fi else if test -f "$dir/lib$name.$libext"; then found_dir="$dir" 
found_a="$dir/lib$name.$libext" if test -f "$dir/lib$name.la"; then found_la="$dir/lib$name.la" fi fi fi ;; esac if test "X$found_dir" != "X"; then break fi done fi if test "X$found_dir" != "X"; then dnl Found the library. LTLIB[]NAME="${LTLIB[]NAME}${LTLIB[]NAME:+ }-L$found_dir -l$name" if test "X$found_so" != "X"; then dnl Linking with a shared library. We attempt to hardcode its dnl directory into the executable's runpath, unless it's the dnl standard /usr/lib. if test "$enable_rpath" = no || test "X$found_dir" = "X/usr/$acl_libdirstem"; then dnl No hardcoding is needed. LIB[]NAME="${LIB[]NAME}${LIB[]NAME:+ }$found_so" else dnl Use an explicit option to hardcode DIR into the resulting dnl binary. dnl Potentially add DIR to ltrpathdirs. dnl The ltrpathdirs will be appended to $LTLIBNAME at the end. haveit= for x in $ltrpathdirs; do if test "X$x" = "X$found_dir"; then haveit=yes break fi done if test -z "$haveit"; then ltrpathdirs="$ltrpathdirs $found_dir" fi dnl The hardcoding into $LIBNAME is system dependent. if test "$hardcode_direct" = yes; then dnl Using DIR/libNAME.so during linking hardcodes DIR into the dnl resulting binary. LIB[]NAME="${LIB[]NAME}${LIB[]NAME:+ }$found_so" else if test -n "$hardcode_libdir_flag_spec" && test "$hardcode_minus_L" = no; then dnl Use an explicit option to hardcode DIR into the resulting dnl binary. LIB[]NAME="${LIB[]NAME}${LIB[]NAME:+ }$found_so" dnl Potentially add DIR to rpathdirs. dnl The rpathdirs will be appended to $LIBNAME at the end. haveit= for x in $rpathdirs; do if test "X$x" = "X$found_dir"; then haveit=yes break fi done if test -z "$haveit"; then rpathdirs="$rpathdirs $found_dir" fi else dnl Rely on "-L$found_dir". 
dnl But don't add it if it's already contained in the LDFLAGS dnl or the already constructed $LIBNAME haveit= for x in $LDFLAGS $LIB[]NAME; do AC_LIB_WITH_FINAL_PREFIX([eval x=\"$x\"]) if test "X$x" = "X-L$found_dir"; then haveit=yes break fi done if test -z "$haveit"; then LIB[]NAME="${LIB[]NAME}${LIB[]NAME:+ }-L$found_dir" fi if test "$hardcode_minus_L" != no; then dnl FIXME: Not sure whether we should use dnl "-L$found_dir -l$name" or "-L$found_dir $found_so" dnl here. LIB[]NAME="${LIB[]NAME}${LIB[]NAME:+ }$found_so" else dnl We cannot use $hardcode_runpath_var and LD_RUN_PATH dnl here, because this doesn't fit in flags passed to the dnl compiler. So give up. No hardcoding. This affects only dnl very old systems. dnl FIXME: Not sure whether we should use dnl "-L$found_dir -l$name" or "-L$found_dir $found_so" dnl here. LIB[]NAME="${LIB[]NAME}${LIB[]NAME:+ }-l$name" fi fi fi fi else if test "X$found_a" != "X"; then dnl Linking with a static library. LIB[]NAME="${LIB[]NAME}${LIB[]NAME:+ }$found_a" else dnl We shouldn't come here, but anyway it's good to have a dnl fallback. LIB[]NAME="${LIB[]NAME}${LIB[]NAME:+ }-L$found_dir -l$name" fi fi dnl Assume the include files are nearby. additional_includedir= case "$found_dir" in */$acl_libdirstem | */$acl_libdirstem/) basedir=`echo "X$found_dir" | sed -e 's,^X,,' -e "s,/$acl_libdirstem/"'*$,,'` additional_includedir="$basedir/include" ;; esac if test "X$additional_includedir" != "X"; then dnl Potentially add $additional_includedir to $INCNAME. dnl But don't add it dnl 1. if it's the standard /usr/include, dnl 2. if it's /usr/local/include and we are using GCC on Linux, dnl 3. if it's already present in $CPPFLAGS or the already dnl constructed $INCNAME, dnl 4. if it doesn't exist as a directory. 
if test "X$additional_includedir" != "X/usr/include"; then haveit= if test "X$additional_includedir" = "X/usr/local/include"; then if test -n "$GCC"; then case $host_os in linux* | gnu* | k*bsd*-gnu) haveit=yes;; esac fi fi if test -z "$haveit"; then for x in $CPPFLAGS $INC[]NAME; do AC_LIB_WITH_FINAL_PREFIX([eval x=\"$x\"]) if test "X$x" = "X-I$additional_includedir"; then haveit=yes break fi done if test -z "$haveit"; then if test -d "$additional_includedir"; then dnl Really add $additional_includedir to $INCNAME. INC[]NAME="${INC[]NAME}${INC[]NAME:+ }-I$additional_includedir" fi fi fi fi fi dnl Look for dependencies. if test -n "$found_la"; then dnl Read the .la file. It defines the variables dnl dlname, library_names, old_library, dependency_libs, current, dnl age, revision, installed, dlopen, dlpreopen, libdir. save_libdir="$libdir" case "$found_la" in */* | *\\*) . "$found_la" ;; *) . "./$found_la" ;; esac libdir="$save_libdir" dnl We use only dependency_libs. for dep in $dependency_libs; do case "$dep" in -L*) additional_libdir=`echo "X$dep" | sed -e 's/^X-L//'` dnl Potentially add $additional_libdir to $LIBNAME and $LTLIBNAME. dnl But don't add it dnl 1. if it's the standard /usr/lib, dnl 2. if it's /usr/local/lib and we are using GCC on Linux, dnl 3. if it's already present in $LDFLAGS or the already dnl constructed $LIBNAME, dnl 4. if it doesn't exist as a directory. if test "X$additional_libdir" != "X/usr/$acl_libdirstem"; then haveit= if test "X$additional_libdir" = "X/usr/local/$acl_libdirstem"; then if test -n "$GCC"; then case $host_os in linux* | gnu* | k*bsd*-gnu) haveit=yes;; esac fi fi if test -z "$haveit"; then haveit= for x in $LDFLAGS $LIB[]NAME; do AC_LIB_WITH_FINAL_PREFIX([eval x=\"$x\"]) if test "X$x" = "X-L$additional_libdir"; then haveit=yes break fi done if test -z "$haveit"; then if test -d "$additional_libdir"; then dnl Really add $additional_libdir to $LIBNAME. 
LIB[]NAME="${LIB[]NAME}${LIB[]NAME:+ }-L$additional_libdir" fi fi haveit= for x in $LDFLAGS $LTLIB[]NAME; do AC_LIB_WITH_FINAL_PREFIX([eval x=\"$x\"]) if test "X$x" = "X-L$additional_libdir"; then haveit=yes break fi done if test -z "$haveit"; then if test -d "$additional_libdir"; then dnl Really add $additional_libdir to $LTLIBNAME. LTLIB[]NAME="${LTLIB[]NAME}${LTLIB[]NAME:+ }-L$additional_libdir" fi fi fi fi ;; -R*) dir=`echo "X$dep" | sed -e 's/^X-R//'` if test "$enable_rpath" != no; then dnl Potentially add DIR to rpathdirs. dnl The rpathdirs will be appended to $LIBNAME at the end. haveit= for x in $rpathdirs; do if test "X$x" = "X$dir"; then haveit=yes break fi done if test -z "$haveit"; then rpathdirs="$rpathdirs $dir" fi dnl Potentially add DIR to ltrpathdirs. dnl The ltrpathdirs will be appended to $LTLIBNAME at the end. haveit= for x in $ltrpathdirs; do if test "X$x" = "X$dir"; then haveit=yes break fi done if test -z "$haveit"; then ltrpathdirs="$ltrpathdirs $dir" fi fi ;; -l*) dnl Handle this in the next round. names_next_round="$names_next_round "`echo "X$dep" | sed -e 's/^X-l//'` ;; *.la) dnl Handle this in the next round. Throw away the .la's dnl directory; it is already contained in a preceding -L dnl option. names_next_round="$names_next_round "`echo "X$dep" | sed -e 's,^X.*/,,' -e 's,^lib,,' -e 's,\.la$,,'` ;; *) dnl Most likely an immediate library name. LIB[]NAME="${LIB[]NAME}${LIB[]NAME:+ }$dep" LTLIB[]NAME="${LTLIB[]NAME}${LTLIB[]NAME:+ }$dep" ;; esac done fi else dnl Didn't find the library; assume it is in the system directories dnl known to the linker and runtime loader. (All the system dnl directories known to the linker should also be known to the dnl runtime loader, otherwise the system is severely misconfigured.) 
LIB[]NAME="${LIB[]NAME}${LIB[]NAME:+ }-l$name" LTLIB[]NAME="${LTLIB[]NAME}${LTLIB[]NAME:+ }-l$name" fi fi fi done done if test "X$rpathdirs" != "X"; then if test -n "$hardcode_libdir_separator"; then dnl Weird platform: only the last -rpath option counts, the user must dnl pass all path elements in one option. We can arrange that for a dnl single library, but not when more than one $LIBNAMEs are used. alldirs= for found_dir in $rpathdirs; do alldirs="${alldirs}${alldirs:+$hardcode_libdir_separator}$found_dir" done dnl Note: hardcode_libdir_flag_spec uses $libdir and $wl. acl_save_libdir="$libdir" libdir="$alldirs" eval flag=\"$hardcode_libdir_flag_spec\" libdir="$acl_save_libdir" LIB[]NAME="${LIB[]NAME}${LIB[]NAME:+ }$flag" else dnl The -rpath options are cumulative. for found_dir in $rpathdirs; do acl_save_libdir="$libdir" libdir="$found_dir" eval flag=\"$hardcode_libdir_flag_spec\" libdir="$acl_save_libdir" LIB[]NAME="${LIB[]NAME}${LIB[]NAME:+ }$flag" done fi fi if test "X$ltrpathdirs" != "X"; then dnl When using libtool, the option that works for both libraries and dnl executables is -R. The -R options are cumulative. for found_dir in $ltrpathdirs; do LTLIB[]NAME="${LTLIB[]NAME}${LTLIB[]NAME:+ }-R$found_dir" done fi ]) dnl AC_LIB_APPENDTOVAR(VAR, CONTENTS) appends the elements of CONTENTS to VAR, dnl unless already present in VAR. dnl Works only for CPPFLAGS, not for LIB* variables because that sometimes dnl contains two or three consecutive elements that belong together. AC_DEFUN([AC_LIB_APPENDTOVAR], [ for element in [$2]; do haveit= for x in $[$1]; do AC_LIB_WITH_FINAL_PREFIX([eval x=\"$x\"]) if test "X$x" = "X$element"; then haveit=yes break fi done if test -z "$haveit"; then [$1]="${[$1]}${[$1]:+ }$element" fi done ]) dnl For those cases where a variable contains several -L and -l options dnl referring to unknown libraries and directories, this macro determines the dnl necessary additional linker options for the runtime path. 
dnl AC_LIB_LINKFLAGS_FROM_LIBS([LDADDVAR], [LIBSVALUE], [USE-LIBTOOL]) dnl sets LDADDVAR to linker options needed together with LIBSVALUE. dnl If USE-LIBTOOL evaluates to non-empty, linking with libtool is assumed, dnl otherwise linking without libtool is assumed. AC_DEFUN([AC_LIB_LINKFLAGS_FROM_LIBS], [ AC_REQUIRE([AC_LIB_RPATH]) AC_REQUIRE([AC_LIB_PREPARE_MULTILIB]) $1= if test "$enable_rpath" != no; then if test -n "$hardcode_libdir_flag_spec" && test "$hardcode_minus_L" = no; then dnl Use an explicit option to hardcode directories into the resulting dnl binary. rpathdirs= next= for opt in $2; do if test -n "$next"; then dir="$next" dnl No need to hardcode the standard /usr/lib. if test "X$dir" != "X/usr/$acl_libdirstem"; then rpathdirs="$rpathdirs $dir" fi next= else case $opt in -L) next=yes ;; -L*) dir=`echo "X$opt" | sed -e 's,^X-L,,'` dnl No need to hardcode the standard /usr/lib. if test "X$dir" != "X/usr/$acl_libdirstem"; then rpathdirs="$rpathdirs $dir" fi next= ;; *) next= ;; esac fi done if test "X$rpathdirs" != "X"; then if test -n ""$3""; then dnl libtool is used for linking. Use -R options. for dir in $rpathdirs; do $1="${$1}${$1:+ }-R$dir" done else dnl The linker is used for linking directly. if test -n "$hardcode_libdir_separator"; then dnl Weird platform: only the last -rpath option counts, the user dnl must pass all path elements in one option. alldirs= for dir in $rpathdirs; do alldirs="${alldirs}${alldirs:+$hardcode_libdir_separator}$dir" done acl_save_libdir="$libdir" libdir="$alldirs" eval flag=\"$hardcode_libdir_flag_spec\" libdir="$acl_save_libdir" $1="$flag" else dnl The -rpath options are cumulative. for dir in $rpathdirs; do acl_save_libdir="$libdir" libdir="$dir" eval flag=\"$hardcode_libdir_flag_spec\" libdir="$acl_save_libdir" $1="${$1}${$1:+ }$flag" done fi fi fi fi fi AC_SUBST([$1]) ]) # lib-ld.m4 serial 3 (gettext-0.13) dnl Copyright (C) 1996-2003 Free Software Foundation, Inc. 
dnl This file is free software; the Free Software Foundation dnl gives unlimited permission to copy and/or distribute it, dnl with or without modifications, as long as this notice is preserved. dnl Subroutines of libtool.m4, dnl with replacements s/AC_/AC_LIB/ and s/lt_cv/acl_cv/ to avoid collision dnl with libtool.m4. dnl From libtool-1.4. Sets the variable with_gnu_ld to yes or no. AC_DEFUN([AC_LIB_PROG_LD_GNU], [AC_CACHE_CHECK([if the linker ($LD) is GNU ld], acl_cv_prog_gnu_ld, [# I'd rather use --version here, but apparently some GNU ld's only accept -v. case `$LD -v 2>&1 conf$$.sh echo "exit 0" >>conf$$.sh chmod +x conf$$.sh if (PATH="/nonexistent;."; conf$$.sh) >/dev/null 2>&1; then PATH_SEPARATOR=';' else PATH_SEPARATOR=: fi rm -f conf$$.sh fi ac_prog=ld if test "$GCC" = yes; then # Check if gcc -print-prog-name=ld gives a path. AC_MSG_CHECKING([for ld used by GCC]) case $host in *-*-mingw*) # gcc leaves a trailing carriage return which upsets mingw ac_prog=`($CC -print-prog-name=ld) 2>&5 | tr -d '\015'` ;; *) ac_prog=`($CC -print-prog-name=ld) 2>&5` ;; esac case $ac_prog in # Accept absolute paths. [[\\/]* | [A-Za-z]:[\\/]*)] [re_direlt='/[^/][^/]*/\.\./'] # Canonicalize the path of ld ac_prog=`echo $ac_prog| sed 's%\\\\%/%g'` while echo $ac_prog | grep "$re_direlt" > /dev/null 2>&1; do ac_prog=`echo $ac_prog| sed "s%$re_direlt%/%"` done test -z "$LD" && LD="$ac_prog" ;; "") # If it fails, then pretend we aren't using GCC. ac_prog=ld ;; *) # If it is relative, then search for the first ld in PATH. with_gnu_ld=unknown ;; esac elif test "$with_gnu_ld" = yes; then AC_MSG_CHECKING([for GNU ld]) else AC_MSG_CHECKING([for non-GNU ld]) fi AC_CACHE_VAL(acl_cv_path_LD, [if test -z "$LD"; then IFS="${IFS= }"; ac_save_ifs="$IFS"; IFS="${IFS}${PATH_SEPARATOR-:}" for ac_dir in $PATH; do test -z "$ac_dir" && ac_dir=. if test -f "$ac_dir/$ac_prog" || test -f "$ac_dir/$ac_prog$ac_exeext"; then acl_cv_path_LD="$ac_dir/$ac_prog" # Check to see if the program is GNU ld. 
I'd rather use --version, # but apparently some GNU ld's only accept -v. # Break only if it was the GNU/non-GNU ld that we prefer. case `"$acl_cv_path_LD" -v 2>&1 < /dev/null` in *GNU* | *'with BFD'*) test "$with_gnu_ld" != no && break ;; *) test "$with_gnu_ld" != yes && break ;; esac fi done IFS="$ac_save_ifs" else acl_cv_path_LD="$LD" # Let the user override the test with a path. fi]) LD="$acl_cv_path_LD" if test -n "$LD"; then AC_MSG_RESULT($LD) else AC_MSG_RESULT(no) fi test -z "$LD" && AC_MSG_ERROR([no acceptable ld found in \$PATH]) AC_LIB_PROG_LD_GNU ]) # Do all the work for Automake. This macro actually does too much -- # some checks are only needed if your package does certain things. # But this isn't really a big deal. # serial 1 dnl Usage: dnl AM_INIT_AUTOMAKE(package,version, [no-define]) AC_DEFUN([AM_INIT_AUTOMAKE], [AC_REQUIRE([AM_SET_CURRENT_AUTOMAKE_VERSION])dnl AC_REQUIRE([AC_PROG_INSTALL]) PACKAGE=[$1] AC_SUBST(PACKAGE) VERSION=[$2] AC_SUBST(VERSION) dnl test to see if srcdir already configured if test "`cd $srcdir && pwd`" != "`pwd`" && test -f $srcdir/config.status; then AC_MSG_ERROR([source directory already configured; run "make distclean" there first]) fi ifelse([$3],, AC_DEFINE_UNQUOTED(PACKAGE, "$PACKAGE", [Name of package]) AC_DEFINE_UNQUOTED(VERSION, "$VERSION", [Version number of package])) AC_REQUIRE([AM_SANITY_CHECK]) AC_REQUIRE([AC_ARG_PROGRAM]) dnl FIXME This is truly gross. missing_dir=`cd $ac_aux_dir && pwd` AM_MISSING_PROG(ACLOCAL, aclocal-${am__api_version}, $missing_dir) AM_MISSING_PROG(AUTOCONF, autoconf, $missing_dir) AM_MISSING_PROG(AUTOMAKE, automake-${am__api_version}, $missing_dir) AM_MISSING_PROG(AUTOHEADER, autoheader, $missing_dir) AM_MISSING_PROG(MAKEINFO, makeinfo, $missing_dir) AC_REQUIRE([AC_PROG_MAKE_SET])]) # Copyright 2002 Free Software Foundation, Inc. 
# This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2, or (at your option) # any later version. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA # AM_AUTOMAKE_VERSION(VERSION) # ---------------------------- # Automake X.Y traces this macro to ensure aclocal.m4 has been # generated from the m4 files accompanying Automake X.Y. AC_DEFUN([AM_AUTOMAKE_VERSION],[am__api_version="1.4"]) # AM_SET_CURRENT_AUTOMAKE_VERSION # ------------------------------- # Call AM_AUTOMAKE_VERSION so it can be traced. # This function is AC_REQUIREd by AC_INIT_AUTOMAKE. AC_DEFUN([AM_SET_CURRENT_AUTOMAKE_VERSION], [AM_AUTOMAKE_VERSION([1.4-p6])]) # # Check to make sure that the build environment is sane. # AC_DEFUN([AM_SANITY_CHECK], [AC_MSG_CHECKING([whether build environment is sane]) # Just in case sleep 1 echo timestamp > conftestfile # Do `set' in a subshell so we don't clobber the current shell's # arguments. Must try -L first in case configure is actually a # symlink; some systems play weird games with the mod time of symlinks # (eg FreeBSD returns the mod time of the symlink's containing # directory). if ( set X `ls -Lt $srcdir/configure conftestfile 2> /dev/null` if test "[$]*" = "X"; then # -L didn't work. set X `ls -t $srcdir/configure conftestfile` fi if test "[$]*" != "X $srcdir/configure conftestfile" \ && test "[$]*" != "X conftestfile $srcdir/configure"; then # If neither matched, then we have a broken ls. 
This can happen # if, for instance, CONFIG_SHELL is bash and it inherits a # broken ls alias from the environment. This has actually # happened. Such a system could not be considered "sane". AC_MSG_ERROR([ls -t appears to fail. Make sure there is not a broken alias in your environment]) fi test "[$]2" = conftestfile ) then # Ok. : else AC_MSG_ERROR([newly created file is older than distributed files! Check your system clock]) fi rm -f conftest* AC_MSG_RESULT(yes)]) dnl AM_MISSING_PROG(NAME, PROGRAM, DIRECTORY) dnl The program must properly implement --version. AC_DEFUN([AM_MISSING_PROG], [AC_MSG_CHECKING(for working $2) # Run test in a subshell; some versions of sh will print an error if # an executable is not found, even if stderr is redirected. # Redirect stdin to placate older versions of autoconf. Sigh. if ($2 --version) < /dev/null > /dev/null 2>&1; then $1=$2 AC_MSG_RESULT(found) else $1="$3/missing $2" AC_MSG_RESULT(missing) fi AC_SUBST($1)]) exonerate-2.4.0/config.guess0000744000175000017500000011315010371635214012751 00000000000000#! /bin/sh # Attempt to guess a canonical system name. # Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001 # Free Software Foundation, Inc. timestamp='2001-09-04' # This file is free software; you can redistribute it and/or modify it # under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
# # As a special exception to the GNU General Public License, if you # distribute this file as part of a program that contains a # configuration script generated by Autoconf, you may include it under # the same distribution terms that you use for the rest of that program. # Written by Per Bothner . # Please send patches to . # # This script attempts to guess a canonical system name similar to # config.sub. If it succeeds, it prints the system name on stdout, and # exits with 0. Otherwise, it exits with 1. # # The plan is that this can be called by configure scripts if you # don't specify an explicit build system type. me=`echo "$0" | sed -e 's,.*/,,'` usage="\ Usage: $0 [OPTION] Output the configuration name of the system \`$me' is run on. Operation modes: -h, --help print this help, then exit -t, --time-stamp print date of last modification, then exit -v, --version print version number, then exit Report bugs and patches to ." version="\ GNU config.guess ($timestamp) Originally written by Per Bothner. Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE." help=" Try \`$me --help' for more information." # Parse command line while test $# -gt 0 ; do case $1 in --time-stamp | --time* | -t ) echo "$timestamp" ; exit 0 ;; --version | -v ) echo "$version" ; exit 0 ;; --help | --h* | -h ) echo "$usage"; exit 0 ;; -- ) # Stop option processing shift; break ;; - ) # Use stdin as input. break ;; -* ) echo "$me: invalid option $1$help" >&2 exit 1 ;; * ) break ;; esac done if test $# != 0; then echo "$me: too many arguments$help" >&2 exit 1 fi dummy=dummy-$$ trap 'rm -f $dummy.c $dummy.o $dummy.rel $dummy; exit 1' 1 2 15 # CC_FOR_BUILD -- compiler used by this script. # Historically, `CC_FOR_BUILD' used to be named `HOST_CC'. 
# We still use `HOST_CC' if defined, but it is deprecated.

set_cc_for_build='case $CC_FOR_BUILD,$HOST_CC,$CC in
 ,,)    echo "int dummy(){}" > $dummy.c ;
        for c in cc gcc c89 ; do
          ($c $dummy.c -c -o $dummy.o) >/dev/null 2>&1 ;
          if test $? = 0 ; then
             CC_FOR_BUILD="$c"; break ;
          fi ;
        done ;
        rm -f $dummy.c $dummy.o $dummy.rel ;
        if test x"$CC_FOR_BUILD" = x ; then
          CC_FOR_BUILD=no_compiler_found ;
        fi
        ;;
 ,,*)   CC_FOR_BUILD=$CC ;;
 ,*,*)  CC_FOR_BUILD=$HOST_CC ;;
esac'

# This is needed to find uname on a Pyramid OSx when run in the BSD universe.
# (ghazi@noc.rutgers.edu 1994-08-24)
if (test -f /.attbin/uname) >/dev/null 2>&1 ; then
        PATH=$PATH:/.attbin ; export PATH
fi

UNAME_MACHINE=`(uname -m) 2>/dev/null` || UNAME_MACHINE=unknown
UNAME_RELEASE=`(uname -r) 2>/dev/null` || UNAME_RELEASE=unknown
UNAME_SYSTEM=`(uname -s) 2>/dev/null` || UNAME_SYSTEM=unknown
UNAME_VERSION=`(uname -v) 2>/dev/null` || UNAME_VERSION=unknown

# Note: order is significant - the case branches are not exclusive.

case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in
    *:NetBSD:*:*)
        # Netbsd (nbsd) targets should (where applicable) match one or
        # more of the tuples: *-*-netbsdelf*, *-*-netbsdaout*,
        # *-*-netbsdecoff* and *-*-netbsd*. For targets that recently
        # switched to ELF, *-*-netbsd* would select the old
        # object file format. This provides both forward
        # compatibility and a consistent mechanism for selecting the
        # object file format.
        # Determine the machine/vendor (is the vendor relevant).
        case "${UNAME_MACHINE}" in
            amiga) machine=m68k-unknown ;;
            arm32) machine=arm-unknown ;;
            atari*) machine=m68k-atari ;;
            sun3*) machine=m68k-sun ;;
            mac68k) machine=m68k-apple ;;
            macppc) machine=powerpc-apple ;;
            hp3[0-9][05]) machine=m68k-hp ;;
            ibmrt|romp-ibm) machine=romp-ibm ;;
            *) machine=${UNAME_MACHINE}-unknown ;;
        esac
        # The Operating System including object format, if it has switched
        # to ELF recently, or will in the future.
        case "${UNAME_MACHINE}" in
            i386|sparc|amiga|arm*|hp300|mvme68k|vax|atari|luna68k|mac68k|news68k|next68k|pc532|sun3*|x68k)
                eval $set_cc_for_build
                if echo __ELF__ | $CC_FOR_BUILD -E - 2>/dev/null \
                        | grep __ELF__ >/dev/null
                then
                    # Once all utilities can be ECOFF (netbsdecoff) or a.out (netbsdaout).
                    # Return netbsd for either. FIX?
                    os=netbsd
                else
                    os=netbsdelf
                fi
                ;;
            *)
                os=netbsd
                ;;
        esac
        # The OS release
        release=`echo ${UNAME_RELEASE}|sed -e 's/[-_].*/\./'`
        # Since CPU_TYPE-MANUFACTURER-KERNEL-OPERATING_SYSTEM:
        # contains redundant information, the shorter form:
        # CPU_TYPE-MANUFACTURER-OPERATING_SYSTEM is used.
        echo "${machine}-${os}${release}"
        exit 0 ;;
    alpha:OSF1:*:*)
        if test $UNAME_RELEASE = "V4.0"; then
                UNAME_RELEASE=`/usr/sbin/sizer -v | awk '{print $3}'`
        fi
        # A Vn.n version is a released version.
        # A Tn.n version is a released field test version.
        # A Xn.n version is an unreleased experimental baselevel.
        # 1.2 uses "1.2" for uname -r.
        cat <<EOF >$dummy.s
        .data
\$Lformat:
        .byte 37,100,45,37,120,10,0     # "%d-%x\n"

        .text
        .globl main
        .align 4
        .ent main
main:
        .frame \$30,16,\$26,0
        ldgp \$29,0(\$27)
        .prologue 1
        .long 0x47e03d80 # implver \$0
        lda \$2,-1
        .long 0x47e20c21 # amask \$2,\$1
        lda \$16,\$Lformat
        mov \$0,\$17
        not \$1,\$18
        jsr \$26,printf
        ldgp \$29,0(\$26)
        mov 0,\$16
        jsr \$26,exit
        .end main
EOF
        eval $set_cc_for_build
        $CC_FOR_BUILD $dummy.s -o $dummy 2>/dev/null
        if test "$?" = 0 ; then
                case `./$dummy` in
                        0-0)
                                UNAME_MACHINE="alpha"
                                ;;
                        1-0)
                                UNAME_MACHINE="alphaev5"
                                ;;
                        1-1)
                                UNAME_MACHINE="alphaev56"
                                ;;
                        1-101)
                                UNAME_MACHINE="alphapca56"
                                ;;
                        2-303)
                                UNAME_MACHINE="alphaev6"
                                ;;
                        2-307)
                                UNAME_MACHINE="alphaev67"
                                ;;
                        2-1307)
                                UNAME_MACHINE="alphaev68"
                                ;;
                esac
        fi
        rm -f $dummy.s $dummy
        echo ${UNAME_MACHINE}-dec-osf`echo ${UNAME_RELEASE} | sed -e 's/^[VTX]//' | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz'`
        exit 0 ;;
    Alpha\ *:Windows_NT*:*)
        # How do we know it's Interix rather than the generic POSIX subsystem?
# Should we change UNAME_MACHINE based on the output of uname instead # of the specific Alpha model? echo alpha-pc-interix exit 0 ;; 21064:Windows_NT:50:3) echo alpha-dec-winnt3.5 exit 0 ;; Amiga*:UNIX_System_V:4.0:*) echo m68k-unknown-sysv4 exit 0;; amiga:OpenBSD:*:*) echo m68k-unknown-openbsd${UNAME_RELEASE} exit 0 ;; *:[Aa]miga[Oo][Ss]:*:*) echo ${UNAME_MACHINE}-unknown-amigaos exit 0 ;; arc64:OpenBSD:*:*) echo mips64el-unknown-openbsd${UNAME_RELEASE} exit 0 ;; arc:OpenBSD:*:*) echo mipsel-unknown-openbsd${UNAME_RELEASE} exit 0 ;; hkmips:OpenBSD:*:*) echo mips-unknown-openbsd${UNAME_RELEASE} exit 0 ;; pmax:OpenBSD:*:*) echo mipsel-unknown-openbsd${UNAME_RELEASE} exit 0 ;; sgi:OpenBSD:*:*) echo mips-unknown-openbsd${UNAME_RELEASE} exit 0 ;; wgrisc:OpenBSD:*:*) echo mipsel-unknown-openbsd${UNAME_RELEASE} exit 0 ;; *:OS/390:*:*) echo i370-ibm-openedition exit 0 ;; arm:RISC*:1.[012]*:*|arm:riscix:1.[012]*:*) echo arm-acorn-riscix${UNAME_RELEASE} exit 0;; SR2?01:HI-UX/MPP:*:* | SR8000:HI-UX/MPP:*:*) echo hppa1.1-hitachi-hiuxmpp exit 0;; Pyramid*:OSx*:*:* | MIS*:OSx*:*:* | MIS*:SMP_DC-OSx*:*:*) # akee@wpdis03.wpafb.af.mil (Earle F. Ake) contributed MIS and NILE. if test "`(/bin/universe) 2>/dev/null`" = att ; then echo pyramid-pyramid-sysv3 else echo pyramid-pyramid-bsd fi exit 0 ;; NILE*:*:*:dcosx) echo pyramid-pyramid-svr4 exit 0 ;; sun4H:SunOS:5.*:*) echo sparc-hal-solaris2`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'` exit 0 ;; sun4*:SunOS:5.*:* | tadpole*:SunOS:5.*:*) echo sparc-sun-solaris2`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'` exit 0 ;; i86pc:SunOS:5.*:*) echo i386-pc-solaris2`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'` exit 0 ;; sun4*:SunOS:6*:*) # According to config.sub, this is the proper way to canonicalize # SunOS6. Hard to guess exactly what SunOS6 will be like, but # it's likely to be more like Solaris than SunOS4. 
echo sparc-sun-solaris3`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'` exit 0 ;; sun4*:SunOS:*:*) case "`/usr/bin/arch -k`" in Series*|S4*) UNAME_RELEASE=`uname -v` ;; esac # Japanese Language versions have a version number like `4.1.3-JL'. echo sparc-sun-sunos`echo ${UNAME_RELEASE}|sed -e 's/-/_/'` exit 0 ;; sun3*:SunOS:*:*) echo m68k-sun-sunos${UNAME_RELEASE} exit 0 ;; sun*:*:4.2BSD:*) UNAME_RELEASE=`(head -1 /etc/motd | awk '{print substr($5,1,3)}') 2>/dev/null` test "x${UNAME_RELEASE}" = "x" && UNAME_RELEASE=3 case "`/bin/arch`" in sun3) echo m68k-sun-sunos${UNAME_RELEASE} ;; sun4) echo sparc-sun-sunos${UNAME_RELEASE} ;; esac exit 0 ;; aushp:SunOS:*:*) echo sparc-auspex-sunos${UNAME_RELEASE} exit 0 ;; sparc*:NetBSD:*) echo `uname -p`-unknown-netbsd${UNAME_RELEASE} exit 0 ;; atari*:OpenBSD:*:*) echo m68k-unknown-openbsd${UNAME_RELEASE} exit 0 ;; # The situation for MiNT is a little confusing. The machine name # can be virtually everything (everything which is not # "atarist" or "atariste" at least should have a processor # > m68000). The system name ranges from "MiNT" over "FreeMiNT" # to the lowercase version "mint" (or "freemint"). Finally # the system name "TOS" denotes a system which is actually not # MiNT. But MiNT is downward compatible to TOS, so this should # be no problem. 
    atarist[e]:*MiNT:*:* | atarist[e]:*mint:*:* | atarist[e]:*TOS:*:*)
        echo m68k-atari-mint${UNAME_RELEASE}
        exit 0 ;;
    atari*:*MiNT:*:* | atari*:*mint:*:* | atarist[e]:*TOS:*:*)
        echo m68k-atari-mint${UNAME_RELEASE}
        exit 0 ;;
    *falcon*:*MiNT:*:* | *falcon*:*mint:*:* | *falcon*:*TOS:*:*)
        echo m68k-atari-mint${UNAME_RELEASE}
        exit 0 ;;
    milan*:*MiNT:*:* | milan*:*mint:*:* | *milan*:*TOS:*:*)
        echo m68k-milan-mint${UNAME_RELEASE}
        exit 0 ;;
    hades*:*MiNT:*:* | hades*:*mint:*:* | *hades*:*TOS:*:*)
        echo m68k-hades-mint${UNAME_RELEASE}
        exit 0 ;;
    *:*MiNT:*:* | *:*mint:*:* | *:*TOS:*:*)
        echo m68k-unknown-mint${UNAME_RELEASE}
        exit 0 ;;
    sun3*:OpenBSD:*:*)
        echo m68k-unknown-openbsd${UNAME_RELEASE}
        exit 0 ;;
    mac68k:OpenBSD:*:*)
        echo m68k-unknown-openbsd${UNAME_RELEASE}
        exit 0 ;;
    mvme68k:OpenBSD:*:*)
        echo m68k-unknown-openbsd${UNAME_RELEASE}
        exit 0 ;;
    mvme88k:OpenBSD:*:*)
        echo m88k-unknown-openbsd${UNAME_RELEASE}
        exit 0 ;;
    powerpc:machten:*:*)
        echo powerpc-apple-machten${UNAME_RELEASE}
        exit 0 ;;
    RISC*:Mach:*:*)
        echo mips-dec-mach_bsd4.3
        exit 0 ;;
    RISC*:ULTRIX:*:*)
        echo mips-dec-ultrix${UNAME_RELEASE}
        exit 0 ;;
    VAX*:ULTRIX*:*:*)
        echo vax-dec-ultrix${UNAME_RELEASE}
        exit 0 ;;
    2020:CLIX:*:* | 2430:CLIX:*:*)
        echo clipper-intergraph-clix${UNAME_RELEASE}
        exit 0 ;;
    mips:*:*:UMIPS | mips:*:*:RISCos)
        eval $set_cc_for_build
        sed 's/^ //' << EOF >$dummy.c
 #ifdef __cplusplus
 #include <stdio.h>  /* for printf() prototype */
   int main (int argc, char *argv[]) {
 #else
   int main (argc, argv) int argc; char *argv[]; {
 #endif
 #if defined (host_mips) && defined (MIPSEB)
 #if defined (SYSTYPE_SYSV)
   printf ("mips-mips-riscos%ssysv\n", argv[1]); exit (0);
 #endif
 #if defined (SYSTYPE_SVR4)
   printf ("mips-mips-riscos%ssvr4\n", argv[1]); exit (0);
 #endif
 #if defined (SYSTYPE_BSD43) || defined(SYSTYPE_BSD)
   printf ("mips-mips-riscos%sbsd\n", argv[1]); exit (0);
 #endif
 #endif
   exit (-1);
 }
EOF
        $CC_FOR_BUILD $dummy.c -o $dummy \
          && ./$dummy `echo "${UNAME_RELEASE}" | sed -n 's/\([0-9]*\).*/\1/p'` \
          && rm -f $dummy.c $dummy && exit 0
        rm
-f $dummy.c $dummy echo mips-mips-riscos${UNAME_RELEASE} exit 0 ;; Motorola:PowerMAX_OS:*:*) echo powerpc-motorola-powermax exit 0 ;; Night_Hawk:Power_UNIX:*:*) echo powerpc-harris-powerunix exit 0 ;; m88k:CX/UX:7*:*) echo m88k-harris-cxux7 exit 0 ;; m88k:*:4*:R4*) echo m88k-motorola-sysv4 exit 0 ;; m88k:*:3*:R3*) echo m88k-motorola-sysv3 exit 0 ;; AViiON:dgux:*:*) # DG/UX returns AViiON for all architectures UNAME_PROCESSOR=`/usr/bin/uname -p` if [ $UNAME_PROCESSOR = mc88100 ] || [ $UNAME_PROCESSOR = mc88110 ] then if [ ${TARGET_BINARY_INTERFACE}x = m88kdguxelfx ] || \ [ ${TARGET_BINARY_INTERFACE}x = x ] then echo m88k-dg-dgux${UNAME_RELEASE} else echo m88k-dg-dguxbcs${UNAME_RELEASE} fi else echo i586-dg-dgux${UNAME_RELEASE} fi exit 0 ;; M88*:DolphinOS:*:*) # DolphinOS (SVR3) echo m88k-dolphin-sysv3 exit 0 ;; M88*:*:R3*:*) # Delta 88k system running SVR3 echo m88k-motorola-sysv3 exit 0 ;; XD88*:*:*:*) # Tektronix XD88 system running UTekV (SVR3) echo m88k-tektronix-sysv3 exit 0 ;; Tek43[0-9][0-9]:UTek:*:*) # Tektronix 4300 system running UTek (BSD) echo m68k-tektronix-bsd exit 0 ;; *:IRIX*:*:*) echo mips-sgi-irix`echo ${UNAME_RELEASE}|sed -e 's/-/_/g'` exit 0 ;; ????????:AIX?:[12].1:2) # AIX 2.2.1 or AIX 2.1.1 is RT/PC AIX. 
echo romp-ibm-aix # uname -m gives an 8 hex-code CPU id exit 0 ;; # Note that: echo "'`uname -s`'" gives 'AIX ' i*86:AIX:*:*) echo i386-ibm-aix exit 0 ;; ia64:AIX:*:*) if [ -x /usr/bin/oslevel ] ; then IBM_REV=`/usr/bin/oslevel` else IBM_REV=${UNAME_VERSION}.${UNAME_RELEASE} fi echo ${UNAME_MACHINE}-ibm-aix${IBM_REV} exit 0 ;; *:AIX:2:3) if grep bos325 /usr/include/stdio.h >/dev/null 2>&1; then eval $set_cc_for_build sed 's/^ //' << EOF >$dummy.c #include main() { if (!__power_pc()) exit(1); puts("powerpc-ibm-aix3.2.5"); exit(0); } EOF $CC_FOR_BUILD $dummy.c -o $dummy && ./$dummy && rm -f $dummy.c $dummy && exit 0 rm -f $dummy.c $dummy echo rs6000-ibm-aix3.2.5 elif grep bos324 /usr/include/stdio.h >/dev/null 2>&1; then echo rs6000-ibm-aix3.2.4 else echo rs6000-ibm-aix3.2 fi exit 0 ;; *:AIX:*:[45]) IBM_CPU_ID=`/usr/sbin/lsdev -C -c processor -S available | head -1 | awk '{ print $1 }'` if /usr/sbin/lsattr -El ${IBM_CPU_ID} | grep ' POWER' >/dev/null 2>&1; then IBM_ARCH=rs6000 else IBM_ARCH=powerpc fi if [ -x /usr/bin/oslevel ] ; then IBM_REV=`/usr/bin/oslevel` else IBM_REV=${UNAME_VERSION}.${UNAME_RELEASE} fi echo ${IBM_ARCH}-ibm-aix${IBM_REV} exit 0 ;; *:AIX:*:*) echo rs6000-ibm-aix exit 0 ;; ibmrt:4.4BSD:*|romp-ibm:BSD:*) echo romp-ibm-bsd4.4 exit 0 ;; ibmrt:*BSD:*|romp-ibm:BSD:*) # covers RT/PC BSD and echo romp-ibm-bsd${UNAME_RELEASE} # 4.3 with uname added to exit 0 ;; # report: romp-ibm BSD 4.3 *:BOSX:*:*) echo rs6000-bull-bosx exit 0 ;; DPX/2?00:B.O.S.:*:*) echo m68k-bull-sysv3 exit 0 ;; 9000/[34]??:4.3bsd:1.*:*) echo m68k-hp-bsd exit 0 ;; hp300:4.4BSD:*:* | 9000/[34]??:4.3bsd:2.*:*) echo m68k-hp-bsd4.4 exit 0 ;; 9000/[34678]??:HP-UX:*:*) HPUX_REV=`echo ${UNAME_RELEASE}|sed -e 's/[^.]*.[0B]*//'` case "${UNAME_MACHINE}" in 9000/31? ) HP_ARCH=m68000 ;; 9000/[34]?? 
) HP_ARCH=m68k ;; 9000/[678][0-9][0-9]) case "${HPUX_REV}" in 11.[0-9][0-9]) if [ -x /usr/bin/getconf ]; then sc_cpu_version=`/usr/bin/getconf SC_CPU_VERSION 2>/dev/null` sc_kernel_bits=`/usr/bin/getconf SC_KERNEL_BITS 2>/dev/null` case "${sc_cpu_version}" in 523) HP_ARCH="hppa1.0" ;; # CPU_PA_RISC1_0 528) HP_ARCH="hppa1.1" ;; # CPU_PA_RISC1_1 532) # CPU_PA_RISC2_0 case "${sc_kernel_bits}" in 32) HP_ARCH="hppa2.0n" ;; 64) HP_ARCH="hppa2.0w" ;; esac ;; esac fi ;; esac if [ "${HP_ARCH}" = "" ]; then eval $set_cc_for_build sed 's/^ //' << EOF >$dummy.c #define _HPUX_SOURCE #include #include int main () { #if defined(_SC_KERNEL_BITS) long bits = sysconf(_SC_KERNEL_BITS); #endif long cpu = sysconf (_SC_CPU_VERSION); switch (cpu) { case CPU_PA_RISC1_0: puts ("hppa1.0"); break; case CPU_PA_RISC1_1: puts ("hppa1.1"); break; case CPU_PA_RISC2_0: #if defined(_SC_KERNEL_BITS) switch (bits) { case 64: puts ("hppa2.0w"); break; case 32: puts ("hppa2.0n"); break; default: puts ("hppa2.0"); break; } break; #else /* !defined(_SC_KERNEL_BITS) */ puts ("hppa2.0"); break; #endif default: puts ("hppa1.0"); break; } exit (0); } EOF (CCOPTS= $CC_FOR_BUILD $dummy.c -o $dummy 2>/dev/null ) && HP_ARCH=`./$dummy` if test -z "$HP_ARCH"; then HP_ARCH=hppa; fi rm -f $dummy.c $dummy fi ;; esac echo ${HP_ARCH}-hp-hpux${HPUX_REV} exit 0 ;; ia64:HP-UX:*:*) HPUX_REV=`echo ${UNAME_RELEASE}|sed -e 's/[^.]*.[0B]*//'` echo ia64-hp-hpux${HPUX_REV} exit 0 ;; 3050*:HI-UX:*:*) eval $set_cc_for_build sed 's/^ //' << EOF >$dummy.c #include int main () { long cpu = sysconf (_SC_CPU_VERSION); /* The order matters, because CPU_IS_HP_MC68K erroneously returns true for CPU_PA_RISC1_0. CPU_IS_PA_RISC returns correct results, however. 
*/ if (CPU_IS_PA_RISC (cpu)) { switch (cpu) { case CPU_PA_RISC1_0: puts ("hppa1.0-hitachi-hiuxwe2"); break; case CPU_PA_RISC1_1: puts ("hppa1.1-hitachi-hiuxwe2"); break; case CPU_PA_RISC2_0: puts ("hppa2.0-hitachi-hiuxwe2"); break; default: puts ("hppa-hitachi-hiuxwe2"); break; } } else if (CPU_IS_HP_MC68K (cpu)) puts ("m68k-hitachi-hiuxwe2"); else puts ("unknown-hitachi-hiuxwe2"); exit (0); } EOF $CC_FOR_BUILD $dummy.c -o $dummy && ./$dummy && rm -f $dummy.c $dummy && exit 0 rm -f $dummy.c $dummy echo unknown-hitachi-hiuxwe2 exit 0 ;; 9000/7??:4.3bsd:*:* | 9000/8?[79]:4.3bsd:*:* ) echo hppa1.1-hp-bsd exit 0 ;; 9000/8??:4.3bsd:*:*) echo hppa1.0-hp-bsd exit 0 ;; *9??*:MPE/iX:*:* | *3000*:MPE/iX:*:*) echo hppa1.0-hp-mpeix exit 0 ;; hp7??:OSF1:*:* | hp8?[79]:OSF1:*:* ) echo hppa1.1-hp-osf exit 0 ;; hp8??:OSF1:*:*) echo hppa1.0-hp-osf exit 0 ;; i*86:OSF1:*:*) if [ -x /usr/sbin/sysversion ] ; then echo ${UNAME_MACHINE}-unknown-osf1mk else echo ${UNAME_MACHINE}-unknown-osf1 fi exit 0 ;; parisc*:Lites*:*:*) echo hppa1.1-hp-lites exit 0 ;; hppa*:OpenBSD:*:*) echo hppa-unknown-openbsd exit 0 ;; C1*:ConvexOS:*:* | convex:ConvexOS:C1*:*) echo c1-convex-bsd exit 0 ;; C2*:ConvexOS:*:* | convex:ConvexOS:C2*:*) if getsysinfo -f scalar_acc then echo c32-convex-bsd else echo c2-convex-bsd fi exit 0 ;; C34*:ConvexOS:*:* | convex:ConvexOS:C34*:*) echo c34-convex-bsd exit 0 ;; C38*:ConvexOS:*:* | convex:ConvexOS:C38*:*) echo c38-convex-bsd exit 0 ;; C4*:ConvexOS:*:* | convex:ConvexOS:C4*:*) echo c4-convex-bsd exit 0 ;; CRAY*X-MP:*:*:*) echo xmp-cray-unicos exit 0 ;; CRAY*Y-MP:*:*:*) echo ymp-cray-unicos${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/' exit 0 ;; CRAY*[A-Z]90:*:*:*) echo ${UNAME_MACHINE}-cray-unicos${UNAME_RELEASE} \ | sed -e 's/CRAY.*\([A-Z]90\)/\1/' \ -e y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/ \ -e 's/\.[^.]*$/.X/' exit 0 ;; CRAY*TS:*:*:*) echo t90-cray-unicos${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/' exit 0 ;; CRAY*T3D:*:*:*) echo 
alpha-cray-unicosmk${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/' exit 0 ;; CRAY*T3E:*:*:*) echo alphaev5-cray-unicosmk${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/' exit 0 ;; CRAY*SV1:*:*:*) echo sv1-cray-unicos${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/' exit 0 ;; CRAY-2:*:*:*) echo cray2-cray-unicos exit 0 ;; F30[01]:UNIX_System_V:*:* | F700:UNIX_System_V:*:*) FUJITSU_PROC=`uname -m | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz'` FUJITSU_SYS=`uname -p | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' | sed -e 's/\///'` FUJITSU_REL=`echo ${UNAME_RELEASE} | sed -e 's/ /_/'` echo "${FUJITSU_PROC}-fujitsu-${FUJITSU_SYS}${FUJITSU_REL}" exit 0 ;; hp300:OpenBSD:*:*) echo m68k-unknown-openbsd${UNAME_RELEASE} exit 0 ;; i*86:BSD/386:*:* | i*86:BSD/OS:*:* | *:Ascend\ Embedded/OS:*:*) echo ${UNAME_MACHINE}-pc-bsdi${UNAME_RELEASE} exit 0 ;; sparc*:BSD/OS:*:*) echo sparc-unknown-bsdi${UNAME_RELEASE} exit 0 ;; *:BSD/OS:*:*) echo ${UNAME_MACHINE}-unknown-bsdi${UNAME_RELEASE} exit 0 ;; *:FreeBSD:*:*) echo ${UNAME_MACHINE}-unknown-freebsd`echo ${UNAME_RELEASE}|sed -e 's/[-(].*//'` exit 0 ;; *:OpenBSD:*:*) echo ${UNAME_MACHINE}-unknown-openbsd`echo ${UNAME_RELEASE}|sed -e 's/[-_].*/\./'` exit 0 ;; i*:CYGWIN*:*) echo ${UNAME_MACHINE}-pc-cygwin exit 0 ;; i*:MINGW*:*) echo ${UNAME_MACHINE}-pc-mingw32 exit 0 ;; i*:PW*:*) echo ${UNAME_MACHINE}-pc-pw32 exit 0 ;; i*:Windows_NT*:* | Pentium*:Windows_NT*:*) # How do we know it's Interix rather than the generic POSIX subsystem? # It also conflicts with pre-2.0 versions of AT&T UWIN. Should we # UNAME_MACHINE based on the output of uname instead of i386? 
echo i386-pc-interix exit 0 ;; i*:UWIN*:*) echo ${UNAME_MACHINE}-pc-uwin exit 0 ;; p*:CYGWIN*:*) echo powerpcle-unknown-cygwin exit 0 ;; prep*:SunOS:5.*:*) echo powerpcle-unknown-solaris2`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'` exit 0 ;; *:GNU:*:*) echo `echo ${UNAME_MACHINE}|sed -e 's,[-/].*$,,'`-unknown-gnu`echo ${UNAME_RELEASE}|sed -e 's,/.*$,,'` exit 0 ;; i*86:Minix:*:*) echo ${UNAME_MACHINE}-pc-minix exit 0 ;; arm*:Linux:*:*) echo ${UNAME_MACHINE}-unknown-linux-gnu exit 0 ;; ia64:Linux:*:*) echo ${UNAME_MACHINE}-unknown-linux exit 0 ;; m68*:Linux:*:*) echo ${UNAME_MACHINE}-unknown-linux-gnu exit 0 ;; mips:Linux:*:*) case `sed -n '/^byte/s/^.*: \(.*\) endian/\1/p' < /proc/cpuinfo` in big) echo mips-unknown-linux-gnu && exit 0 ;; little) echo mipsel-unknown-linux-gnu && exit 0 ;; esac ;; ppc:Linux:*:*) echo powerpc-unknown-linux-gnu exit 0 ;; ppc64:Linux:*:*) echo powerpc64-unknown-linux-gnu exit 0 ;; alpha:Linux:*:*) case `sed -n '/^cpu model/s/^.*: \(.*\)/\1/p' < /proc/cpuinfo` in EV5) UNAME_MACHINE=alphaev5 ;; EV56) UNAME_MACHINE=alphaev56 ;; PCA56) UNAME_MACHINE=alphapca56 ;; PCA57) UNAME_MACHINE=alphapca56 ;; EV6) UNAME_MACHINE=alphaev6 ;; EV67) UNAME_MACHINE=alphaev67 ;; EV68*) UNAME_MACHINE=alphaev68 ;; esac objdump --private-headers /bin/sh | grep ld.so.1 >/dev/null if test "$?" 
= 0 ; then LIBC="libc1" ; else LIBC="" ; fi echo ${UNAME_MACHINE}-unknown-linux-gnu${LIBC} exit 0 ;; parisc:Linux:*:* | hppa:Linux:*:*) # Look for CPU level case `grep '^cpu[^a-z]*:' /proc/cpuinfo 2>/dev/null | cut -d' ' -f2` in PA7*) echo hppa1.1-unknown-linux-gnu ;; PA8*) echo hppa2.0-unknown-linux-gnu ;; *) echo hppa-unknown-linux-gnu ;; esac exit 0 ;; parisc64:Linux:*:* | hppa64:Linux:*:*) echo hppa64-unknown-linux-gnu exit 0 ;; s390:Linux:*:* | s390x:Linux:*:*) echo ${UNAME_MACHINE}-ibm-linux exit 0 ;; sh*:Linux:*:*) echo ${UNAME_MACHINE}-unknown-linux-gnu exit 0 ;; sparc:Linux:*:* | sparc64:Linux:*:*) echo ${UNAME_MACHINE}-unknown-linux-gnu exit 0 ;; x86_64:Linux:*:*) echo x86_64-unknown-linux-gnu exit 0 ;; i*86:Linux:*:*) # The BFD linker knows what the default object file format is, so # first see if it will tell us. cd to the root directory to prevent # problems with other programs or directories called `ld' in the path. ld_supported_targets=`cd /; ld --help 2>&1 \ | sed -ne '/supported targets:/!d s/[ ][ ]*/ /g s/.*supported targets: *// s/ .*// p'` case "$ld_supported_targets" in elf32-i386) TENTATIVE="${UNAME_MACHINE}-pc-linux-gnu" ;; a.out-i386-linux) echo "${UNAME_MACHINE}-pc-linux-gnuaout" exit 0 ;; coff-i386) echo "${UNAME_MACHINE}-pc-linux-gnucoff" exit 0 ;; "") # Either a pre-BFD a.out linker (linux-gnuoldld) or # one that does not give us useful --help. 
                echo "${UNAME_MACHINE}-pc-linux-gnuoldld"
                exit 0 ;;
        esac
        # Determine whether the default compiler is a.out or elf
        eval $set_cc_for_build
        cat >$dummy.c <<EOF
#include <features.h>  /* defines __GLIBC__ on glibc systems */
#ifdef __cplusplus
#include <stdio.h>  /* for printf() prototype */
int main (int argc, char *argv[]) {
#else
int main (argc, argv) int argc; char *argv[]; {
#endif
#ifdef __ELF__
# ifdef __GLIBC__
#  if __GLIBC__ >= 2
  printf ("%s-pc-linux-gnu\n", argv[1]);
#  else
  printf ("%s-pc-linux-gnulibc1\n", argv[1]);
#  endif
# else
  printf ("%s-pc-linux-gnulibc1\n", argv[1]);
# endif
#else
  printf ("%s-pc-linux-gnuaout\n", argv[1]);
#endif
  return 0;
}
EOF
        $CC_FOR_BUILD $dummy.c -o $dummy 2>/dev/null && ./$dummy "${UNAME_MACHINE}" && rm -f $dummy.c $dummy && exit 0
        rm -f $dummy.c $dummy
        test x"${TENTATIVE}" != x && echo "${TENTATIVE}" && exit 0
        ;;
    i*86:DYNIX/ptx:4*:*)
        # ptx 4.0 does uname -s correctly, with DYNIX/ptx in there.
        # earlier versions are messed up and put the nodename in both
        # sysname and nodename.
        echo i386-sequent-sysv4
        exit 0 ;;
    i*86:UNIX_SV:4.2MP:2.*)
        # Unixware is an offshoot of SVR4, but it has its own version
        # number series starting with 2...
        # I am not positive that other SVR4 systems won't match this,
        # I just have to hope.  -- rms.
        # Use sysv4.2uw... so that sysv4* matches it.
        echo ${UNAME_MACHINE}-pc-sysv4.2uw${UNAME_VERSION}
        exit 0 ;;
    i*86:*:4.*:* | i*86:SYSTEM_V:4.*:*)
        UNAME_REL=`echo ${UNAME_RELEASE} | sed 's/\/MP$//'`
        if grep Novell /usr/include/link.h >/dev/null 2>/dev/null; then
                echo ${UNAME_MACHINE}-univel-sysv${UNAME_REL}
        else
                echo ${UNAME_MACHINE}-pc-sysv${UNAME_REL}
        fi
        exit 0 ;;
    i*86:*:5:[78]*)
        case `/bin/uname -X | grep "^Machine"` in
            *486*)           UNAME_MACHINE=i486 ;;
            *Pentium)        UNAME_MACHINE=i586 ;;
            *Pent*|*Celeron) UNAME_MACHINE=i686 ;;
        esac
        echo ${UNAME_MACHINE}-unknown-sysv${UNAME_RELEASE}${UNAME_SYSTEM}${UNAME_VERSION}
        exit 0 ;;
    i*86:*:3.2:*)
        if test -f /usr/options/cb.name; then
                UNAME_REL=`sed -n 's/.*Version //p' </usr/options/cb.name`
                echo ${UNAME_MACHINE}-pc-isc$UNAME_REL
        elif /bin/uname -X 2>/dev/null >/dev/null ; then
                UNAME_REL=`(/bin/uname -X|egrep Release|sed -e 's/.*= //')`
                (/bin/uname -X|egrep i80486 >/dev/null) && UNAME_MACHINE=i486
                (/bin/uname -X|egrep '^Machine.*Pentium' >/dev/null) \
                        && UNAME_MACHINE=i586
                (/bin/uname -X|egrep '^Machine.*Pent ?II' >/dev/null) \
                        && UNAME_MACHINE=i686
                (/bin/uname -X|egrep '^Machine.*Pentium Pro' >/dev/null) \
                        && UNAME_MACHINE=i686
                echo ${UNAME_MACHINE}-pc-sco$UNAME_REL
        else
                echo ${UNAME_MACHINE}-pc-sysv32
        fi
        exit 0 ;;
    i*86:*DOS:*:*)
        echo ${UNAME_MACHINE}-pc-msdosdjgpp
        exit 0 ;;
    pc:*:*:*)
        # Left here for compatibility:
        # uname -m prints for DJGPP always 'pc', but it prints nothing about
        # the processor, so we play safe by assuming i386.
        echo i386-pc-msdosdjgpp
        exit 0 ;;
    Intel:Mach:3*:*)
        echo i386-pc-mach3
        exit 0 ;;
    paragon:*:*:*)
        echo i860-intel-osf1
        exit 0 ;;
    i860:*:4.*:*) # i860-SVR4
        if grep Stardent /usr/include/sys/uadmin.h >/dev/null 2>&1 ; then
          echo i860-stardent-sysv${UNAME_RELEASE} # Stardent Vistra i860-SVR4
        else # Add other i860-SVR4 vendors below as they are discovered.
echo i860-unknown-sysv${UNAME_RELEASE} # Unknown i860-SVR4 fi exit 0 ;; mini*:CTIX:SYS*5:*) # "miniframe" echo m68010-convergent-sysv exit 0 ;; M68*:*:R3V[567]*:*) test -r /sysV68 && echo 'm68k-motorola-sysv' && exit 0 ;; 3[34]??:*:4.0:3.0 | 3[34]??A:*:4.0:3.0 | 3[34]??,*:*:4.0:3.0 | 4850:*:4.0:3.0) OS_REL='' test -r /etc/.relid \ && OS_REL=.`sed -n 's/[^ ]* [^ ]* \([0-9][0-9]\).*/\1/p' < /etc/.relid` /bin/uname -p 2>/dev/null | grep 86 >/dev/null \ && echo i486-ncr-sysv4.3${OS_REL} && exit 0 /bin/uname -p 2>/dev/null | /bin/grep entium >/dev/null \ && echo i586-ncr-sysv4.3${OS_REL} && exit 0 ;; 3[34]??:*:4.0:* | 3[34]??,*:*:4.0:*) /bin/uname -p 2>/dev/null | grep 86 >/dev/null \ && echo i486-ncr-sysv4 && exit 0 ;; m68*:LynxOS:2.*:* | m68*:LynxOS:3.0*:*) echo m68k-unknown-lynxos${UNAME_RELEASE} exit 0 ;; mc68030:UNIX_System_V:4.*:*) echo m68k-atari-sysv4 exit 0 ;; i*86:LynxOS:2.*:* | i*86:LynxOS:3.[01]*:* | i*86:LynxOS:4.0*:*) echo i386-unknown-lynxos${UNAME_RELEASE} exit 0 ;; TSUNAMI:LynxOS:2.*:*) echo sparc-unknown-lynxos${UNAME_RELEASE} exit 0 ;; rs6000:LynxOS:2.*:*) echo rs6000-unknown-lynxos${UNAME_RELEASE} exit 0 ;; PowerPC:LynxOS:2.*:* | PowerPC:LynxOS:3.[01]*:* | PowerPC:LynxOS:4.0*:*) echo powerpc-unknown-lynxos${UNAME_RELEASE} exit 0 ;; SM[BE]S:UNIX_SV:*:*) echo mips-dde-sysv${UNAME_RELEASE} exit 0 ;; RM*:ReliantUNIX-*:*:*) echo mips-sni-sysv4 exit 0 ;; RM*:SINIX-*:*:*) echo mips-sni-sysv4 exit 0 ;; *:SINIX-*:*:*) if uname -p 2>/dev/null >/dev/null ; then UNAME_MACHINE=`(uname -p) 2>/dev/null` echo ${UNAME_MACHINE}-sni-sysv4 else echo ns32k-sni-sysv fi exit 0 ;; PENTIUM:*:4.0*:*) # Unisys `ClearPath HMP IX 4000' SVR4/MP effort # says echo i586-unisys-sysv4 exit 0 ;; *:UNIX_System_V:4*:FTX*) # From Gerald Hewes . # How about differentiating between stratus architectures? -djm echo hppa1.1-stratus-sysv4 exit 0 ;; *:*:*:FTX*) # From seanf@swdc.stratus.com. echo i860-stratus-sysv4 exit 0 ;; *:VOS:*:*) # From Paul.Green@stratus.com. 
echo hppa1.1-stratus-vos exit 0 ;; mc68*:A/UX:*:*) echo m68k-apple-aux${UNAME_RELEASE} exit 0 ;; news*:NEWS-OS:6*:*) echo mips-sony-newsos6 exit 0 ;; R[34]000:*System_V*:*:* | R4000:UNIX_SYSV:*:* | R*000:UNIX_SV:*:*) if [ -d /usr/nec ]; then echo mips-nec-sysv${UNAME_RELEASE} else echo mips-unknown-sysv${UNAME_RELEASE} fi exit 0 ;; BeBox:BeOS:*:*) # BeOS running on hardware made by Be, PPC only. echo powerpc-be-beos exit 0 ;; BeMac:BeOS:*:*) # BeOS running on Mac or Mac clone, PPC only. echo powerpc-apple-beos exit 0 ;; BePC:BeOS:*:*) # BeOS running on Intel PC compatible. echo i586-pc-beos exit 0 ;; SX-4:SUPER-UX:*:*) echo sx4-nec-superux${UNAME_RELEASE} exit 0 ;; SX-5:SUPER-UX:*:*) echo sx5-nec-superux${UNAME_RELEASE} exit 0 ;; Power*:Rhapsody:*:*) echo powerpc-apple-rhapsody${UNAME_RELEASE} exit 0 ;; *:Rhapsody:*:*) echo ${UNAME_MACHINE}-apple-rhapsody${UNAME_RELEASE} exit 0 ;; *:Darwin:*:*) echo `uname -p`-apple-darwin${UNAME_RELEASE} exit 0 ;; *:procnto*:*:* | *:QNX:[0123456789]*:*) if test "${UNAME_MACHINE}" = "x86pc"; then UNAME_MACHINE=pc fi echo `uname -p`-${UNAME_MACHINE}-nto-qnx exit 0 ;; *:QNX:*:4*) echo i386-pc-qnx exit 0 ;; NSR-[KW]:NONSTOP_KERNEL:*:*) echo nsr-tandem-nsk${UNAME_RELEASE} exit 0 ;; *:NonStop-UX:*:*) echo mips-compaq-nonstopux exit 0 ;; BS2000:POSIX*:*:*) echo bs2000-siemens-sysv exit 0 ;; DS/*:UNIX_System_V:*:*) echo ${UNAME_MACHINE}-${UNAME_SYSTEM}-${UNAME_RELEASE} exit 0 ;; *:Plan9:*:*) # "uname -m" is not consistent, so use $cputype instead. 386 # is converted to i386 for consistency with other x86 # operating systems. if test "$cputype" = "386"; then UNAME_MACHINE=i386 else UNAME_MACHINE="$cputype" fi echo ${UNAME_MACHINE}-unknown-plan9 exit 0 ;; i*86:OS/2:*:*) # If we were able to find `uname', then EMX Unix compatibility # is probably installed. 
echo ${UNAME_MACHINE}-pc-os2-emx exit 0 ;; *:TOPS-10:*:*) echo pdp10-unknown-tops10 exit 0 ;; *:TENEX:*:*) echo pdp10-unknown-tenex exit 0 ;; KS10:TOPS-20:*:* | KL10:TOPS-20:*:* | TYPE4:TOPS-20:*:*) echo pdp10-dec-tops20 exit 0 ;; XKL-1:TOPS-20:*:* | TYPE5:TOPS-20:*:*) echo pdp10-xkl-tops20 exit 0 ;; *:TOPS-20:*:*) echo pdp10-unknown-tops20 exit 0 ;; *:ITS:*:*) echo pdp10-unknown-its exit 0 ;; i*86:XTS-300:*:STOP) echo ${UNAME_MACHINE}-unknown-stop exit 0 ;; i*86:atheos:*:*) echo ${UNAME_MACHINE}-unknown-atheos exit 0 ;; esac #echo '(No uname command or uname output not recognized.)' 1>&2 #echo "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" 1>&2 eval $set_cc_for_build cat >$dummy.c < # include #endif main () { #if defined (sony) #if defined (MIPSEB) /* BFD wants "bsd" instead of "newsos". Perhaps BFD should be changed, I don't know.... */ printf ("mips-sony-bsd\n"); exit (0); #else #include printf ("m68k-sony-newsos%s\n", #ifdef NEWSOS4 "4" #else "" #endif ); exit (0); #endif #endif #if defined (__arm) && defined (__acorn) && defined (__unix) printf ("arm-acorn-riscix"); exit (0); #endif #if defined (hp300) && !defined (hpux) printf ("m68k-hp-bsd\n"); exit (0); #endif #if defined (NeXT) #if !defined (__ARCHITECTURE__) #define __ARCHITECTURE__ "m68k" #endif int version; version=`(hostinfo | sed -n 's/.*NeXT Mach \([0-9]*\).*/\1/p') 2>/dev/null`; if (version < 4) printf ("%s-next-nextstep%d\n", __ARCHITECTURE__, version); else printf ("%s-next-openstep%d\n", __ARCHITECTURE__, version); exit (0); #endif #if defined (MULTIMAX) || defined (n16) #if defined (UMAXV) printf ("ns32k-encore-sysv\n"); exit (0); #else #if defined (CMU) printf ("ns32k-encore-mach\n"); exit (0); #else printf ("ns32k-encore-bsd\n"); exit (0); #endif #endif #endif #if defined (__386BSD__) printf ("i386-pc-bsd\n"); exit (0); #endif #if defined (sequent) #if defined (i386) printf ("i386-sequent-dynix\n"); exit (0); #endif #if defined (ns32000) printf ("ns32k-sequent-dynix\n"); 
exit (0); #endif #endif #if defined (_SEQUENT_) struct utsname un; uname(&un); if (strncmp(un.version, "V2", 2) == 0) { printf ("i386-sequent-ptx2\n"); exit (0); } if (strncmp(un.version, "V1", 2) == 0) { /* XXX is V1 correct? */ printf ("i386-sequent-ptx1\n"); exit (0); } printf ("i386-sequent-ptx\n"); exit (0); #endif #if defined (vax) # if !defined (ultrix) # include # if defined (BSD) # if BSD == 43 printf ("vax-dec-bsd4.3\n"); exit (0); # else # if BSD == 199006 printf ("vax-dec-bsd4.3reno\n"); exit (0); # else printf ("vax-dec-bsd\n"); exit (0); # endif # endif # else printf ("vax-dec-bsd\n"); exit (0); # endif # else printf ("vax-dec-ultrix\n"); exit (0); # endif #endif #if defined (alliant) && defined (i860) printf ("i860-alliant-bsd\n"); exit (0); #endif exit (1); } EOF $CC_FOR_BUILD $dummy.c -o $dummy 2>/dev/null && ./$dummy && rm -f $dummy.c $dummy && exit 0 rm -f $dummy.c $dummy # Apollos put the system type in the environment. test -d /usr/apollo && { echo ${ISP}-apollo-${SYSTYPE}; exit 0; } # Convex versions that predate uname can use getsysinfo(1) if [ -x /usr/convex/getsysinfo ] then case `getsysinfo -f cpu_type` in c1*) echo c1-convex-bsd exit 0 ;; c2*) if getsysinfo -f scalar_acc then echo c32-convex-bsd else echo c2-convex-bsd fi exit 0 ;; c34*) echo c34-convex-bsd exit 0 ;; c38*) echo c38-convex-bsd exit 0 ;; c4*) echo c4-convex-bsd exit 0 ;; esac fi cat >&2 < in order to provide the needed information to handle your system. 
config.guess timestamp = $timestamp uname -m = `(uname -m) 2>/dev/null || echo unknown` uname -r = `(uname -r) 2>/dev/null || echo unknown` uname -s = `(uname -s) 2>/dev/null || echo unknown` uname -v = `(uname -v) 2>/dev/null || echo unknown` /usr/bin/uname -p = `(/usr/bin/uname -p) 2>/dev/null` /bin/uname -X = `(/bin/uname -X) 2>/dev/null` hostinfo = `(hostinfo) 2>/dev/null` /bin/universe = `(/bin/universe) 2>/dev/null` /usr/bin/arch -k = `(/usr/bin/arch -k) 2>/dev/null` /bin/arch = `(/bin/arch) 2>/dev/null` /usr/bin/oslevel = `(/usr/bin/oslevel) 2>/dev/null` /usr/convex/getsysinfo = `(/usr/convex/getsysinfo) 2>/dev/null` UNAME_MACHINE = ${UNAME_MACHINE} UNAME_RELEASE = ${UNAME_RELEASE} UNAME_SYSTEM = ${UNAME_SYSTEM} UNAME_VERSION = ${UNAME_VERSION} EOF exit 1 # Local variables: # eval: (add-hook 'write-file-hooks 'time-stamp) # time-stamp-start: "timestamp='" # time-stamp-format: "%:y-%02m-%02d" # time-stamp-end: "'" # End: exonerate-2.4.0/config.sub0000744000175000017500000006710010371635214012417 00000000000000#! /bin/sh # Configuration validation subroutine script. # Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001 # Free Software Foundation, Inc. timestamp='2001-09-07' # This file is (in principle) common to ALL GNU software. # The presence of a machine in this file suggests that SOME GNU software # can handle that machine. It does not imply ALL GNU software can. # # This file is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330,
# Boston, MA 02111-1307, USA.

# As a special exception to the GNU General Public License, if you
# distribute this file as part of a program that contains a
# configuration script generated by Autoconf, you may include it under
# the same distribution terms that you use for the rest of that program.

# Please send patches to .
#
# Configuration subroutine to validate and canonicalize a configuration type.
# Supply the specified configuration type as an argument.
# If it is invalid, we print an error message on stderr and exit with code 1.
# Otherwise, we print the canonical config type on stdout and succeed.

# This file is supposed to be the same for all GNU packages
# and recognize all the CPU types, system types and aliases
# that are meaningful with *any* GNU software.
# Each package is responsible for reporting which valid configurations
# it does not support.  The user should be able to distinguish
# a failure to support a valid configuration from a meaningless
# configuration.

# The goal of this file is to map all the various variations of a given
# machine specification into a single specification in the form:
#       CPU_TYPE-MANUFACTURER-OPERATING_SYSTEM
# or in some cases, the newer four-part form:
#       CPU_TYPE-MANUFACTURER-KERNEL-OPERATING_SYSTEM
# It is wrong to echo any other type of specification.

me=`echo "$0" | sed -e 's,.*/,,'`

usage="\
Usage: $0 [OPTION] CPU-MFR-OPSYS
       $0 [OPTION] ALIAS

Canonicalize a configuration name.

Operation modes:
  -h, --help         print this help, then exit
  -t, --time-stamp   print date of last modification, then exit
  -v, --version      print version number, then exit

Report bugs and patches to ."

version="\
GNU config.sub ($timestamp)

Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001
Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE."

help="
Try \`$me --help' for more information."

# Parse command line
while test $# -gt 0 ; do
  case $1 in
    --time-stamp | --time* | -t )
       echo "$timestamp" ; exit 0 ;;
    --version | -v )
       echo "$version" ; exit 0 ;;
    --help | --h* | -h )
       echo "$usage"; exit 0 ;;
    -- )     # Stop option processing
       shift; break ;;
    - ) # Use stdin as input.
       break ;;
    -* )
       echo "$me: invalid option $1$help"
       exit 1 ;;

    *local*)
       # First pass through any local machine types.
       echo $1
       exit 0;;

    * )
       break ;;
  esac
done

case $# in
 0) echo "$me: missing argument$help" >&2
    exit 1;;
 1) ;;
 *) echo "$me: too many arguments$help" >&2
    exit 1;;
esac

# Separate what the user gave into CPU-COMPANY and OS or KERNEL-OS (if any).
# Here we must recognize all the valid KERNEL-OS combinations.
maybe_os=`echo $1 | sed 's/^\(.*\)-\([^-]*-[^-]*\)$/\2/'`
case $maybe_os in
  nto-qnx* | linux-gnu* | storm-chaos* | os2-emx* | windows32-*)
    os=-$maybe_os
    basic_machine=`echo $1 | sed 's/^\(.*\)-\([^-]*-[^-]*\)$/\1/'`
    ;;
  *)
    basic_machine=`echo $1 | sed 's/-[^-]*$//'`
    if [ $basic_machine != $1 ]
    then os=`echo $1 | sed 's/.*-/-/'`
    else os=; fi
    ;;
esac

### Let's recognize common machines as not being operating systems so
### that things like config.sub decstation-3100 work.  We also
### recognize some manufacturers as not being operating systems, so we
### can provide default operating systems below.
case $os in
        -sun*os*)
                # Prevent following clause from handling this invalid input.
;; -dec* | -mips* | -sequent* | -encore* | -pc532* | -sgi* | -sony* | \ -att* | -7300* | -3300* | -delta* | -motorola* | -sun[234]* | \ -unicom* | -ibm* | -next | -hp | -isi* | -apollo | -altos* | \ -convergent* | -ncr* | -news | -32* | -3600* | -3100* | -hitachi* |\ -c[123]* | -convex* | -sun | -crds | -omron* | -dg | -ultra | -tti* | \ -harris | -dolphin | -highlevel | -gould | -cbm | -ns | -masscomp | \ -apple | -axis) os= basic_machine=$1 ;; -sim | -cisco | -oki | -wec | -winbond) os= basic_machine=$1 ;; -scout) ;; -wrs) os=-vxworks basic_machine=$1 ;; -chorusos*) os=-chorusos basic_machine=$1 ;; -chorusrdb) os=-chorusrdb basic_machine=$1 ;; -hiux*) os=-hiuxwe2 ;; -sco5) os=-sco3.2v5 basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'` ;; -sco4) os=-sco3.2v4 basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'` ;; -sco3.2.[4-9]*) os=`echo $os | sed -e 's/sco3.2./sco3.2v/'` basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'` ;; -sco3.2v[4-9]*) # Don't forget version if it is 3.2v4 or newer. basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'` ;; -sco*) os=-sco3.2v2 basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'` ;; -udk*) basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'` ;; -isc) os=-isc2.2 basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'` ;; -clix*) basic_machine=clipper-intergraph ;; -isc*) basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'` ;; -lynx*) os=-lynxos ;; -ptx*) basic_machine=`echo $1 | sed -e 's/86-.*/86-sequent/'` ;; -windowsnt*) os=`echo $os | sed -e 's/windowsnt/winnt/'` ;; -psos*) os=-psos ;; -mint | -mint[0-9]*) basic_machine=m68k-atari os=-mint ;; esac # Decode aliases for certain CPU-COMPANY combinations. case $basic_machine in # Recognize the basic CPU types without company name. # Some are omitted here because they have special meanings below. 
1750a | 580 \ | a29k \ | alpha | alphaev[4-8] | alphaev56 | alphaev6[78] | alphapca5[67] \ | arc | arm | arm[bl]e | arme[lb] | armv[2345] | armv[345][lb] | avr \ | c4x | clipper \ | d10v | d30v | dsp16xx \ | fr30 \ | h8300 | h8500 | hppa | hppa1.[01] | hppa2.0 | hppa2.0[nw] | hppa64 \ | i370 | i860 | i960 | ia64 \ | m32r | m68000 | m68k | m88k | mcore \ | mips16 | mips64 | mips64el | mips64orion | mips64orionel \ | mips64vr4100 | mips64vr4100el | mips64vr4300 \ | mips64vr4300el | mips64vr5000 | mips64vr5000el \ | mipsbe | mipseb | mipsel | mipsle | mipstx39 | mipstx39el \ | mipsisa32 \ | mn10200 | mn10300 \ | ns16k | ns32k \ | openrisc \ | pdp10 | pdp11 | pj | pjl \ | powerpc | powerpc64 | powerpc64le | powerpcle | ppcbe \ | pyramid \ | s390 | s390x \ | sh | sh[34] | sh[34]eb | shbe | shle \ | sparc | sparc64 | sparclet | sparclite | sparcv9 | sparcv9b \ | stormy16 | strongarm \ | tahoe | thumb | tic80 | tron \ | v850 \ | we32k \ | x86 | xscale \ | z8k) basic_machine=$basic_machine-unknown ;; m6811 | m68hc11 | m6812 | m68hc12) # Motorola 68HC11/12. basic_machine=$basic_machine-unknown os=-none ;; m88110 | m680[12346]0 | m683?2 | m68360 | m5200 | v70 | w65 | z8k) ;; # We use `pc' rather than `unknown' # because (1) that's what they normally are, and # (2) the word "unknown" tends to confuse beginning users. i*86 | x86_64) basic_machine=$basic_machine-pc ;; # Object if more than one company name word. *-*-*) echo Invalid configuration \`$1\': machine \`$basic_machine\' not recognized 1>&2 exit 1 ;; # Recognize the basic CPU types with company name. 
580-* \ | a29k-* \ | alpha-* | alphaev[4-8]-* | alphaev56-* | alphaev6[78]-* \ | alphapca5[67]-* | arc-* \ | arm-* | armbe-* | armle-* | armv*-* \ | bs2000-* \ | c[123]* | c30-* | [cjt]90-* | c54x-* \ | clipper-* | cray2-* | cydra-* \ | d10v-* | d30v-* \ | elxsi-* \ | f30[01]-* | f700-* | fr30-* | fx80-* \ | h8300-* | h8500-* \ | hppa-* | hppa1.[01]-* | hppa2.0-* | hppa2.0[nw]-* | hppa64-* \ | i*86-* | i860-* | i960-* | ia64-* \ | m32r-* \ | m68000-* | m680[01234]0-* | m68360-* | m683?2-* | m68k-* \ | m88110-* | m88k-* | mcore-* \ | mips-* | mips16-* | mips64-* | mips64el-* | mips64orion-* \ | mips64orionel-* | mips64vr4100-* | mips64vr4100el-* \ | mips64vr4300-* | mips64vr4300el-* | mipsbe-* | mipseb-* \ | mipsle-* | mipsel-* | mipstx39-* | mipstx39el-* \ | none-* | np1-* | ns16k-* | ns32k-* \ | orion-* \ | pdp10-* | pdp11-* | pj-* | pjl-* | pn-* | power-* \ | powerpc-* | powerpc64-* | powerpc64le-* | powerpcle-* | ppcbe-* \ | pyramid-* \ | romp-* | rs6000-* \ | s390-* | s390x-* \ | sh-* | sh[34]-* | sh[34]eb-* | shbe-* | shle-* \ | sparc-* | sparc64-* | sparc86x-* | sparclite-* \ | sparcv9-* | sparcv9b-* | stormy16-* | strongarm-* | sv1-* \ | t3e-* | tahoe-* | thumb-* | tic30-* | tic54x-* | tic80-* | tron-* \ | v850-* | vax-* \ | we32k-* \ | x86-* | x86_64-* | xmp-* | xps100-* | xscale-* \ | ymp-* \ | z8k-*) ;; # Recognize the various machine names and aliases which stand # for a CPU type and a company and sometimes even an OS. 
386bsd) basic_machine=i386-unknown os=-bsd ;; 3b1 | 7300 | 7300-att | att-7300 | pc7300 | safari | unixpc) basic_machine=m68000-att ;; 3b*) basic_machine=we32k-att ;; a29khif) basic_machine=a29k-amd os=-udi ;; adobe68k) basic_machine=m68010-adobe os=-scout ;; alliant | fx80) basic_machine=fx80-alliant ;; altos | altos3068) basic_machine=m68k-altos ;; am29k) basic_machine=a29k-none os=-bsd ;; amdahl) basic_machine=580-amdahl os=-sysv ;; amiga | amiga-*) basic_machine=m68k-unknown ;; amigaos | amigados) basic_machine=m68k-unknown os=-amigaos ;; amigaunix | amix) basic_machine=m68k-unknown os=-sysv4 ;; apollo68) basic_machine=m68k-apollo os=-sysv ;; apollo68bsd) basic_machine=m68k-apollo os=-bsd ;; aux) basic_machine=m68k-apple os=-aux ;; balance) basic_machine=ns32k-sequent os=-dynix ;; convex-c1) basic_machine=c1-convex os=-bsd ;; convex-c2) basic_machine=c2-convex os=-bsd ;; convex-c32) basic_machine=c32-convex os=-bsd ;; convex-c34) basic_machine=c34-convex os=-bsd ;; convex-c38) basic_machine=c38-convex os=-bsd ;; cray | ymp) basic_machine=ymp-cray os=-unicos ;; cray2) basic_machine=cray2-cray os=-unicos ;; [cjt]90) basic_machine=${basic_machine}-cray os=-unicos ;; crds | unos) basic_machine=m68k-crds ;; cris | cris-* | etrax*) basic_machine=cris-axis ;; da30 | da30-*) basic_machine=m68k-da30 ;; decstation | decstation-3100 | pmax | pmax-* | pmin | dec3100 | decstatn) basic_machine=mips-dec ;; delta | 3300 | motorola-3300 | motorola-delta \ | 3300-motorola | delta-motorola) basic_machine=m68k-motorola ;; delta88) basic_machine=m88k-motorola os=-sysv3 ;; dpx20 | dpx20-*) basic_machine=rs6000-bull os=-bosx ;; dpx2* | dpx2*-bull) basic_machine=m68k-bull os=-sysv3 ;; ebmon29k) basic_machine=a29k-amd os=-ebmon ;; elxsi) basic_machine=elxsi-elxsi os=-bsd ;; encore | umax | mmax) basic_machine=ns32k-encore ;; es1800 | OSE68k | ose68k | ose | OSE) basic_machine=m68k-ericsson os=-ose ;; fx2800) basic_machine=i860-alliant ;; genix) basic_machine=ns32k-ns ;; gmicro) 
basic_machine=tron-gmicro os=-sysv ;; go32) basic_machine=i386-pc os=-go32 ;; h3050r* | hiux*) basic_machine=hppa1.1-hitachi os=-hiuxwe2 ;; h8300hms) basic_machine=h8300-hitachi os=-hms ;; h8300xray) basic_machine=h8300-hitachi os=-xray ;; h8500hms) basic_machine=h8500-hitachi os=-hms ;; harris) basic_machine=m88k-harris os=-sysv3 ;; hp300-*) basic_machine=m68k-hp ;; hp300bsd) basic_machine=m68k-hp os=-bsd ;; hp300hpux) basic_machine=m68k-hp os=-hpux ;; hp3k9[0-9][0-9] | hp9[0-9][0-9]) basic_machine=hppa1.0-hp ;; hp9k2[0-9][0-9] | hp9k31[0-9]) basic_machine=m68000-hp ;; hp9k3[2-9][0-9]) basic_machine=m68k-hp ;; hp9k6[0-9][0-9] | hp6[0-9][0-9]) basic_machine=hppa1.0-hp ;; hp9k7[0-79][0-9] | hp7[0-79][0-9]) basic_machine=hppa1.1-hp ;; hp9k78[0-9] | hp78[0-9]) # FIXME: really hppa2.0-hp basic_machine=hppa1.1-hp ;; hp9k8[67]1 | hp8[67]1 | hp9k80[24] | hp80[24] | hp9k8[78]9 | hp8[78]9 | hp9k893 | hp893) # FIXME: really hppa2.0-hp basic_machine=hppa1.1-hp ;; hp9k8[0-9][13679] | hp8[0-9][13679]) basic_machine=hppa1.1-hp ;; hp9k8[0-9][0-9] | hp8[0-9][0-9]) basic_machine=hppa1.0-hp ;; hppa-next) os=-nextstep3 ;; hppaosf) basic_machine=hppa1.1-hp os=-osf ;; hppro) basic_machine=hppa1.1-hp os=-proelf ;; i370-ibm* | ibm*) basic_machine=i370-ibm ;; # I'm not sure what "Sysv32" means. Should this be sysv3.2? 
i*86v32) basic_machine=`echo $1 | sed -e 's/86.*/86-pc/'` os=-sysv32 ;; i*86v4*) basic_machine=`echo $1 | sed -e 's/86.*/86-pc/'` os=-sysv4 ;; i*86v) basic_machine=`echo $1 | sed -e 's/86.*/86-pc/'` os=-sysv ;; i*86sol2) basic_machine=`echo $1 | sed -e 's/86.*/86-pc/'` os=-solaris2 ;; i386mach) basic_machine=i386-mach os=-mach ;; i386-vsta | vsta) basic_machine=i386-unknown os=-vsta ;; iris | iris4d) basic_machine=mips-sgi case $os in -irix*) ;; *) os=-irix4 ;; esac ;; isi68 | isi) basic_machine=m68k-isi os=-sysv ;; m88k-omron*) basic_machine=m88k-omron ;; magnum | m3230) basic_machine=mips-mips os=-sysv ;; merlin) basic_machine=ns32k-utek os=-sysv ;; mingw32) basic_machine=i386-pc os=-mingw32 ;; miniframe) basic_machine=m68000-convergent ;; *mint | -mint[0-9]* | *MiNT | *MiNT[0-9]*) basic_machine=m68k-atari os=-mint ;; mipsel*-linux*) basic_machine=mipsel-unknown os=-linux-gnu ;; mips*-linux*) basic_machine=mips-unknown os=-linux-gnu ;; mips3*-*) basic_machine=`echo $basic_machine | sed -e 's/mips3/mips64/'` ;; mips3*) basic_machine=`echo $basic_machine | sed -e 's/mips3/mips64/'`-unknown ;; mmix*) basic_machine=mmix-knuth os=-mmixware ;; monitor) basic_machine=m68k-rom68k os=-coff ;; msdos) basic_machine=i386-pc os=-msdos ;; mvs) basic_machine=i370-ibm os=-mvs ;; ncr3000) basic_machine=i486-ncr os=-sysv4 ;; netbsd386) basic_machine=i386-unknown os=-netbsd ;; netwinder) basic_machine=armv4l-rebel os=-linux ;; news | news700 | news800 | news900) basic_machine=m68k-sony os=-newsos ;; news1000) basic_machine=m68030-sony os=-newsos ;; news-3600 | risc-news) basic_machine=mips-sony os=-newsos ;; necv70) basic_machine=v70-nec os=-sysv ;; next | m*-next ) basic_machine=m68k-next case $os in -nextstep* ) ;; -ns2*) os=-nextstep2 ;; *) os=-nextstep3 ;; esac ;; nh3000) basic_machine=m68k-harris os=-cxux ;; nh[45]000) basic_machine=m88k-harris os=-cxux ;; nindy960) basic_machine=i960-intel os=-nindy ;; mon960) basic_machine=i960-intel os=-mon960 ;; nonstopux) 
basic_machine=mips-compaq os=-nonstopux ;; np1) basic_machine=np1-gould ;; nsr-tandem) basic_machine=nsr-tandem ;; op50n-* | op60c-*) basic_machine=hppa1.1-oki os=-proelf ;; OSE68000 | ose68000) basic_machine=m68000-ericsson os=-ose ;; os68k) basic_machine=m68k-none os=-os68k ;; pa-hitachi) basic_machine=hppa1.1-hitachi os=-hiuxwe2 ;; paragon) basic_machine=i860-intel os=-osf ;; pbd) basic_machine=sparc-tti ;; pbb) basic_machine=m68k-tti ;; pc532 | pc532-*) basic_machine=ns32k-pc532 ;; pentium | p5 | k5 | k6 | nexgen) basic_machine=i586-pc ;; pentiumpro | p6 | 6x86 | athlon) basic_machine=i686-pc ;; pentiumii | pentium2) basic_machine=i686-pc ;; pentium-* | p5-* | k5-* | k6-* | nexgen-*) basic_machine=i586-`echo $basic_machine | sed 's/^[^-]*-//'` ;; pentiumpro-* | p6-* | 6x86-* | athlon-*) basic_machine=i686-`echo $basic_machine | sed 's/^[^-]*-//'` ;; pentiumii-* | pentium2-*) basic_machine=i686-`echo $basic_machine | sed 's/^[^-]*-//'` ;; pn) basic_machine=pn-gould ;; power) basic_machine=power-ibm ;; ppc) basic_machine=powerpc-unknown ;; ppc-*) basic_machine=powerpc-`echo $basic_machine | sed 's/^[^-]*-//'` ;; ppcle | powerpclittle | ppc-le | powerpc-little) basic_machine=powerpcle-unknown ;; ppcle-* | powerpclittle-*) basic_machine=powerpcle-`echo $basic_machine | sed 's/^[^-]*-//'` ;; ppc64) basic_machine=powerpc64-unknown ;; ppc64-*) basic_machine=powerpc64-`echo $basic_machine | sed 's/^[^-]*-//'` ;; ppc64le | powerpc64little | ppc64-le | powerpc64-little) basic_machine=powerpc64le-unknown ;; ppc64le-* | powerpc64little-*) basic_machine=powerpc64le-`echo $basic_machine | sed 's/^[^-]*-//'` ;; ps2) basic_machine=i386-ibm ;; pw32) basic_machine=i586-unknown os=-pw32 ;; rom68k) basic_machine=m68k-rom68k os=-coff ;; rm[46]00) basic_machine=mips-siemens ;; rtpc | rtpc-*) basic_machine=romp-ibm ;; sa29200) basic_machine=a29k-amd os=-udi ;; sequent) basic_machine=i386-sequent ;; sh) basic_machine=sh-hitachi os=-hms ;; sparclite-wrs) basic_machine=sparclite-wrs 
os=-vxworks ;; sps7) basic_machine=m68k-bull os=-sysv2 ;; spur) basic_machine=spur-unknown ;; st2000) basic_machine=m68k-tandem ;; stratus) basic_machine=i860-stratus os=-sysv4 ;; sun2) basic_machine=m68000-sun ;; sun2os3) basic_machine=m68000-sun os=-sunos3 ;; sun2os4) basic_machine=m68000-sun os=-sunos4 ;; sun3os3) basic_machine=m68k-sun os=-sunos3 ;; sun3os4) basic_machine=m68k-sun os=-sunos4 ;; sun4os3) basic_machine=sparc-sun os=-sunos3 ;; sun4os4) basic_machine=sparc-sun os=-sunos4 ;; sun4sol2) basic_machine=sparc-sun os=-solaris2 ;; sun3 | sun3-*) basic_machine=m68k-sun ;; sun4) basic_machine=sparc-sun ;; sun386 | sun386i | roadrunner) basic_machine=i386-sun ;; sv1) basic_machine=sv1-cray os=-unicos ;; symmetry) basic_machine=i386-sequent os=-dynix ;; t3e) basic_machine=t3e-cray os=-unicos ;; tic54x | c54x*) basic_machine=tic54x-unknown os=-coff ;; tx39) basic_machine=mipstx39-unknown ;; tx39el) basic_machine=mipstx39el-unknown ;; tower | tower-32) basic_machine=m68k-ncr ;; udi29k) basic_machine=a29k-amd os=-udi ;; ultra3) basic_machine=a29k-nyu os=-sym1 ;; v810 | necv810) basic_machine=v810-nec os=-none ;; vaxv) basic_machine=vax-dec os=-sysv ;; vms) basic_machine=vax-dec os=-vms ;; vpp*|vx|vx-*) basic_machine=f301-fujitsu ;; vxworks960) basic_machine=i960-wrs os=-vxworks ;; vxworks68) basic_machine=m68k-wrs os=-vxworks ;; vxworks29k) basic_machine=a29k-wrs os=-vxworks ;; w65*) basic_machine=w65-wdc os=-none ;; w89k-*) basic_machine=hppa1.1-winbond os=-proelf ;; windows32) basic_machine=i386-pc os=-windows32-msvcrt ;; xmp) basic_machine=xmp-cray os=-unicos ;; xps | xps100) basic_machine=xps100-honeywell ;; z8k-*-coff) basic_machine=z8k-unknown os=-sim ;; none) basic_machine=none-none os=-none ;; # Here we handle the default manufacturer of certain CPU types. It is in # some cases the only manufacturer, in others, it is the most popular. 
w89k) basic_machine=hppa1.1-winbond ;; op50n) basic_machine=hppa1.1-oki ;; op60c) basic_machine=hppa1.1-oki ;; mips) if [ x$os = x-linux-gnu ]; then basic_machine=mips-unknown else basic_machine=mips-mips fi ;; romp) basic_machine=romp-ibm ;; rs6000) basic_machine=rs6000-ibm ;; vax) basic_machine=vax-dec ;; pdp10) # there are many clones, so DEC is not a safe bet basic_machine=pdp10-unknown ;; pdp11) basic_machine=pdp11-dec ;; we32k) basic_machine=we32k-att ;; sh3 | sh4 | sh3eb | sh4eb) basic_machine=sh-unknown ;; sparc | sparcv9 | sparcv9b) basic_machine=sparc-sun ;; cydra) basic_machine=cydra-cydrome ;; orion) basic_machine=orion-highlevel ;; orion105) basic_machine=clipper-highlevel ;; mac | mpw | mac-mpw) basic_machine=m68k-apple ;; pmac | pmac-mpw) basic_machine=powerpc-apple ;; c4x*) basic_machine=c4x-none os=-coff ;; *-unknown) # Make sure to match an already-canonicalized machine name. ;; *) echo Invalid configuration \`$1\': machine \`$basic_machine\' not recognized 1>&2 exit 1 ;; esac # Here we canonicalize certain aliases for manufacturers. case $basic_machine in *-digital*) basic_machine=`echo $basic_machine | sed 's/digital.*/dec/'` ;; *-commodore*) basic_machine=`echo $basic_machine | sed 's/commodore.*/cbm/'` ;; *) ;; esac # Decode manufacturer-specific aliases for certain operating systems. if [ x"$os" != x"" ] then case $os in # First match some system type aliases # that might get confused with valid system types. # -solaris* is a basic system type, with this one exception. -solaris1 | -solaris1.*) os=`echo $os | sed -e 's|solaris1|sunos4|'` ;; -solaris) os=-solaris2 ;; -svr4*) os=-sysv4 ;; -unixware*) os=-sysv4.2uw ;; -gnu/linux*) os=`echo $os | sed -e 's|gnu/linux|linux-gnu|'` ;; # First accept the basic system types. # The portable systems comes first. # Each alternative MUST END IN A *, to match a version number. # -sysv* is not here because it comes later, after sysvr4. 
-gnu* | -bsd* | -mach* | -minix* | -genix* | -ultrix* | -irix* \ | -*vms* | -sco* | -esix* | -isc* | -aix* | -sunos | -sunos[34]*\ | -hpux* | -unos* | -osf* | -luna* | -dgux* | -solaris* | -sym* \ | -amigaos* | -amigados* | -msdos* | -newsos* | -unicos* | -aof* \ | -aos* \ | -nindy* | -vxsim* | -vxworks* | -ebmon* | -hms* | -mvs* \ | -clix* | -riscos* | -uniplus* | -iris* | -rtu* | -xenix* \ | -hiux* | -386bsd* | -netbsd* | -openbsd* | -freebsd* | -riscix* \ | -lynxos* | -bosx* | -nextstep* | -cxux* | -aout* | -elf* | -oabi* \ | -ptx* | -coff* | -ecoff* | -winnt* | -domain* | -vsta* \ | -udi* | -eabi* | -lites* | -ieee* | -go32* | -aux* \ | -chorusos* | -chorusrdb* \ | -cygwin* | -pe* | -psos* | -moss* | -proelf* | -rtems* \ | -mingw32* | -linux-gnu* | -uxpv* | -beos* | -mpeix* | -udk* \ | -interix* | -uwin* | -rhapsody* | -darwin* | -opened* \ | -openstep* | -oskit* | -conix* | -pw32* | -nonstopux* \ | -storm-chaos* | -tops10* | -tenex* | -tops20* | -its* \ | -os2* | -vos*) # Remember, each alternative MUST END IN *, to match a version number. ;; -qnx*) case $basic_machine in x86-* | i*86-*) ;; *) os=-nto$os ;; esac ;; -nto*) os=-nto-qnx ;; -sim | -es1800* | -hms* | -xray | -os68k* | -none* | -v88r* \ | -windows* | -osx | -abug | -netware* | -os9* | -beos* \ | -macos* | -mpw* | -magic* | -mmixware* | -mon960* | -lnews*) ;; -mac*) os=`echo $os | sed -e 's|mac|macos|'` ;; -linux*) os=`echo $os | sed -e 's|linux|linux-gnu|'` ;; -sunos5*) os=`echo $os | sed -e 's|sunos5|solaris2|'` ;; -sunos6*) os=`echo $os | sed -e 's|sunos6|solaris3|'` ;; -opened*) os=-openedition ;; -wince*) os=-wince ;; -osfrose*) os=-osfrose ;; -osf*) os=-osf ;; -utek*) os=-bsd ;; -dynix*) os=-bsd ;; -acis*) os=-aos ;; -386bsd) os=-bsd ;; -ctix* | -uts*) os=-sysv ;; -ns2 ) os=-nextstep2 ;; -nsk*) os=-nsk ;; # Preserve the version number of sinix5. 
-sinix5.*) os=`echo $os | sed -e 's|sinix|sysv|'` ;; -sinix*) os=-sysv4 ;; -triton*) os=-sysv3 ;; -oss*) os=-sysv3 ;; -svr4) os=-sysv4 ;; -svr3) os=-sysv3 ;; -sysvr4) os=-sysv4 ;; # This must come after -sysvr4. -sysv*) ;; -ose*) os=-ose ;; -es1800*) os=-ose ;; -xenix) os=-xenix ;; -*mint | -mint[0-9]* | -*MiNT | -MiNT[0-9]*) os=-mint ;; -none) ;; *) # Get rid of the `-' at the beginning of $os. os=`echo $os | sed 's/[^-]*-//'` echo Invalid configuration \`$1\': system \`$os\' not recognized 1>&2 exit 1 ;; esac else # Here we handle the default operating systems that come with various machines. # The value should be what the vendor currently ships out the door with their # machine or put another way, the most popular os provided with the machine. # Note that if you're going to try to match "-MANUFACTURER" here (say, # "-sun"), then you have to tell the case statement up towards the top # that MANUFACTURER isn't an operating system. Otherwise, code above # will signal an error saying that MANUFACTURER isn't an operating # system, and we'll never get to this point. case $basic_machine in *-acorn) os=-riscix1.2 ;; arm*-rebel) os=-linux ;; arm*-semi) os=-aout ;; pdp10-*) os=-tops20 ;; pdp11-*) os=-none ;; *-dec | vax-*) os=-ultrix4.2 ;; m68*-apollo) os=-domain ;; i386-sun) os=-sunos4.0.2 ;; m68000-sun) os=-sunos3 # This also exists in the configure program, but was not the # default. # os=-sunos4 ;; m68*-cisco) os=-aout ;; mips*-cisco) os=-elf ;; mips*-*) os=-elf ;; *-tti) # must be before sparc entry or we get the wrong os. 
		os=-sysv3
		;;
	sparc-* | *-sun)
		os=-sunos4.1.1
		;;
	*-be)
		os=-beos
		;;
	*-ibm)
		os=-aix
		;;
	*-wec)
		os=-proelf
		;;
	*-winbond)
		os=-proelf
		;;
	*-oki)
		os=-proelf
		;;
	*-hp)
		os=-hpux
		;;
	*-hitachi)
		os=-hiux
		;;
	i860-* | *-att | *-ncr | *-altos | *-motorola | *-convergent)
		os=-sysv
		;;
	*-cbm)
		os=-amigaos
		;;
	*-dg)
		os=-dgux
		;;
	*-dolphin)
		os=-sysv3
		;;
	m68k-ccur)
		os=-rtu
		;;
	m88k-omron*)
		os=-luna
		;;
	*-next )
		os=-nextstep
		;;
	*-sequent)
		os=-ptx
		;;
	*-crds)
		os=-unos
		;;
	*-ns)
		os=-genix
		;;
	i370-*)
		os=-mvs
		;;
	*-next)
		os=-nextstep3
		;;
	*-gould)
		os=-sysv
		;;
	*-highlevel)
		os=-bsd
		;;
	*-encore)
		os=-bsd
		;;
	*-sgi)
		os=-irix
		;;
	*-siemens)
		os=-sysv4
		;;
	*-masscomp)
		os=-rtu
		;;
	f30[01]-fujitsu | f700-fujitsu)
		os=-uxpv
		;;
	*-rom68k)
		os=-coff
		;;
	*-*bug)
		os=-coff
		;;
	*-apple)
		os=-macos
		;;
	*-atari*)
		os=-mint
		;;
	*)
		os=-none
		;;
esac
fi

# Here we handle the case where we know the os, and the CPU type, but not the
# manufacturer.  We pick the logical manufacturer.
vendor=unknown
case $basic_machine in
	*-unknown)
		case $os in
			-riscix*)
				vendor=acorn
				;;
			-sunos*)
				vendor=sun
				;;
			-aix*)
				vendor=ibm
				;;
			-beos*)
				vendor=be
				;;
			-hpux*)
				vendor=hp
				;;
			-mpeix*)
				vendor=hp
				;;
			-hiux*)
				vendor=hitachi
				;;
			-unos*)
				vendor=crds
				;;
			-dgux*)
				vendor=dg
				;;
			-luna*)
				vendor=omron
				;;
			-genix*)
				vendor=ns
				;;
			-mvs* | -opened*)
				vendor=ibm
				;;
			-ptx*)
				vendor=sequent
				;;
			-vxsim* | -vxworks*)
				vendor=wrs
				;;
			-aux*)
				vendor=apple
				;;
			-hms*)
				vendor=hitachi
				;;
			-mpw* | -macos*)
				vendor=apple
				;;
			-*mint | -mint[0-9]* | -*MiNT | -MiNT[0-9]*)
				vendor=atari
				;;
			-vos*)
				vendor=stratus
				;;
		esac
		basic_machine=`echo $basic_machine | sed "s/unknown/$vendor/"`
		;;
esac

echo $basic_machine$os
exit 0

# Local variables:
# eval: (add-hook 'write-file-hooks 'time-stamp)
# time-stamp-start: "timestamp='"
# time-stamp-format: "%:y-%02m-%02d"
# time-stamp-end: "'"
# End:
exonerate-2.4.0/configure0000755000175000017500000047557711222703636012372 00000000000000
#! /bin/sh
# Guess values for system-dependent variables and create Makefiles.
# Generated by GNU Autoconf 2.61.
#
# Copyright (C) 1992, 1993, 1994, 1995, 1996, 1998, 1999, 2000, 2001,
# 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
# This configure script is free software; the Free Software Foundation
# gives unlimited permission to copy, distribute and modify it.
## --------------------- ##
## M4sh Initialization.  ##
## --------------------- ##

# Be more Bourne compatible
DUALCASE=1; export DUALCASE # for MKS sh
if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then
  emulate sh
  NULLCMD=:
  # Zsh 3.x and 4.x performs word splitting on ${1+"$@"}, which
  # is contrary to our usage.  Disable this feature.
  alias -g '${1+"$@"}'='"$@"'
  setopt NO_GLOB_SUBST
else
  case `(set -o) 2>/dev/null` in
  *posix*) set -o posix ;;
esac

fi

# PATH needs CR
# Avoid depending upon Character Ranges.
as_cr_letters='abcdefghijklmnopqrstuvwxyz'
as_cr_LETTERS='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
as_cr_Letters=$as_cr_letters$as_cr_LETTERS
as_cr_digits='0123456789'
as_cr_alnum=$as_cr_Letters$as_cr_digits

# The user is always right.
if test "${PATH_SEPARATOR+set}" != set; then
  echo "#! /bin/sh" >conf$$.sh
  echo "exit 0" >>conf$$.sh
  chmod +x conf$$.sh
  if (PATH="/nonexistent;."; conf$$.sh) >/dev/null 2>&1; then
    PATH_SEPARATOR=';'
  else
    PATH_SEPARATOR=:
  fi
  rm -f conf$$.sh
fi

# Support unset when possible.
if ( (MAIL=60; unset MAIL) || exit) >/dev/null 2>&1; then
  as_unset=unset
else
  as_unset=false
fi

# IFS
# We need space, tab and new line, in precisely that order.  Quoting is
# there to prevent editors from complaining about space-tab.
# (If _AS_PATH_WALK were called with IFS unset, it would disable word
# splitting by setting IFS to empty value.)
as_nl='
'
IFS=" ""	$as_nl"

# Find who we are.  Look in the path if we contain no directory separator.
case $0 in
  *[\\/]* ) as_myself=$0 ;;
  *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
for as_dir in $PATH
do
  IFS=$as_save_IFS
  test -z "$as_dir" && as_dir=.
test -r "$as_dir/$0" && as_myself=$as_dir/$0 && break done IFS=$as_save_IFS ;; esac # We did not find ourselves, most probably we were run as `sh COMMAND' # in which case we are not to be found in the path. if test "x$as_myself" = x; then as_myself=$0 fi if test ! -f "$as_myself"; then echo "$as_myself: error: cannot find myself; rerun with an absolute file name" >&2 { (exit 1); exit 1; } fi # Work around bugs in pre-3.0 UWIN ksh. for as_var in ENV MAIL MAILPATH do ($as_unset $as_var) >/dev/null 2>&1 && $as_unset $as_var done PS1='$ ' PS2='> ' PS4='+ ' # NLS nuisances. for as_var in \ LANG LANGUAGE LC_ADDRESS LC_ALL LC_COLLATE LC_CTYPE LC_IDENTIFICATION \ LC_MEASUREMENT LC_MESSAGES LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER \ LC_TELEPHONE LC_TIME do if (set +x; test -z "`(eval $as_var=C; export $as_var) 2>&1`"); then eval $as_var=C; export $as_var else ($as_unset $as_var) >/dev/null 2>&1 && $as_unset $as_var fi done # Required to use basename. if expr a : '\(a\)' >/dev/null 2>&1 && test "X`expr 00001 : '.*\(...\)'`" = X001; then as_expr=expr else as_expr=false fi if (basename -- /) >/dev/null 2>&1 && test "X`basename -- / 2>&1`" = "X/"; then as_basename=basename else as_basename=false fi # Name of the executable. as_me=`$as_basename -- "$0" || $as_expr X/"$0" : '.*/\([^/][^/]*\)/*$' \| \ X"$0" : 'X\(//\)$' \| \ X"$0" : 'X\(/\)' \| . 2>/dev/null || echo X/"$0" | sed '/^.*\/\([^/][^/]*\)\/*$/{ s//\1/ q } /^X\/\(\/\/\)$/{ s//\1/ q } /^X\/\(\/\).*/{ s//\1/ q } s/.*/./; q'` # CDPATH. $as_unset CDPATH if test "x$CONFIG_SHELL" = x; then if (eval ":") 2>/dev/null; then as_have_required=yes else as_have_required=no fi if test $as_have_required = yes && (eval ": (as_func_return () { (exit \$1) } as_func_success () { as_func_return 0 } as_func_failure () { as_func_return 1 } as_func_ret_success () { return 0 } as_func_ret_failure () { return 1 } exitcode=0 if as_func_success; then : else exitcode=1 echo as_func_success failed. 
fi if as_func_failure; then exitcode=1 echo as_func_failure succeeded. fi if as_func_ret_success; then : else exitcode=1 echo as_func_ret_success failed. fi if as_func_ret_failure; then exitcode=1 echo as_func_ret_failure succeeded. fi if ( set x; as_func_ret_success y && test x = \"\$1\" ); then : else exitcode=1 echo positional parameters were not saved. fi test \$exitcode = 0) || { (exit 1); exit 1; } ( as_lineno_1=\$LINENO as_lineno_2=\$LINENO test \"x\$as_lineno_1\" != \"x\$as_lineno_2\" && test \"x\`expr \$as_lineno_1 + 1\`\" = \"x\$as_lineno_2\") || { (exit 1); exit 1; } ") 2> /dev/null; then : else as_candidate_shells= as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in /bin$PATH_SEPARATOR/usr/bin$PATH_SEPARATOR$PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. case $as_dir in /*) for as_base in sh bash ksh sh5; do as_candidate_shells="$as_candidate_shells $as_dir/$as_base" done;; esac done IFS=$as_save_IFS for as_shell in $as_candidate_shells $SHELL; do # Try only shells that exist, to save several forks. if { test -f "$as_shell" || test -f "$as_shell.exe"; } && { ("$as_shell") 2> /dev/null <<\_ASEOF if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then emulate sh NULLCMD=: # Zsh 3.x and 4.x performs word splitting on ${1+"$@"}, which # is contrary to our usage. Disable this feature. alias -g '${1+"$@"}'='"$@"' setopt NO_GLOB_SUBST else case `(set -o) 2>/dev/null` in *posix*) set -o posix ;; esac fi : _ASEOF }; then CONFIG_SHELL=$as_shell as_have_required=yes if { "$as_shell" 2> /dev/null <<\_ASEOF if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then emulate sh NULLCMD=: # Zsh 3.x and 4.x performs word splitting on ${1+"$@"}, which # is contrary to our usage. Disable this feature. 
alias -g '${1+"$@"}'='"$@"' setopt NO_GLOB_SUBST else case `(set -o) 2>/dev/null` in *posix*) set -o posix ;; esac fi : (as_func_return () { (exit $1) } as_func_success () { as_func_return 0 } as_func_failure () { as_func_return 1 } as_func_ret_success () { return 0 } as_func_ret_failure () { return 1 } exitcode=0 if as_func_success; then : else exitcode=1 echo as_func_success failed. fi if as_func_failure; then exitcode=1 echo as_func_failure succeeded. fi if as_func_ret_success; then : else exitcode=1 echo as_func_ret_success failed. fi if as_func_ret_failure; then exitcode=1 echo as_func_ret_failure succeeded. fi if ( set x; as_func_ret_success y && test x = "$1" ); then : else exitcode=1 echo positional parameters were not saved. fi test $exitcode = 0) || { (exit 1); exit 1; } ( as_lineno_1=$LINENO as_lineno_2=$LINENO test "x$as_lineno_1" != "x$as_lineno_2" && test "x`expr $as_lineno_1 + 1`" = "x$as_lineno_2") || { (exit 1); exit 1; } _ASEOF }; then break fi fi done if test "x$CONFIG_SHELL" != x; then for as_var in BASH_ENV ENV do ($as_unset $as_var) >/dev/null 2>&1 && $as_unset $as_var done export CONFIG_SHELL exec "$CONFIG_SHELL" "$as_myself" ${1+"$@"} fi if test $as_have_required = no; then echo This script requires a shell more modern than all the echo shells that I found on your system. Please install a echo modern shell, or manually run the script under such a echo shell if you do have one. { (exit 1); exit 1; } fi fi fi (eval "as_func_return () { (exit \$1) } as_func_success () { as_func_return 0 } as_func_failure () { as_func_return 1 } as_func_ret_success () { return 0 } as_func_ret_failure () { return 1 } exitcode=0 if as_func_success; then : else exitcode=1 echo as_func_success failed. fi if as_func_failure; then exitcode=1 echo as_func_failure succeeded. fi if as_func_ret_success; then : else exitcode=1 echo as_func_ret_success failed. fi if as_func_ret_failure; then exitcode=1 echo as_func_ret_failure succeeded. 
fi if ( set x; as_func_ret_success y && test x = \"\$1\" ); then : else exitcode=1 echo positional parameters were not saved. fi test \$exitcode = 0") || { echo No shell found that supports shell functions. echo Please tell autoconf@gnu.org about your system, echo including any error possibly output before this echo message } as_lineno_1=$LINENO as_lineno_2=$LINENO test "x$as_lineno_1" != "x$as_lineno_2" && test "x`expr $as_lineno_1 + 1`" = "x$as_lineno_2" || { # Create $as_me.lineno as a copy of $as_myself, but with $LINENO # uniformly replaced by the line number. The first 'sed' inserts a # line-number line after each line using $LINENO; the second 'sed' # does the real work. The second script uses 'N' to pair each # line-number line with the line containing $LINENO, and appends # trailing '-' during substitution so that $LINENO is not a special # case at line end. # (Raja R Harinath suggested sed '=', and Paul Eggert wrote the # scripts with optimization help from Paolo Bonzini. Blame Lee # E. McMahon (1931-1989) for sed's syntax. :-) sed -n ' p /[$]LINENO/= ' <$as_myself | sed ' s/[$]LINENO.*/&-/ t lineno b :lineno N :loop s/[$]LINENO\([^'$as_cr_alnum'_].*\n\)\(.*\)/\2\1\2/ t loop s/-\n.*// ' >$as_me.lineno && chmod +x "$as_me.lineno" || { echo "$as_me: error: cannot create $as_me.lineno; rerun with a POSIX shell" >&2 { (exit 1); exit 1; }; } # Don't try to exec as it changes $[0], causing all sort of problems # (the dirname of $[0] is not the place where we might find the # original and so on. Autoconf is especially sensitive to this). . "./$as_me.lineno" # Exit status is that of the last command. exit } if (as_dir=`dirname -- /` && test "X$as_dir" = X/) >/dev/null 2>&1; then as_dirname=dirname else as_dirname=false fi ECHO_C= ECHO_N= ECHO_T= case `echo -n x` in -n*) case `echo 'x\c'` in *c*) ECHO_T=' ';; # ECHO_T is single tab character. 
  *)   ECHO_C='\c';;
  esac;;
*)
  ECHO_N='-n';;
esac

if expr a : '\(a\)' >/dev/null 2>&1 &&
   test "X`expr 00001 : '.*\(...\)'`" = X001; then
  as_expr=expr
else
  as_expr=false
fi

rm -f conf$$ conf$$.exe conf$$.file
if test -d conf$$.dir; then
  rm -f conf$$.dir/conf$$.file
else
  rm -f conf$$.dir
  mkdir conf$$.dir
fi
echo >conf$$.file
if ln -s conf$$.file conf$$ 2>/dev/null; then
  as_ln_s='ln -s'
  # ... but there are two gotchas:
  # 1) On MSYS, both `ln -s file dir' and `ln file dir' fail.
  # 2) DJGPP < 2.04 has no symlinks; `ln -s' creates a wrapper executable.
  # In both cases, we have to default to `cp -p'.
  ln -s conf$$.file conf$$.dir 2>/dev/null && test ! -f conf$$.exe ||
    as_ln_s='cp -p'
elif ln conf$$.file conf$$ 2>/dev/null; then
  as_ln_s=ln
else
  as_ln_s='cp -p'
fi
rm -f conf$$ conf$$.exe conf$$.dir/conf$$.file conf$$.file
rmdir conf$$.dir 2>/dev/null

if mkdir -p . 2>/dev/null; then
  as_mkdir_p=:
else
  test -d ./-p && rmdir ./-p
  as_mkdir_p=false
fi

if test -x / >/dev/null 2>&1; then
  as_test_x='test -x'
else
  if ls -dL / >/dev/null 2>&1; then
    as_ls_L_option=L
  else
    as_ls_L_option=
  fi
  as_test_x='
    eval sh -c '\''
      if test -d "$1"; then
        test -d "$1/.";
      else
        case $1 in
        -*)set "./$1";;
        esac;
        case `ls -ld'$as_ls_L_option' "$1" 2>/dev/null` in
        ???[sx]*):;;*)false;;esac;fi
    '\'' sh
  '
fi
as_executable_p=$as_test_x

# Sed expression to map a string onto a valid CPP name.
as_tr_cpp="eval sed 'y%*$as_cr_letters%P$as_cr_LETTERS%;s%[^_$as_cr_alnum]%_%g'"

# Sed expression to map a string onto a valid variable name.
as_tr_sh="eval sed 'y%*+%pp%;s%[^_$as_cr_alnum]%_%g'"

exec 7<&0 </dev/null 6>&1

# Name of the host.
# hostname on some systems (SVR3.2, Linux) returns a bogus exit status,
# so uname gets run too.
ac_hostname=`(hostname || uname -n) 2>/dev/null | sed 1q`

#
# Initializations.
#
ac_default_prefix=/usr/local
ac_clean_files=
ac_config_libobj_dir=.
LIBOBJS=
cross_compiling=no
subdirs=
MFLAGS=
MAKEFLAGS=
SHELL=${CONFIG_SHELL-/bin/sh}

# Identity of this package.
PACKAGE_NAME=
PACKAGE_TARNAME=
PACKAGE_VERSION=
PACKAGE_STRING=
PACKAGE_BUGREPORT=

ac_unique_file="src/sequence/sequence.c"
# Factoring default headers for most tests.
ac_includes_default="\
#include <stdio.h>
#ifdef HAVE_SYS_TYPES_H
# include <sys/types.h>
#endif
#ifdef HAVE_SYS_STAT_H
# include <sys/stat.h>
#endif
#ifdef STDC_HEADERS
# include <stdlib.h>
# include <stddef.h>
#else
# ifdef HAVE_STDLIB_H
#  include <stdlib.h>
# endif
#endif
#ifdef HAVE_STRING_H
# if !defined STDC_HEADERS && defined HAVE_MEMORY_H
#  include <memory.h>
# endif
# include <string.h>
#endif
#ifdef HAVE_STRINGS_H
# include <strings.h>
#endif
#ifdef HAVE_INTTYPES_H
# include <inttypes.h>
#endif
#ifdef HAVE_STDINT_H
# include <stdint.h>
#endif
#ifdef HAVE_UNISTD_H
# include <unistd.h>
#endif"

ac_subst_vars='SHELL
PATH_SEPARATOR
PACKAGE_NAME
PACKAGE_TARNAME
PACKAGE_VERSION
PACKAGE_STRING
PACKAGE_BUGREPORT
exec_prefix
prefix
program_transform_name
bindir
sbindir
libexecdir
datarootdir
datadir
sysconfdir
sharedstatedir
localstatedir
includedir
oldincludedir
docdir
infodir
htmldir
dvidir
pdfdir
psdir
libdir
localedir
mandir
DEFS
ECHO_C
ECHO_N
ECHO_T
LIBS
build_alias
host_alias
target_alias
INSTALL_PROGRAM
INSTALL_SCRIPT
INSTALL_DATA
PACKAGE
VERSION
ACLOCAL
AUTOCONF
AUTOMAKE
AUTOHEADER
MAKEINFO
SET_MAKE
CC
CFLAGS
LDFLAGS
CPPFLAGS
ac_ct_CC
EXEEXT
OBJEXT
CPP
GREP
EGREP
build
build_cpu
build_vendor
build_os
host
host_cpu
host_vendor
host_os
custom_guint64_format
source_root_dir
PKG_CONFIG
GLIB_CONFIG
glib_libs
glib_cflags
codegen_extra_sources
codegen_extra_ldadd
installed_util_list
use_pthreads
LIBOBJS
LTLIBOBJS'
ac_subst_files=''
ac_precious_vars='build_alias
host_alias
target_alias
CC
CFLAGS
LDFLAGS
LIBS
CPPFLAGS
CPP'

# Initialize some variables set by options.
ac_init_help=
ac_init_version=false
# The variables have the same names as the options, with
# dashes changed to underlines.
cache_file=/dev/null
exec_prefix=NONE
no_create=
no_recursion=
prefix=NONE
program_prefix=NONE
program_suffix=NONE
program_transform_name=s,x,x,
silent=
site=
srcdir=
verbose=
x_includes=NONE
x_libraries=NONE

# Installation directory options.
# These are left unexpanded so users can "make install exec_prefix=/foo" # and all the variables that are supposed to be based on exec_prefix # by default will actually change. # Use braces instead of parens because sh, perl, etc. also accept them. # (The list follows the same order as the GNU Coding Standards.) bindir='${exec_prefix}/bin' sbindir='${exec_prefix}/sbin' libexecdir='${exec_prefix}/libexec' datarootdir='${prefix}/share' datadir='${datarootdir}' sysconfdir='${prefix}/etc' sharedstatedir='${prefix}/com' localstatedir='${prefix}/var' includedir='${prefix}/include' oldincludedir='/usr/include' docdir='${datarootdir}/doc/${PACKAGE}' infodir='${datarootdir}/info' htmldir='${docdir}' dvidir='${docdir}' pdfdir='${docdir}' psdir='${docdir}' libdir='${exec_prefix}/lib' localedir='${datarootdir}/locale' mandir='${datarootdir}/man' ac_prev= ac_dashdash= for ac_option do # If the previous option needs an argument, assign it. if test -n "$ac_prev"; then eval $ac_prev=\$ac_option ac_prev= continue fi case $ac_option in *=*) ac_optarg=`expr "X$ac_option" : '[^=]*=\(.*\)'` ;; *) ac_optarg=yes ;; esac # Accept the important Cygnus configure options, so we can diagnose typos. 
case $ac_dashdash$ac_option in --) ac_dashdash=yes ;; -bindir | --bindir | --bindi | --bind | --bin | --bi) ac_prev=bindir ;; -bindir=* | --bindir=* | --bindi=* | --bind=* | --bin=* | --bi=*) bindir=$ac_optarg ;; -build | --build | --buil | --bui | --bu) ac_prev=build_alias ;; -build=* | --build=* | --buil=* | --bui=* | --bu=*) build_alias=$ac_optarg ;; -cache-file | --cache-file | --cache-fil | --cache-fi \ | --cache-f | --cache- | --cache | --cach | --cac | --ca | --c) ac_prev=cache_file ;; -cache-file=* | --cache-file=* | --cache-fil=* | --cache-fi=* \ | --cache-f=* | --cache-=* | --cache=* | --cach=* | --cac=* | --ca=* | --c=*) cache_file=$ac_optarg ;; --config-cache | -C) cache_file=config.cache ;; -datadir | --datadir | --datadi | --datad) ac_prev=datadir ;; -datadir=* | --datadir=* | --datadi=* | --datad=*) datadir=$ac_optarg ;; -datarootdir | --datarootdir | --datarootdi | --datarootd | --dataroot \ | --dataroo | --dataro | --datar) ac_prev=datarootdir ;; -datarootdir=* | --datarootdir=* | --datarootdi=* | --datarootd=* \ | --dataroot=* | --dataroo=* | --dataro=* | --datar=*) datarootdir=$ac_optarg ;; -disable-* | --disable-*) ac_feature=`expr "x$ac_option" : 'x-*disable-\(.*\)'` # Reject names that are not valid shell variable names. expr "x$ac_feature" : ".*[^-._$as_cr_alnum]" >/dev/null && { echo "$as_me: error: invalid feature name: $ac_feature" >&2 { (exit 1); exit 1; }; } ac_feature=`echo $ac_feature | sed 's/[-.]/_/g'` eval enable_$ac_feature=no ;; -docdir | --docdir | --docdi | --doc | --do) ac_prev=docdir ;; -docdir=* | --docdir=* | --docdi=* | --doc=* | --do=*) docdir=$ac_optarg ;; -dvidir | --dvidir | --dvidi | --dvid | --dvi | --dv) ac_prev=dvidir ;; -dvidir=* | --dvidir=* | --dvidi=* | --dvid=* | --dvi=* | --dv=*) dvidir=$ac_optarg ;; -enable-* | --enable-*) ac_feature=`expr "x$ac_option" : 'x-*enable-\([^=]*\)'` # Reject names that are not valid shell variable names. 
expr "x$ac_feature" : ".*[^-._$as_cr_alnum]" >/dev/null && { echo "$as_me: error: invalid feature name: $ac_feature" >&2 { (exit 1); exit 1; }; } ac_feature=`echo $ac_feature | sed 's/[-.]/_/g'` eval enable_$ac_feature=\$ac_optarg ;; -exec-prefix | --exec_prefix | --exec-prefix | --exec-prefi \ | --exec-pref | --exec-pre | --exec-pr | --exec-p | --exec- \ | --exec | --exe | --ex) ac_prev=exec_prefix ;; -exec-prefix=* | --exec_prefix=* | --exec-prefix=* | --exec-prefi=* \ | --exec-pref=* | --exec-pre=* | --exec-pr=* | --exec-p=* | --exec-=* \ | --exec=* | --exe=* | --ex=*) exec_prefix=$ac_optarg ;; -gas | --gas | --ga | --g) # Obsolete; use --with-gas. with_gas=yes ;; -help | --help | --hel | --he | -h) ac_init_help=long ;; -help=r* | --help=r* | --hel=r* | --he=r* | -hr*) ac_init_help=recursive ;; -help=s* | --help=s* | --hel=s* | --he=s* | -hs*) ac_init_help=short ;; -host | --host | --hos | --ho) ac_prev=host_alias ;; -host=* | --host=* | --hos=* | --ho=*) host_alias=$ac_optarg ;; -htmldir | --htmldir | --htmldi | --htmld | --html | --htm | --ht) ac_prev=htmldir ;; -htmldir=* | --htmldir=* | --htmldi=* | --htmld=* | --html=* | --htm=* \ | --ht=*) htmldir=$ac_optarg ;; -includedir | --includedir | --includedi | --included | --include \ | --includ | --inclu | --incl | --inc) ac_prev=includedir ;; -includedir=* | --includedir=* | --includedi=* | --included=* | --include=* \ | --includ=* | --inclu=* | --incl=* | --inc=*) includedir=$ac_optarg ;; -infodir | --infodir | --infodi | --infod | --info | --inf) ac_prev=infodir ;; -infodir=* | --infodir=* | --infodi=* | --infod=* | --info=* | --inf=*) infodir=$ac_optarg ;; -libdir | --libdir | --libdi | --libd) ac_prev=libdir ;; -libdir=* | --libdir=* | --libdi=* | --libd=*) libdir=$ac_optarg ;; -libexecdir | --libexecdir | --libexecdi | --libexecd | --libexec \ | --libexe | --libex | --libe) ac_prev=libexecdir ;; -libexecdir=* | --libexecdir=* | --libexecdi=* | --libexecd=* | --libexec=* \ | --libexe=* | --libex=* | 
--libe=*) libexecdir=$ac_optarg ;; -localedir | --localedir | --localedi | --localed | --locale) ac_prev=localedir ;; -localedir=* | --localedir=* | --localedi=* | --localed=* | --locale=*) localedir=$ac_optarg ;; -localstatedir | --localstatedir | --localstatedi | --localstated \ | --localstate | --localstat | --localsta | --localst | --locals) ac_prev=localstatedir ;; -localstatedir=* | --localstatedir=* | --localstatedi=* | --localstated=* \ | --localstate=* | --localstat=* | --localsta=* | --localst=* | --locals=*) localstatedir=$ac_optarg ;; -mandir | --mandir | --mandi | --mand | --man | --ma | --m) ac_prev=mandir ;; -mandir=* | --mandir=* | --mandi=* | --mand=* | --man=* | --ma=* | --m=*) mandir=$ac_optarg ;; -nfp | --nfp | --nf) # Obsolete; use --without-fp. with_fp=no ;; -no-create | --no-create | --no-creat | --no-crea | --no-cre \ | --no-cr | --no-c | -n) no_create=yes ;; -no-recursion | --no-recursion | --no-recursio | --no-recursi \ | --no-recurs | --no-recur | --no-recu | --no-rec | --no-re | --no-r) no_recursion=yes ;; -oldincludedir | --oldincludedir | --oldincludedi | --oldincluded \ | --oldinclude | --oldinclud | --oldinclu | --oldincl | --oldinc \ | --oldin | --oldi | --old | --ol | --o) ac_prev=oldincludedir ;; -oldincludedir=* | --oldincludedir=* | --oldincludedi=* | --oldincluded=* \ | --oldinclude=* | --oldinclud=* | --oldinclu=* | --oldincl=* | --oldinc=* \ | --oldin=* | --oldi=* | --old=* | --ol=* | --o=*) oldincludedir=$ac_optarg ;; -prefix | --prefix | --prefi | --pref | --pre | --pr | --p) ac_prev=prefix ;; -prefix=* | --prefix=* | --prefi=* | --pref=* | --pre=* | --pr=* | --p=*) prefix=$ac_optarg ;; -program-prefix | --program-prefix | --program-prefi | --program-pref \ | --program-pre | --program-pr | --program-p) ac_prev=program_prefix ;; -program-prefix=* | --program-prefix=* | --program-prefi=* \ | --program-pref=* | --program-pre=* | --program-pr=* | --program-p=*) program_prefix=$ac_optarg ;; -program-suffix | --program-suffix | 
--program-suffi | --program-suff \ | --program-suf | --program-su | --program-s) ac_prev=program_suffix ;; -program-suffix=* | --program-suffix=* | --program-suffi=* \ | --program-suff=* | --program-suf=* | --program-su=* | --program-s=*) program_suffix=$ac_optarg ;; -program-transform-name | --program-transform-name \ | --program-transform-nam | --program-transform-na \ | --program-transform-n | --program-transform- \ | --program-transform | --program-transfor \ | --program-transfo | --program-transf \ | --program-trans | --program-tran \ | --progr-tra | --program-tr | --program-t) ac_prev=program_transform_name ;; -program-transform-name=* | --program-transform-name=* \ | --program-transform-nam=* | --program-transform-na=* \ | --program-transform-n=* | --program-transform-=* \ | --program-transform=* | --program-transfor=* \ | --program-transfo=* | --program-transf=* \ | --program-trans=* | --program-tran=* \ | --progr-tra=* | --program-tr=* | --program-t=*) program_transform_name=$ac_optarg ;; -pdfdir | --pdfdir | --pdfdi | --pdfd | --pdf | --pd) ac_prev=pdfdir ;; -pdfdir=* | --pdfdir=* | --pdfdi=* | --pdfd=* | --pdf=* | --pd=*) pdfdir=$ac_optarg ;; -psdir | --psdir | --psdi | --psd | --ps) ac_prev=psdir ;; -psdir=* | --psdir=* | --psdi=* | --psd=* | --ps=*) psdir=$ac_optarg ;; -q | -quiet | --quiet | --quie | --qui | --qu | --q \ | -silent | --silent | --silen | --sile | --sil) silent=yes ;; -sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb) ac_prev=sbindir ;; -sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \ | --sbi=* | --sb=*) sbindir=$ac_optarg ;; -sharedstatedir | --sharedstatedir | --sharedstatedi \ | --sharedstated | --sharedstate | --sharedstat | --sharedsta \ | --sharedst | --shareds | --shared | --share | --shar \ | --sha | --sh) ac_prev=sharedstatedir ;; -sharedstatedir=* | --sharedstatedir=* | --sharedstatedi=* \ | --sharedstated=* | --sharedstate=* | --sharedstat=* | --sharedsta=* \ | --sharedst=* | --shareds=* | 
--shared=* | --share=* | --shar=* \ | --sha=* | --sh=*) sharedstatedir=$ac_optarg ;; -site | --site | --sit) ac_prev=site ;; -site=* | --site=* | --sit=*) site=$ac_optarg ;; -srcdir | --srcdir | --srcdi | --srcd | --src | --sr) ac_prev=srcdir ;; -srcdir=* | --srcdir=* | --srcdi=* | --srcd=* | --src=* | --sr=*) srcdir=$ac_optarg ;; -sysconfdir | --sysconfdir | --sysconfdi | --sysconfd | --sysconf \ | --syscon | --sysco | --sysc | --sys | --sy) ac_prev=sysconfdir ;; -sysconfdir=* | --sysconfdir=* | --sysconfdi=* | --sysconfd=* | --sysconf=* \ | --syscon=* | --sysco=* | --sysc=* | --sys=* | --sy=*) sysconfdir=$ac_optarg ;; -target | --target | --targe | --targ | --tar | --ta | --t) ac_prev=target_alias ;; -target=* | --target=* | --targe=* | --targ=* | --tar=* | --ta=* | --t=*) target_alias=$ac_optarg ;; -v | -verbose | --verbose | --verbos | --verbo | --verb) verbose=yes ;; -version | --version | --versio | --versi | --vers | -V) ac_init_version=: ;; -with-* | --with-*) ac_package=`expr "x$ac_option" : 'x-*with-\([^=]*\)'` # Reject names that are not valid shell variable names. expr "x$ac_package" : ".*[^-._$as_cr_alnum]" >/dev/null && { echo "$as_me: error: invalid package name: $ac_package" >&2 { (exit 1); exit 1; }; } ac_package=`echo $ac_package | sed 's/[-.]/_/g'` eval with_$ac_package=\$ac_optarg ;; -without-* | --without-*) ac_package=`expr "x$ac_option" : 'x-*without-\(.*\)'` # Reject names that are not valid shell variable names. expr "x$ac_package" : ".*[^-._$as_cr_alnum]" >/dev/null && { echo "$as_me: error: invalid package name: $ac_package" >&2 { (exit 1); exit 1; }; } ac_package=`echo $ac_package | sed 's/[-.]/_/g'` eval with_$ac_package=no ;; --x) # Obsolete; use --with-x. 
with_x=yes ;; -x-includes | --x-includes | --x-include | --x-includ | --x-inclu \ | --x-incl | --x-inc | --x-in | --x-i) ac_prev=x_includes ;; -x-includes=* | --x-includes=* | --x-include=* | --x-includ=* | --x-inclu=* \ | --x-incl=* | --x-inc=* | --x-in=* | --x-i=*) x_includes=$ac_optarg ;; -x-libraries | --x-libraries | --x-librarie | --x-librari \ | --x-librar | --x-libra | --x-libr | --x-lib | --x-li | --x-l) ac_prev=x_libraries ;; -x-libraries=* | --x-libraries=* | --x-librarie=* | --x-librari=* \ | --x-librar=* | --x-libra=* | --x-libr=* | --x-lib=* | --x-li=* | --x-l=*) x_libraries=$ac_optarg ;; -*) { echo "$as_me: error: unrecognized option: $ac_option Try \`$0 --help' for more information." >&2 { (exit 1); exit 1; }; } ;; *=*) ac_envvar=`expr "x$ac_option" : 'x\([^=]*\)='` # Reject names that are not valid shell variable names. expr "x$ac_envvar" : ".*[^_$as_cr_alnum]" >/dev/null && { echo "$as_me: error: invalid variable name: $ac_envvar" >&2 { (exit 1); exit 1; }; } eval $ac_envvar=\$ac_optarg export $ac_envvar ;; *) # FIXME: should be removed in autoconf 3.0. echo "$as_me: WARNING: you should use --build, --host, --target" >&2 expr "x$ac_option" : ".*[^-._$as_cr_alnum]" >/dev/null && echo "$as_me: WARNING: invalid host type: $ac_option" >&2 : ${build_alias=$ac_option} ${host_alias=$ac_option} ${target_alias=$ac_option} ;; esac done if test -n "$ac_prev"; then ac_option=--`echo $ac_prev | sed 's/_/-/g'` { echo "$as_me: error: missing argument to $ac_option" >&2 { (exit 1); exit 1; }; } fi # Be sure to have absolute directory names. 
for ac_var in exec_prefix prefix bindir sbindir libexecdir datarootdir \ datadir sysconfdir sharedstatedir localstatedir includedir \ oldincludedir docdir infodir htmldir dvidir pdfdir psdir \ libdir localedir mandir do eval ac_val=\$$ac_var case $ac_val in [\\/$]* | ?:[\\/]* ) continue;; NONE | '' ) case $ac_var in *prefix ) continue;; esac;; esac { echo "$as_me: error: expected an absolute directory name for --$ac_var: $ac_val" >&2 { (exit 1); exit 1; }; } done # There might be people who depend on the old broken behavior: `$host' # used to hold the argument of --host etc. # FIXME: To remove some day. build=$build_alias host=$host_alias target=$target_alias # FIXME: To remove some day. if test "x$host_alias" != x; then if test "x$build_alias" = x; then cross_compiling=maybe echo "$as_me: WARNING: If you wanted to set the --build type, don't use --host. If a cross compiler is detected then cross compile mode will be used." >&2 elif test "x$build_alias" != "x$host_alias"; then cross_compiling=yes fi fi ac_tool_prefix= test -n "$host_alias" && ac_tool_prefix=$host_alias- test "$silent" = yes && exec 6>/dev/null ac_pwd=`pwd` && test -n "$ac_pwd" && ac_ls_di=`ls -di .` && ac_pwd_ls_di=`cd "$ac_pwd" && ls -di .` || { echo "$as_me: error: Working directory cannot be determined" >&2 { (exit 1); exit 1; }; } test "X$ac_ls_di" = "X$ac_pwd_ls_di" || { echo "$as_me: error: pwd does not report name of working directory" >&2 { (exit 1); exit 1; }; } # Find the source files, if location was not specified. if test -z "$srcdir"; then ac_srcdir_defaulted=yes # Try the directory containing this script, then the parent directory. ac_confdir=`$as_dirname -- "$0" || $as_expr X"$0" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$0" : 'X\(//\)[^/]' \| \ X"$0" : 'X\(//\)$' \| \ X"$0" : 'X\(/\)' \| . 
2>/dev/null || echo X"$0" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/ q } /^X\(\/\/\)[^/].*/{ s//\1/ q } /^X\(\/\/\)$/{ s//\1/ q } /^X\(\/\).*/{ s//\1/ q } s/.*/./; q'` srcdir=$ac_confdir if test ! -r "$srcdir/$ac_unique_file"; then srcdir=.. fi else ac_srcdir_defaulted=no fi if test ! -r "$srcdir/$ac_unique_file"; then test "$ac_srcdir_defaulted" = yes && srcdir="$ac_confdir or .." { echo "$as_me: error: cannot find sources ($ac_unique_file) in $srcdir" >&2 { (exit 1); exit 1; }; } fi ac_msg="sources are in $srcdir, but \`cd $srcdir' does not work" ac_abs_confdir=`( cd "$srcdir" && test -r "./$ac_unique_file" || { echo "$as_me: error: $ac_msg" >&2 { (exit 1); exit 1; }; } pwd)` # When building in place, set srcdir=. if test "$ac_abs_confdir" = "$ac_pwd"; then srcdir=. fi # Remove unnecessary trailing slashes from srcdir. # Double slashes in file names in object file debugging info # mess up M-x gdb in Emacs. case $srcdir in */) srcdir=`expr "X$srcdir" : 'X\(.*[^/]\)' \| "X$srcdir" : 'X\(.*\)'`;; esac for ac_var in $ac_precious_vars; do eval ac_env_${ac_var}_set=\${${ac_var}+set} eval ac_env_${ac_var}_value=\$${ac_var} eval ac_cv_env_${ac_var}_set=\${${ac_var}+set} eval ac_cv_env_${ac_var}_value=\$${ac_var} done # # Report the --help message. # if test "$ac_init_help" = "long"; then # Omit some internal or obsolete options to make the list less imposing. # This message is too long to be a string in the A/UX 3.1 sh. cat <<_ACEOF \`configure' configures this package to adapt to many kinds of systems. Usage: $0 [OPTION]... [VAR=VALUE]... To assign environment variables (e.g., CC, CFLAGS...), specify them as VAR=VALUE. See below for descriptions of some of the useful variables. Defaults for the options are specified in brackets. 
Configuration:
  -h, --help              display this help and exit
      --help=short        display options specific to this package
      --help=recursive    display the short help of all the included packages
  -V, --version           display version information and exit
  -q, --quiet, --silent   do not print \`checking...' messages
      --cache-file=FILE   cache test results in FILE [disabled]
  -C, --config-cache      alias for \`--cache-file=config.cache'
  -n, --no-create         do not create output files
      --srcdir=DIR        find the sources in DIR [configure dir or \`..']

Installation directories:
  --prefix=PREFIX         install architecture-independent files in PREFIX
                          [$ac_default_prefix]
  --exec-prefix=EPREFIX   install architecture-dependent files in EPREFIX
                          [PREFIX]

By default, \`make install' will install all the files in
\`$ac_default_prefix/bin', \`$ac_default_prefix/lib' etc.  You can specify
an installation prefix other than \`$ac_default_prefix' using \`--prefix',
for instance \`--prefix=\$HOME'.

For better control, use the options below.

Fine tuning of the installation directories:
  --bindir=DIR           user executables [EPREFIX/bin]
  --sbindir=DIR          system admin executables [EPREFIX/sbin]
  --libexecdir=DIR       program executables [EPREFIX/libexec]
  --sysconfdir=DIR       read-only single-machine data [PREFIX/etc]
  --sharedstatedir=DIR   modifiable architecture-independent data [PREFIX/com]
  --localstatedir=DIR    modifiable single-machine data [PREFIX/var]
  --libdir=DIR           object code libraries [EPREFIX/lib]
  --includedir=DIR       C header files [PREFIX/include]
  --oldincludedir=DIR    C header files for non-gcc [/usr/include]
  --datarootdir=DIR      read-only arch.-independent data root [PREFIX/share]
  --datadir=DIR          read-only architecture-independent data [DATAROOTDIR]
  --infodir=DIR          info documentation [DATAROOTDIR/info]
  --localedir=DIR        locale-dependent data [DATAROOTDIR/locale]
  --mandir=DIR           man documentation [DATAROOTDIR/man]
  --docdir=DIR           documentation root [DATAROOTDIR/doc/PACKAGE]
  --htmldir=DIR          html documentation [DOCDIR]
  --dvidir=DIR           dvi documentation [DOCDIR]
  --pdfdir=DIR           pdf documentation [DOCDIR]
  --psdir=DIR            ps documentation [DOCDIR]
_ACEOF

  cat <<\_ACEOF

Program names:
  --program-prefix=PREFIX            prepend PREFIX to installed program names
  --program-suffix=SUFFIX            append SUFFIX to installed program names
  --program-transform-name=PROGRAM   run sed PROGRAM on installed program names

System types:
  --build=BUILD     configure for building on BUILD [guessed]
  --host=HOST       cross-compile to build programs to run on HOST [BUILD]
_ACEOF
fi

if test -n "$ac_init_help"; then

  cat <<\_ACEOF

Optional Features:
  --disable-FEATURE       do not include FEATURE (same as --enable-FEATURE=no)
  --enable-FEATURE[=ARG]  include FEATURE [ARG=yes]
  --enable-glib2          Use glib2 library
  --disable-glib2         Do not use glib2 (use glib1 instead)
  --enable-assert         Use code for assertion tests
  --disable-assert        Do not use assertion code
  --enable-paranoia       Use paranoid compilation options
  --disable-paranoia      Do not use paranoid compilation options
  --enable-gprof          Generate executables suitable for gprof profiling
  --disable-gprof         Do not create executables for gprof profiling
  --enable-gcov           Generate executables suitable for gcov analysis
  --disable-gcov          Do not create executables for gcov analysis
  --enable-largefile      Enable largefile support on 32 bit systems
  --disable-largefile     Do not enable largefile support on 32 bit systems
  --enable-compiledmodels Use compiled C4 models
  --disable-compiledmodels
                          Do not use compiled C4 models
  --enable-utilities      Install all utilities
  --disable-utilities     Do not install utilities
  --enable-pthreads       Use pthreads for exonerate-server
  --disable-pthreads      Do not use pthreads for exonerate-server

Some influential environment variables:
  CC          C compiler command
  CFLAGS      C compiler flags
  LDFLAGS     linker flags, e.g. -L<lib dir> if you have libraries in a
              nonstandard directory <lib dir>
  LIBS        libraries to pass to the linker, e.g. -l<library>
  CPPFLAGS    C/C++/Objective C preprocessor flags, e.g. -I<include dir> if
              you have headers in a nonstandard directory <include dir>
  CPP         C preprocessor

Use these variables to override the choices made by `configure' or to help
it to find libraries and programs with nonstandard names/locations.

_ACEOF
ac_status=$?
fi

if test "$ac_init_help" = "recursive"; then
  # If there are subdirs, report their specific --help.
  for ac_dir in : $ac_subdirs_all; do test "x$ac_dir" = x: && continue
    test -d "$ac_dir" || continue
    ac_builddir=.

    case "$ac_dir" in
    .) ac_dir_suffix= ac_top_builddir_sub=. ac_top_build_prefix= ;;
    *)
      ac_dir_suffix=/`echo "$ac_dir" | sed 's,^\.[\\/],,'`
      # A ".." for each directory in $ac_dir_suffix.
      ac_top_builddir_sub=`echo "$ac_dir_suffix" | sed 's,/[^\\/]*,/..,g;s,/,,'`
      case $ac_top_builddir_sub in
      "") ac_top_builddir_sub=. ac_top_build_prefix= ;;
      *)  ac_top_build_prefix=$ac_top_builddir_sub/ ;;
      esac ;;
    esac
    ac_abs_top_builddir=$ac_pwd
    ac_abs_builddir=$ac_pwd$ac_dir_suffix
    # for backward compatibility:
    ac_top_builddir=$ac_top_build_prefix

    case $srcdir in
      .)  # We are building in place.
        ac_srcdir=.
        ac_top_srcdir=$ac_top_builddir_sub
        ac_abs_top_srcdir=$ac_pwd ;;
      [\\/]* | ?:[\\/]* )  # Absolute name.
        ac_srcdir=$srcdir$ac_dir_suffix;
        ac_top_srcdir=$srcdir
        ac_abs_top_srcdir=$srcdir ;;
      *) # Relative name.
        ac_srcdir=$ac_top_build_prefix$srcdir$ac_dir_suffix
        ac_top_srcdir=$ac_top_build_prefix$srcdir
        ac_abs_top_srcdir=$ac_pwd/$srcdir ;;
    esac
    ac_abs_srcdir=$ac_abs_top_srcdir$ac_dir_suffix

    cd "$ac_dir" || { ac_status=$?; continue; }
    # Check for guested configure.
    if test -f "$ac_srcdir/configure.gnu"; then
      echo &&
      $SHELL "$ac_srcdir/configure.gnu" --help=recursive
    elif test -f "$ac_srcdir/configure"; then
      echo &&
      $SHELL "$ac_srcdir/configure" --help=recursive
    else
      echo "$as_me: WARNING: no configuration information is in $ac_dir" >&2
    fi || ac_status=$?
    cd "$ac_pwd" || { ac_status=$?; break; }
  done
fi

test -n "$ac_init_help" && exit $ac_status
if $ac_init_version; then
  cat <<\_ACEOF
configure
generated by GNU Autoconf 2.61

Copyright (C) 1992, 1993, 1994, 1995, 1996, 1998, 1999, 2000, 2001,
2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
This configure script is free software; the Free Software Foundation
gives unlimited permission to copy, distribute and modify it.
_ACEOF
  exit
fi
cat >config.log <<_ACEOF
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by $as_me, which was
generated by GNU Autoconf 2.61.  Invocation command line was

  $ $0 $@

_ACEOF
exec 5>>config.log
{
cat <<_ASUNAME
## --------- ##
## Platform. ##
## --------- ##

hostname = `(hostname || uname -n) 2>/dev/null | sed 1q`
uname -m = `(uname -m) 2>/dev/null || echo unknown`
uname -r = `(uname -r) 2>/dev/null || echo unknown`
uname -s = `(uname -s) 2>/dev/null || echo unknown`
uname -v = `(uname -v) 2>/dev/null || echo unknown`

/usr/bin/uname -p = `(/usr/bin/uname -p) 2>/dev/null || echo unknown`
/bin/uname -X     = `(/bin/uname -X) 2>/dev/null     || echo unknown`

/bin/arch              = `(/bin/arch) 2>/dev/null              || echo unknown`
/usr/bin/arch -k       = `(/usr/bin/arch -k) 2>/dev/null       || echo unknown`
/usr/convex/getsysinfo = `(/usr/convex/getsysinfo) 2>/dev/null || echo unknown`
/usr/bin/hostinfo      = `(/usr/bin/hostinfo) 2>/dev/null      || echo unknown`
/bin/machine           = `(/bin/machine) 2>/dev/null           || echo unknown`
/usr/bin/oslevel       = `(/usr/bin/oslevel) 2>/dev/null       || echo unknown`
/bin/universe          = `(/bin/universe) 2>/dev/null          || echo unknown`

_ASUNAME

as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
for as_dir in $PATH
do
  IFS=$as_save_IFS
  test -z "$as_dir" && as_dir=.
  echo "PATH: $as_dir"
done
IFS=$as_save_IFS

} >&5

cat >&5 <<_ACEOF


## ----------- ##
## Core tests. ##
## ----------- ##

_ACEOF


# Keep a trace of the command line.
# Strip out --no-create and --no-recursion so they do not pile up.
# Strip out --silent because we don't want to record it for future runs. # Also quote any args containing shell meta-characters. # Make two passes to allow for proper duplicate-argument suppression. ac_configure_args= ac_configure_args0= ac_configure_args1= ac_must_keep_next=false for ac_pass in 1 2 do for ac_arg do case $ac_arg in -no-create | --no-c* | -n | -no-recursion | --no-r*) continue ;; -q | -quiet | --quiet | --quie | --qui | --qu | --q \ | -silent | --silent | --silen | --sile | --sil) continue ;; *\'*) ac_arg=`echo "$ac_arg" | sed "s/'/'\\\\\\\\''/g"` ;; esac case $ac_pass in 1) ac_configure_args0="$ac_configure_args0 '$ac_arg'" ;; 2) ac_configure_args1="$ac_configure_args1 '$ac_arg'" if test $ac_must_keep_next = true; then ac_must_keep_next=false # Got value, back to normal. else case $ac_arg in *=* | --config-cache | -C | -disable-* | --disable-* \ | -enable-* | --enable-* | -gas | --g* | -nfp | --nf* \ | -q | -quiet | --q* | -silent | --sil* | -v | -verb* \ | -with-* | --with-* | -without-* | --without-* | --x) case "$ac_configure_args0 " in "$ac_configure_args1"*" '$ac_arg' "* ) continue ;; esac ;; -* ) ac_must_keep_next=true ;; esac fi ac_configure_args="$ac_configure_args '$ac_arg'" ;; esac done done $as_unset ac_configure_args0 || test "${ac_configure_args0+set}" != set || { ac_configure_args0=; export ac_configure_args0; } $as_unset ac_configure_args1 || test "${ac_configure_args1+set}" != set || { ac_configure_args1=; export ac_configure_args1; } # When interrupted or exit'd, cleanup temporary files, and complete # config.log. We remove comments because anyway the quotes in there # would cause problems or look ugly. # WARNING: Use '\'' to represent an apostrophe within the trap. # WARNING: Do not start the trap code with a newline, due to a FreeBSD 4.0 bug. trap 'exit_status=$? # Save into config.log some information that might help in debugging. { echo cat <<\_ASBOX ## ---------------- ## ## Cache variables. 
## ## ---------------- ## _ASBOX echo # The following way of writing the cache mishandles newlines in values, ( for ac_var in `(set) 2>&1 | sed -n '\''s/^\([a-zA-Z_][a-zA-Z0-9_]*\)=.*/\1/p'\''`; do eval ac_val=\$$ac_var case $ac_val in #( *${as_nl}*) case $ac_var in #( *_cv_*) { echo "$as_me:$LINENO: WARNING: Cache variable $ac_var contains a newline." >&5 echo "$as_me: WARNING: Cache variable $ac_var contains a newline." >&2;} ;; esac case $ac_var in #( _ | IFS | as_nl) ;; #( *) $as_unset $ac_var ;; esac ;; esac done (set) 2>&1 | case $as_nl`(ac_space='\'' '\''; set) 2>&1` in #( *${as_nl}ac_space=\ *) sed -n \ "s/'\''/'\''\\\\'\'''\''/g; s/^\\([_$as_cr_alnum]*_cv_[_$as_cr_alnum]*\\)=\\(.*\\)/\\1='\''\\2'\''/p" ;; #( *) sed -n "/^[_$as_cr_alnum]*_cv_[_$as_cr_alnum]*=/p" ;; esac | sort ) echo cat <<\_ASBOX ## ----------------- ## ## Output variables. ## ## ----------------- ## _ASBOX echo for ac_var in $ac_subst_vars do eval ac_val=\$$ac_var case $ac_val in *\'\''*) ac_val=`echo "$ac_val" | sed "s/'\''/'\''\\\\\\\\'\'''\''/g"`;; esac echo "$ac_var='\''$ac_val'\''" done | sort echo if test -n "$ac_subst_files"; then cat <<\_ASBOX ## ------------------- ## ## File substitutions. ## ## ------------------- ## _ASBOX echo for ac_var in $ac_subst_files do eval ac_val=\$$ac_var case $ac_val in *\'\''*) ac_val=`echo "$ac_val" | sed "s/'\''/'\''\\\\\\\\'\'''\''/g"`;; esac echo "$ac_var='\''$ac_val'\''" done | sort echo fi if test -s confdefs.h; then cat <<\_ASBOX ## ----------- ## ## confdefs.h. ## ## ----------- ## _ASBOX echo cat confdefs.h echo fi test "$ac_signal" != 0 && echo "$as_me: caught signal $ac_signal" echo "$as_me: exit $exit_status" } >&5 rm -f core *.core core.conftest.* && rm -f -r conftest* confdefs* conf$$* $ac_clean_files && exit $exit_status ' 0 for ac_signal in 1 2 13 15; do trap 'ac_signal='$ac_signal'; { (exit 1); exit 1; }' $ac_signal done ac_signal=0 # confdefs.h avoids OS command line length limits that DEFS can exceed. 
rm -f -r conftest* confdefs.h # Predefined preprocessor variables. cat >>confdefs.h <<_ACEOF #define PACKAGE_NAME "$PACKAGE_NAME" _ACEOF cat >>confdefs.h <<_ACEOF #define PACKAGE_TARNAME "$PACKAGE_TARNAME" _ACEOF cat >>confdefs.h <<_ACEOF #define PACKAGE_VERSION "$PACKAGE_VERSION" _ACEOF cat >>confdefs.h <<_ACEOF #define PACKAGE_STRING "$PACKAGE_STRING" _ACEOF cat >>confdefs.h <<_ACEOF #define PACKAGE_BUGREPORT "$PACKAGE_BUGREPORT" _ACEOF # Let the site file select an alternate cache file if it wants to. # Prefer explicitly selected file to automatically selected ones. if test -n "$CONFIG_SITE"; then set x "$CONFIG_SITE" elif test "x$prefix" != xNONE; then set x "$prefix/share/config.site" "$prefix/etc/config.site" else set x "$ac_default_prefix/share/config.site" \ "$ac_default_prefix/etc/config.site" fi shift for ac_site_file do if test -r "$ac_site_file"; then { echo "$as_me:$LINENO: loading site script $ac_site_file" >&5 echo "$as_me: loading site script $ac_site_file" >&6;} sed 's/^/| /' "$ac_site_file" >&5 . "$ac_site_file" fi done if test -r "$cache_file"; then # Some versions of bash will fail to source /dev/null (special # files actually), so we avoid doing that. if test -f "$cache_file"; then { echo "$as_me:$LINENO: loading cache $cache_file" >&5 echo "$as_me: loading cache $cache_file" >&6;} case $cache_file in [\\/]* | ?:[\\/]* ) . "$cache_file";; *) . "./$cache_file";; esac fi else { echo "$as_me:$LINENO: creating cache $cache_file" >&5 echo "$as_me: creating cache $cache_file" >&6;} >$cache_file fi # Check that the precious variables saved in the cache have kept the same # value. 
ac_cache_corrupted=false for ac_var in $ac_precious_vars; do eval ac_old_set=\$ac_cv_env_${ac_var}_set eval ac_new_set=\$ac_env_${ac_var}_set eval ac_old_val=\$ac_cv_env_${ac_var}_value eval ac_new_val=\$ac_env_${ac_var}_value case $ac_old_set,$ac_new_set in set,) { echo "$as_me:$LINENO: error: \`$ac_var' was set to \`$ac_old_val' in the previous run" >&5 echo "$as_me: error: \`$ac_var' was set to \`$ac_old_val' in the previous run" >&2;} ac_cache_corrupted=: ;; ,set) { echo "$as_me:$LINENO: error: \`$ac_var' was not set in the previous run" >&5 echo "$as_me: error: \`$ac_var' was not set in the previous run" >&2;} ac_cache_corrupted=: ;; ,);; *) if test "x$ac_old_val" != "x$ac_new_val"; then { echo "$as_me:$LINENO: error: \`$ac_var' has changed since the previous run:" >&5 echo "$as_me: error: \`$ac_var' has changed since the previous run:" >&2;} { echo "$as_me:$LINENO: former value: $ac_old_val" >&5 echo "$as_me: former value: $ac_old_val" >&2;} { echo "$as_me:$LINENO: current value: $ac_new_val" >&5 echo "$as_me: current value: $ac_new_val" >&2;} ac_cache_corrupted=: fi;; esac # Pass precious variables to config.status. if test "$ac_new_set" = set; then case $ac_new_val in *\'*) ac_arg=$ac_var=`echo "$ac_new_val" | sed "s/'/'\\\\\\\\''/g"` ;; *) ac_arg=$ac_var=$ac_new_val ;; esac case " $ac_configure_args " in *" '$ac_arg' "*) ;; # Avoid dups. Use of quotes ensures accuracy. 
*) ac_configure_args="$ac_configure_args '$ac_arg'" ;; esac fi done if $ac_cache_corrupted; then { echo "$as_me:$LINENO: error: changes in the environment can compromise the build" >&5 echo "$as_me: error: changes in the environment can compromise the build" >&2;} { { echo "$as_me:$LINENO: error: run \`make distclean' and/or \`rm $cache_file' and start over" >&5 echo "$as_me: error: run \`make distclean' and/or \`rm $cache_file' and start over" >&2;} { (exit 1); exit 1; }; } fi ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu am__api_version="1.4" ac_aux_dir= for ac_dir in "$srcdir" "$srcdir/.." "$srcdir/../.."; do if test -f "$ac_dir/install-sh"; then ac_aux_dir=$ac_dir ac_install_sh="$ac_aux_dir/install-sh -c" break elif test -f "$ac_dir/install.sh"; then ac_aux_dir=$ac_dir ac_install_sh="$ac_aux_dir/install.sh -c" break elif test -f "$ac_dir/shtool"; then ac_aux_dir=$ac_dir ac_install_sh="$ac_aux_dir/shtool install -c" break fi done if test -z "$ac_aux_dir"; then { { echo "$as_me:$LINENO: error: cannot find install-sh or install.sh in \"$srcdir\" \"$srcdir/..\" \"$srcdir/../..\"" >&5 echo "$as_me: error: cannot find install-sh or install.sh in \"$srcdir\" \"$srcdir/..\" \"$srcdir/../..\"" >&2;} { (exit 1); exit 1; }; } fi # These three variables are undocumented and unsupported, # and are intended to be withdrawn in a future Autoconf release. # They can cause serious problems if a builder's source tree is in a directory # whose full name contains unusual characters. ac_config_guess="$SHELL $ac_aux_dir/config.guess" # Please don't use this var. ac_config_sub="$SHELL $ac_aux_dir/config.sub" # Please don't use this var. ac_configure="$SHELL $ac_aux_dir/configure" # Please don't use this var. # Find a good install program. We prefer a C program (faster), # so one script is as good as another. 
But avoid the broken or # incompatible versions: # SysV /etc/install, /usr/sbin/install # SunOS /usr/etc/install # IRIX /sbin/install # AIX /bin/install # AmigaOS /C/install, which installs bootblocks on floppy discs # AIX 4 /usr/bin/installbsd, which doesn't work without a -g flag # AFS /usr/afsws/bin/install, which mishandles nonexistent args # SVR4 /usr/ucb/install, which tries to use the nonexistent group "staff" # OS/2's system install, which has a completely different semantic # ./install, which can be erroneously created by make from ./install.sh. { echo "$as_me:$LINENO: checking for a BSD-compatible install" >&5 echo $ECHO_N "checking for a BSD-compatible install... $ECHO_C" >&6; } if test -z "$INSTALL"; then if test "${ac_cv_path_install+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. # Account for people who put trailing slashes in PATH elements. case $as_dir/ in ./ | .// | /cC/* | \ /etc/* | /usr/sbin/* | /usr/etc/* | /sbin/* | /usr/afsws/bin/* | \ ?:\\/os2\\/install\\/* | ?:\\/OS2\\/INSTALL\\/* | \ /usr/ucb/* ) ;; *) # OSF1 and SCO ODT 3.0 have their own names for install. # Don't use installbsd from OSF since it installs stuff as root # by default. for ac_prog in ginstall scoinst install; do for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_prog$ac_exec_ext" && $as_test_x "$as_dir/$ac_prog$ac_exec_ext"; }; then if test $ac_prog = install && grep dspmsg "$as_dir/$ac_prog$ac_exec_ext" >/dev/null 2>&1; then # AIX install. It has an incompatible calling convention. : elif test $ac_prog = install && grep pwplus "$as_dir/$ac_prog$ac_exec_ext" >/dev/null 2>&1; then # program-specific install script used by HP pwplus--don't use. 
: else ac_cv_path_install="$as_dir/$ac_prog$ac_exec_ext -c" break 3 fi fi done done ;; esac done IFS=$as_save_IFS fi if test "${ac_cv_path_install+set}" = set; then INSTALL=$ac_cv_path_install else # As a last resort, use the slow shell script. Don't cache a # value for INSTALL within a source directory, because that will # break other packages using the cache if that directory is # removed, or if the value is a relative name. INSTALL=$ac_install_sh fi fi { echo "$as_me:$LINENO: result: $INSTALL" >&5 echo "${ECHO_T}$INSTALL" >&6; } # Use test -z because SunOS4 sh mishandles braces in ${var-val}. # It thinks the first close brace ends the variable substitution. test -z "$INSTALL_PROGRAM" && INSTALL_PROGRAM='${INSTALL}' test -z "$INSTALL_SCRIPT" && INSTALL_SCRIPT='${INSTALL}' test -z "$INSTALL_DATA" && INSTALL_DATA='${INSTALL} -m 644' { echo "$as_me:$LINENO: checking whether build environment is sane" >&5 echo $ECHO_N "checking whether build environment is sane... $ECHO_C" >&6; } # Just in case sleep 1 echo timestamp > conftestfile # Do `set' in a subshell so we don't clobber the current shell's # arguments. Must try -L first in case configure is actually a # symlink; some systems play weird games with the mod time of symlinks # (eg FreeBSD returns the mod time of the symlink's containing # directory). if ( set X `ls -Lt $srcdir/configure conftestfile 2> /dev/null` if test "$*" = "X"; then # -L didn't work. set X `ls -t $srcdir/configure conftestfile` fi if test "$*" != "X $srcdir/configure conftestfile" \ && test "$*" != "X conftestfile $srcdir/configure"; then # If neither matched, then we have a broken ls. This can happen # if, for instance, CONFIG_SHELL is bash and it inherits a # broken ls alias from the environment. This has actually # happened. Such a system could not be considered "sane". { { echo "$as_me:$LINENO: error: ls -t appears to fail. Make sure there is not a broken alias in your environment" >&5 echo "$as_me: error: ls -t appears to fail. 
Make sure there is not a broken alias in your environment" >&2;} { (exit 1); exit 1; }; } fi test "$2" = conftestfile ) then # Ok. : else { { echo "$as_me:$LINENO: error: newly created file is older than distributed files! Check your system clock" >&5 echo "$as_me: error: newly created file is older than distributed files! Check your system clock" >&2;} { (exit 1); exit 1; }; } fi rm -f conftest* { echo "$as_me:$LINENO: result: yes" >&5 echo "${ECHO_T}yes" >&6; } test "$program_prefix" != NONE && program_transform_name="s&^&$program_prefix&;$program_transform_name" # Use a double $ so make ignores it. test "$program_suffix" != NONE && program_transform_name="s&\$&$program_suffix&;$program_transform_name" # Double any \ or $. echo might interpret backslashes. # By default was `s,x,x', remove it if useless. cat <<\_ACEOF >conftest.sed s/[\\$]/&&/g;s/;s,x,x,$// _ACEOF program_transform_name=`echo $program_transform_name | sed -f conftest.sed` rm -f conftest.sed { echo "$as_me:$LINENO: checking whether ${MAKE-make} sets \$(MAKE)" >&5 echo $ECHO_N "checking whether ${MAKE-make} sets \$(MAKE)... $ECHO_C" >&6; } set x ${MAKE-make}; ac_make=`echo "$2" | sed 's/+/p/g; s/[^a-zA-Z0-9_]/_/g'` if { as_var=ac_cv_prog_make_${ac_make}_set; eval "test \"\${$as_var+set}\" = set"; }; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.make <<\_ACEOF SHELL = /bin/sh all: @echo '@@@%%%=$(MAKE)=@@@%%%' _ACEOF # GNU make sometimes prints "make[1]: Entering...", which would confuse us. 
case `${MAKE-make} -f conftest.make 2>/dev/null` in *@@@%%%=?*=@@@%%%*) eval ac_cv_prog_make_${ac_make}_set=yes;; *) eval ac_cv_prog_make_${ac_make}_set=no;; esac rm -f conftest.make fi if eval test \$ac_cv_prog_make_${ac_make}_set = yes; then { echo "$as_me:$LINENO: result: yes" >&5 echo "${ECHO_T}yes" >&6; } SET_MAKE= else { echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6; } SET_MAKE="MAKE=${MAKE-make}" fi PACKAGE=exonerate VERSION=2.4.0 if test "`cd $srcdir && pwd`" != "`pwd`" && test -f $srcdir/config.status; then { { echo "$as_me:$LINENO: error: source directory already configured; run \"make distclean\" there first" >&5 echo "$as_me: error: source directory already configured; run \"make distclean\" there first" >&2;} { (exit 1); exit 1; }; } fi cat >>confdefs.h <<_ACEOF #define PACKAGE "$PACKAGE" _ACEOF cat >>confdefs.h <<_ACEOF #define VERSION "$VERSION" _ACEOF missing_dir=`cd $ac_aux_dir && pwd` { echo "$as_me:$LINENO: checking for working aclocal-${am__api_version}" >&5 echo $ECHO_N "checking for working aclocal-${am__api_version}... $ECHO_C" >&6; } # Run test in a subshell; some versions of sh will print an error if # an executable is not found, even if stderr is redirected. # Redirect stdin to placate older versions of autoconf. Sigh. if (aclocal-${am__api_version} --version) < /dev/null > /dev/null 2>&1; then ACLOCAL=aclocal-${am__api_version} { echo "$as_me:$LINENO: result: found" >&5 echo "${ECHO_T}found" >&6; } else ACLOCAL="$missing_dir/missing aclocal-${am__api_version}" { echo "$as_me:$LINENO: result: missing" >&5 echo "${ECHO_T}missing" >&6; } fi { echo "$as_me:$LINENO: checking for working autoconf" >&5 echo $ECHO_N "checking for working autoconf... $ECHO_C" >&6; } # Run test in a subshell; some versions of sh will print an error if # an executable is not found, even if stderr is redirected. # Redirect stdin to placate older versions of autoconf. Sigh. 
if (autoconf --version) < /dev/null > /dev/null 2>&1; then AUTOCONF=autoconf { echo "$as_me:$LINENO: result: found" >&5 echo "${ECHO_T}found" >&6; } else AUTOCONF="$missing_dir/missing autoconf" { echo "$as_me:$LINENO: result: missing" >&5 echo "${ECHO_T}missing" >&6; } fi { echo "$as_me:$LINENO: checking for working automake-${am__api_version}" >&5 echo $ECHO_N "checking for working automake-${am__api_version}... $ECHO_C" >&6; } # Run test in a subshell; some versions of sh will print an error if # an executable is not found, even if stderr is redirected. # Redirect stdin to placate older versions of autoconf. Sigh. if (automake-${am__api_version} --version) < /dev/null > /dev/null 2>&1; then AUTOMAKE=automake-${am__api_version} { echo "$as_me:$LINENO: result: found" >&5 echo "${ECHO_T}found" >&6; } else AUTOMAKE="$missing_dir/missing automake-${am__api_version}" { echo "$as_me:$LINENO: result: missing" >&5 echo "${ECHO_T}missing" >&6; } fi { echo "$as_me:$LINENO: checking for working autoheader" >&5 echo $ECHO_N "checking for working autoheader... $ECHO_C" >&6; } # Run test in a subshell; some versions of sh will print an error if # an executable is not found, even if stderr is redirected. # Redirect stdin to placate older versions of autoconf. Sigh. if (autoheader --version) < /dev/null > /dev/null 2>&1; then AUTOHEADER=autoheader { echo "$as_me:$LINENO: result: found" >&5 echo "${ECHO_T}found" >&6; } else AUTOHEADER="$missing_dir/missing autoheader" { echo "$as_me:$LINENO: result: missing" >&5 echo "${ECHO_T}missing" >&6; } fi { echo "$as_me:$LINENO: checking for working makeinfo" >&5 echo $ECHO_N "checking for working makeinfo... $ECHO_C" >&6; } # Run test in a subshell; some versions of sh will print an error if # an executable is not found, even if stderr is redirected. # Redirect stdin to placate older versions of autoconf. Sigh. 
if (makeinfo --version) < /dev/null > /dev/null 2>&1; then MAKEINFO=makeinfo { echo "$as_me:$LINENO: result: found" >&5 echo "${ECHO_T}found" >&6; } else MAKEINFO="$missing_dir/missing makeinfo" { echo "$as_me:$LINENO: result: missing" >&5 echo "${ECHO_T}missing" >&6; } fi ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}gcc", so it can be a program name with args. set dummy ${ac_tool_prefix}gcc; ac_word=$2 { echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6; } if test "${ac_cv_prog_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_prog_CC="${ac_tool_prefix}gcc" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then { echo "$as_me:$LINENO: result: $CC" >&5 echo "${ECHO_T}$CC" >&6; } else { echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6; } fi fi if test -z "$ac_cv_prog_CC"; then ac_ct_CC=$CC # Extract the first word of "gcc", so it can be a program name with args. set dummy gcc; ac_word=$2 { echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6; } if test "${ac_cv_prog_ac_ct_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$ac_ct_CC"; then ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. 
else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_prog_ac_ct_CC="gcc" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS fi fi ac_ct_CC=$ac_cv_prog_ac_ct_CC if test -n "$ac_ct_CC"; then { echo "$as_me:$LINENO: result: $ac_ct_CC" >&5 echo "${ECHO_T}$ac_ct_CC" >&6; } else { echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6; } fi if test "x$ac_ct_CC" = x; then CC="" else case $cross_compiling:$ac_tool_warned in yes:) { echo "$as_me:$LINENO: WARNING: In the future, Autoconf will not detect cross-tools whose name does not start with the host triplet. If you think this configuration is useful to you, please write to autoconf@gnu.org." >&5 echo "$as_me: WARNING: In the future, Autoconf will not detect cross-tools whose name does not start with the host triplet. If you think this configuration is useful to you, please write to autoconf@gnu.org." >&2;} ac_tool_warned=yes ;; esac CC=$ac_ct_CC fi else CC="$ac_cv_prog_CC" fi if test -z "$CC"; then if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}cc", so it can be a program name with args. set dummy ${ac_tool_prefix}cc; ac_word=$2 { echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6; } if test "${ac_cv_prog_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_prog_CC="${ac_tool_prefix}cc" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then { echo "$as_me:$LINENO: result: $CC" >&5 echo "${ECHO_T}$CC" >&6; } else { echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6; } fi fi fi if test -z "$CC"; then # Extract the first word of "cc", so it can be a program name with args. set dummy cc; ac_word=$2 { echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6; } if test "${ac_cv_prog_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else ac_prog_rejected=no as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then if test "$as_dir/$ac_word$ac_exec_ext" = "/usr/ucb/cc"; then ac_prog_rejected=yes continue fi ac_cv_prog_CC="cc" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS if test $ac_prog_rejected = yes; then # We found a bogon in the path, so make sure we never use it. set dummy $ac_cv_prog_CC shift if test $# != 0; then # We chose a different compiler from the bogus one. # However, it has the same basename, so the bogon will be chosen # first if we set CC to just the basename; use the full file name. 
shift ac_cv_prog_CC="$as_dir/$ac_word${1+' '}$@" fi fi fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then { echo "$as_me:$LINENO: result: $CC" >&5 echo "${ECHO_T}$CC" >&6; } else { echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6; } fi fi if test -z "$CC"; then if test -n "$ac_tool_prefix"; then for ac_prog in cl.exe do # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args. set dummy $ac_tool_prefix$ac_prog; ac_word=$2 { echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6; } if test "${ac_cv_prog_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_prog_CC="$ac_tool_prefix$ac_prog" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then { echo "$as_me:$LINENO: result: $CC" >&5 echo "${ECHO_T}$CC" >&6; } else { echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6; } fi test -n "$CC" && break done fi if test -z "$CC"; then ac_ct_CC=$CC for ac_prog in cl.exe do # Extract the first word of "$ac_prog", so it can be a program name with args. set dummy $ac_prog; ac_word=$2 { echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6; } if test "${ac_cv_prog_ac_ct_CC+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test -n "$ac_ct_CC"; then ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_prog_ac_ct_CC="$ac_prog" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS fi fi ac_ct_CC=$ac_cv_prog_ac_ct_CC if test -n "$ac_ct_CC"; then { echo "$as_me:$LINENO: result: $ac_ct_CC" >&5 echo "${ECHO_T}$ac_ct_CC" >&6; } else { echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6; } fi test -n "$ac_ct_CC" && break done if test "x$ac_ct_CC" = x; then CC="" else case $cross_compiling:$ac_tool_warned in yes:) { echo "$as_me:$LINENO: WARNING: In the future, Autoconf will not detect cross-tools whose name does not start with the host triplet. If you think this configuration is useful to you, please write to autoconf@gnu.org." >&5 echo "$as_me: WARNING: In the future, Autoconf will not detect cross-tools whose name does not start with the host triplet. If you think this configuration is useful to you, please write to autoconf@gnu.org." >&2;} ac_tool_warned=yes ;; esac CC=$ac_ct_CC fi fi fi test -z "$CC" && { { echo "$as_me:$LINENO: error: no acceptable C compiler found in \$PATH See \`config.log' for more details." >&5 echo "$as_me: error: no acceptable C compiler found in \$PATH See \`config.log' for more details." >&2;} { (exit 1); exit 1; }; } # Provide some information about the compiler. echo "$as_me:$LINENO: checking for C compiler version" >&5 ac_compiler=`set X $ac_compile; echo $2` { (ac_try="$ac_compiler --version >&5" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_compiler --version >&5") 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); } { (ac_try="$ac_compiler -v >&5" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_compiler -v >&5") 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } { (ac_try="$ac_compiler -V >&5" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_compiler -V >&5") 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { ; return 0; } _ACEOF ac_clean_files_save=$ac_clean_files ac_clean_files="$ac_clean_files a.out a.exe b.out" # Try to create an executable without -o first, disregard a.out. # It will help us diagnose broken compilers, and finding out an intuition # of exeext. { echo "$as_me:$LINENO: checking for C compiler default output file name" >&5 echo $ECHO_N "checking for C compiler default output file name... $ECHO_C" >&6; } ac_link_default=`echo "$ac_link" | sed 's/ -o *conftest[^ ]*//'` # # List of possible output files, starting from the most likely. # The algorithm is not robust to junk in `.', hence go to wildcards (a.*) # only as a last resort. b.out is created by i960 compilers. ac_files='a_out.exe a.exe conftest.exe a.out conftest a.* conftest.* b.out' # # The IRIX 6 linker writes into existing files which may not be # executable, retaining their permissions. Remove them first so a # subsequent execution test works. 
ac_rmfiles= for ac_file in $ac_files do case $ac_file in *.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb | *.bbg | *.map | *.inf | *.o | *.obj ) ;; * ) ac_rmfiles="$ac_rmfiles $ac_file";; esac done rm -f $ac_rmfiles if { (ac_try="$ac_link_default" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_link_default") 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; then # Autoconf-2.13 could set the ac_cv_exeext variable to `no'. # So ignore a value of `no', otherwise this would lead to `EXEEXT = no' # in a Makefile. We should not override ac_cv_exeext if it was cached, # so that the user can short-circuit this test for compilers unknown to # Autoconf. for ac_file in $ac_files '' do test -f "$ac_file" || continue case $ac_file in *.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb | *.bbg | *.map | *.inf | *.o | *.obj ) ;; [ab].out ) # We found the default executable, but exeext='' is most # certainly right. break;; *.* ) if test "${ac_cv_exeext+set}" = set && test "$ac_cv_exeext" != no; then :; else ac_cv_exeext=`expr "$ac_file" : '[^.]*\(\..*\)'` fi # We set ac_cv_exeext here because the later test for it is not # safe: cross compilers may not add the suffix if given an `-o' # argument, so we may need to know it at that point already. # Even if this section looks crufty: it has the advantage of # actually working. break;; * ) break;; esac done test "$ac_cv_exeext" = no && ac_cv_exeext= else ac_file='' fi { echo "$as_me:$LINENO: result: $ac_file" >&5 echo "${ECHO_T}$ac_file" >&6; } if test -z "$ac_file"; then echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 { { echo "$as_me:$LINENO: error: C compiler cannot create executables See \`config.log' for more details." >&5 echo "$as_me: error: C compiler cannot create executables See \`config.log' for more details." 
>&2;} { (exit 77); exit 77; }; } fi ac_exeext=$ac_cv_exeext # Check that the compiler produces executables we can run. If not, either # the compiler is broken, or we cross compile. { echo "$as_me:$LINENO: checking whether the C compiler works" >&5 echo $ECHO_N "checking whether the C compiler works... $ECHO_C" >&6; } # FIXME: These cross compiler hacks should be removed for Autoconf 3.0 # If not cross compiling, check that we can run a simple program. if test "$cross_compiling" != yes; then if { ac_try='./$ac_file' { (case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_try") 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; }; then cross_compiling=no else if test "$cross_compiling" = maybe; then cross_compiling=yes else { { echo "$as_me:$LINENO: error: cannot run C compiled programs. If you meant to cross compile, use \`--host'. See \`config.log' for more details." >&5 echo "$as_me: error: cannot run C compiled programs. If you meant to cross compile, use \`--host'. See \`config.log' for more details." >&2;} { (exit 1); exit 1; }; } fi fi fi { echo "$as_me:$LINENO: result: yes" >&5 echo "${ECHO_T}yes" >&6; } rm -f a.out a.exe conftest$ac_cv_exeext b.out ac_clean_files=$ac_clean_files_save # Check that the compiler produces executables we can run. If not, either # the compiler is broken, or we cross compile. { echo "$as_me:$LINENO: checking whether we are cross compiling" >&5 echo $ECHO_N "checking whether we are cross compiling... $ECHO_C" >&6; } { echo "$as_me:$LINENO: result: $cross_compiling" >&5 echo "${ECHO_T}$cross_compiling" >&6; } { echo "$as_me:$LINENO: checking for suffix of executables" >&5 echo $ECHO_N "checking for suffix of executables... 
$ECHO_C" >&6; } if { (ac_try="$ac_link" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_link") 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); }; then # If both `conftest.exe' and `conftest' are `present' (well, observable) # catch `conftest.exe'. For instance with Cygwin, `ls conftest' will # work properly (i.e., refer to `conftest.exe'), while it won't with # `rm'. for ac_file in conftest.exe conftest conftest.*; do test -f "$ac_file" || continue case $ac_file in *.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb | *.bbg | *.map | *.inf | *.o | *.obj ) ;; *.* ) ac_cv_exeext=`expr "$ac_file" : '[^.]*\(\..*\)'` break;; * ) break;; esac done else { { echo "$as_me:$LINENO: error: cannot compute suffix of executables: cannot compile and link See \`config.log' for more details." >&5 echo "$as_me: error: cannot compute suffix of executables: cannot compile and link See \`config.log' for more details." >&2;} { (exit 1); exit 1; }; } fi rm -f conftest$ac_cv_exeext { echo "$as_me:$LINENO: result: $ac_cv_exeext" >&5 echo "${ECHO_T}$ac_cv_exeext" >&6; } rm -f conftest.$ac_ext EXEEXT=$ac_cv_exeext ac_exeext=$EXEEXT { echo "$as_me:$LINENO: checking for suffix of object files" >&5 echo $ECHO_N "checking for suffix of object files... $ECHO_C" >&6; } if test "${ac_cv_objext+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { ; return 0; } _ACEOF rm -f conftest.o conftest.obj if { (ac_try="$ac_compile" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_compile") 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); }; then for ac_file in conftest.o conftest.obj conftest.*; do test -f "$ac_file" || continue; case $ac_file in *.$ac_ext | *.xcoff | *.tds | *.d | *.pdb | *.xSYM | *.bb | *.bbg | *.map | *.inf ) ;; *) ac_cv_objext=`expr "$ac_file" : '.*\.\(.*\)'` break;; esac done else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 { { echo "$as_me:$LINENO: error: cannot compute suffix of object files: cannot compile See \`config.log' for more details." >&5 echo "$as_me: error: cannot compute suffix of object files: cannot compile See \`config.log' for more details." >&2;} { (exit 1); exit 1; }; } fi rm -f conftest.$ac_cv_objext conftest.$ac_ext fi { echo "$as_me:$LINENO: result: $ac_cv_objext" >&5 echo "${ECHO_T}$ac_cv_objext" >&6; } OBJEXT=$ac_cv_objext ac_objext=$OBJEXT { echo "$as_me:$LINENO: checking whether we are using the GNU C compiler" >&5 echo $ECHO_N "checking whether we are using the GNU C compiler... $ECHO_C" >&6; } if test "${ac_cv_c_compiler_gnu+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { #ifndef __GNUC__ choke me #endif ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (ac_try="$ac_compile" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_compile") 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { test -z "$ac_c_werror_flag" || test ! 
-s conftest.err } && test -s conftest.$ac_objext; then ac_compiler_gnu=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_compiler_gnu=no fi rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext ac_cv_c_compiler_gnu=$ac_compiler_gnu fi { echo "$as_me:$LINENO: result: $ac_cv_c_compiler_gnu" >&5 echo "${ECHO_T}$ac_cv_c_compiler_gnu" >&6; } GCC=`test $ac_compiler_gnu = yes && echo yes` ac_test_CFLAGS=${CFLAGS+set} ac_save_CFLAGS=$CFLAGS { echo "$as_me:$LINENO: checking whether $CC accepts -g" >&5 echo $ECHO_N "checking whether $CC accepts -g... $ECHO_C" >&6; } if test "${ac_cv_prog_cc_g+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_save_c_werror_flag=$ac_c_werror_flag ac_c_werror_flag=yes ac_cv_prog_cc_g=no CFLAGS="-g" cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (ac_try="$ac_compile" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_compile") 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { test -z "$ac_c_werror_flag" || test ! -s conftest.err } && test -s conftest.$ac_objext; then ac_cv_prog_cc_g=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 CFLAGS="" cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (ac_try="$ac_compile" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_compile") 2>conftest.er1 ac_status=$? 
grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { test -z "$ac_c_werror_flag" || test ! -s conftest.err } && test -s conftest.$ac_objext; then : else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_c_werror_flag=$ac_save_c_werror_flag CFLAGS="-g" cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (ac_try="$ac_compile" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_compile") 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { test -z "$ac_c_werror_flag" || test ! -s conftest.err } && test -s conftest.$ac_objext; then ac_cv_prog_cc_g=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext fi rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext fi rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext ac_c_werror_flag=$ac_save_c_werror_flag fi { echo "$as_me:$LINENO: result: $ac_cv_prog_cc_g" >&5 echo "${ECHO_T}$ac_cv_prog_cc_g" >&6; } if test "$ac_test_CFLAGS" = set; then CFLAGS=$ac_save_CFLAGS elif test $ac_cv_prog_cc_g = yes; then if test "$GCC" = yes; then CFLAGS="-g -O2" else CFLAGS="-g" fi else if test "$GCC" = yes; then CFLAGS="-O2" else CFLAGS= fi fi { echo "$as_me:$LINENO: checking for $CC option to accept ISO C89" >&5 echo $ECHO_N "checking for $CC option to accept ISO C89... 
$ECHO_C" >&6; } if test "${ac_cv_prog_cc_c89+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_cv_prog_cc_c89=no ac_save_CC=$CC cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <stdarg.h> #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> /* Most of the following tests are stolen from RCS 5.7's src/conf.sh. */ struct buf { int x; }; FILE * (*rcsopen) (struct buf *, struct stat *, int); static char *e (p, i) char **p; int i; { return p[i]; } static char *f (char * (*g) (char **, int), char **p, ...) { char *s; va_list v; va_start (v,p); s = g (p, va_arg (v,int)); va_end (v); return s; } /* OSF 4.0 Compaq cc is some sort of almost-ANSI by default. It has function prototypes and stuff, but not '\xHH' hex character constants. These don't provoke an error unfortunately, instead are silently treated as 'x'. The following induces an error, until -std is added to get proper ANSI mode. Curiously '\x00'!='x' always comes out true, for an array size at least. It's necessary to write '\x00'==0 to get something that's true only with -std. */ int osf4_cc_array ['\x00' == 0 ? 1 : -1]; /* IBM C 6 for AIX is almost-ANSI by default, but it replaces macro parameters inside strings and character constants. */ #define FOO(x) 'x' int xlc6_cc_array[FOO(a) == 'x' ?
1 : -1]; int test (int i, double x); struct s1 {int (*f) (int a);}; struct s2 {int (*f) (double a);}; int pairnames (int, char **, FILE *(*)(struct buf *, struct stat *, int), int, int); int argc; char **argv; int main () { return f (e, argv, 0) != argv[0] || f (e, argv, 1) != argv[1]; ; return 0; } _ACEOF for ac_arg in '' -qlanglvl=extc89 -qlanglvl=ansi -std \ -Ae "-Aa -D_HPUX_SOURCE" "-Xc -D__EXTENSIONS__" do CC="$ac_save_CC $ac_arg" rm -f conftest.$ac_objext if { (ac_try="$ac_compile" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_compile") 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { test -z "$ac_c_werror_flag" || test ! -s conftest.err } && test -s conftest.$ac_objext; then ac_cv_prog_cc_c89=$ac_arg else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f core conftest.err conftest.$ac_objext test "x$ac_cv_prog_cc_c89" != "xno" && break done rm -f conftest.$ac_ext CC=$ac_save_CC fi # AC_CACHE_VAL case "x$ac_cv_prog_cc_c89" in x) { echo "$as_me:$LINENO: result: none needed" >&5 echo "${ECHO_T}none needed" >&6; } ;; xno) { echo "$as_me:$LINENO: result: unsupported" >&5 echo "${ECHO_T}unsupported" >&6; } ;; *) CC="$CC $ac_cv_prog_cc_c89" { echo "$as_me:$LINENO: result: $ac_cv_prog_cc_c89" >&5 echo "${ECHO_T}$ac_cv_prog_cc_c89" >&6; } ;; esac ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu # Find a good install program. We prefer a C program (faster), # so one script is as good as another. 
But avoid the broken or # incompatible versions: # SysV /etc/install, /usr/sbin/install # SunOS /usr/etc/install # IRIX /sbin/install # AIX /bin/install # AmigaOS /C/install, which installs bootblocks on floppy discs # AIX 4 /usr/bin/installbsd, which doesn't work without a -g flag # AFS /usr/afsws/bin/install, which mishandles nonexistent args # SVR4 /usr/ucb/install, which tries to use the nonexistent group "staff" # OS/2's system install, which has a completely different semantic # ./install, which can be erroneously created by make from ./install.sh. { echo "$as_me:$LINENO: checking for a BSD-compatible install" >&5 echo $ECHO_N "checking for a BSD-compatible install... $ECHO_C" >&6; } if test -z "$INSTALL"; then if test "${ac_cv_path_install+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. # Account for people who put trailing slashes in PATH elements. case $as_dir/ in ./ | .// | /cC/* | \ /etc/* | /usr/sbin/* | /usr/etc/* | /sbin/* | /usr/afsws/bin/* | \ ?:\\/os2\\/install\\/* | ?:\\/OS2\\/INSTALL\\/* | \ /usr/ucb/* ) ;; *) # OSF1 and SCO ODT 3.0 have their own names for install. # Don't use installbsd from OSF since it installs stuff as root # by default. for ac_prog in ginstall scoinst install; do for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_prog$ac_exec_ext" && $as_test_x "$as_dir/$ac_prog$ac_exec_ext"; }; then if test $ac_prog = install && grep dspmsg "$as_dir/$ac_prog$ac_exec_ext" >/dev/null 2>&1; then # AIX install. It has an incompatible calling convention. : elif test $ac_prog = install && grep pwplus "$as_dir/$ac_prog$ac_exec_ext" >/dev/null 2>&1; then # program-specific install script used by HP pwplus--don't use. 
: else ac_cv_path_install="$as_dir/$ac_prog$ac_exec_ext -c" break 3 fi fi done done ;; esac done IFS=$as_save_IFS fi if test "${ac_cv_path_install+set}" = set; then INSTALL=$ac_cv_path_install else # As a last resort, use the slow shell script. Don't cache a # value for INSTALL within a source directory, because that will # break other packages using the cache if that directory is # removed, or if the value is a relative name. INSTALL=$ac_install_sh fi fi { echo "$as_me:$LINENO: result: $INSTALL" >&5 echo "${ECHO_T}$INSTALL" >&6; } # Use test -z because SunOS4 sh mishandles braces in ${var-val}. # It thinks the first close brace ends the variable substitution. test -z "$INSTALL_PROGRAM" && INSTALL_PROGRAM='${INSTALL}' test -z "$INSTALL_SCRIPT" && INSTALL_SCRIPT='${INSTALL}' test -z "$INSTALL_DATA" && INSTALL_DATA='${INSTALL} -m 644' { echo "$as_me:$LINENO: checking whether ${MAKE-make} sets \$(MAKE)" >&5 echo $ECHO_N "checking whether ${MAKE-make} sets \$(MAKE)... $ECHO_C" >&6; } set x ${MAKE-make}; ac_make=`echo "$2" | sed 's/+/p/g; s/[^a-zA-Z0-9_]/_/g'` if { as_var=ac_cv_prog_make_${ac_make}_set; eval "test \"\${$as_var+set}\" = set"; }; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.make <<\_ACEOF SHELL = /bin/sh all: @echo '@@@%%%=$(MAKE)=@@@%%%' _ACEOF # GNU make sometimes prints "make[1]: Entering...", which would confuse us. case `${MAKE-make} -f conftest.make 2>/dev/null` in *@@@%%%=?*=@@@%%%*) eval ac_cv_prog_make_${ac_make}_set=yes;; *) eval ac_cv_prog_make_${ac_make}_set=no;; esac rm -f conftest.make fi if eval test \$ac_cv_prog_make_${ac_make}_set = yes; then { echo "$as_me:$LINENO: result: yes" >&5 echo "${ECHO_T}yes" >&6; } SET_MAKE= else { echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6; } SET_MAKE="MAKE=${MAKE-make}" fi # AM_PROG_LIBTOOL # OTHER STUFF { echo "$as_me:$LINENO: checking for an ANSI C-conforming const" >&5 echo $ECHO_N "checking for an ANSI C-conforming const... 
$ECHO_C" >&6; } if test "${ac_cv_c_const+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ int main () { /* FIXME: Include the comments suggested by Paul. */ #ifndef __cplusplus /* Ultrix mips cc rejects this. */ typedef int charset[2]; const charset cs; /* SunOS 4.1.1 cc rejects this. */ char const *const *pcpcc; char **ppc; /* NEC SVR4.0.2 mips cc rejects this. */ struct point {int x, y;}; static struct point const zero = {0,0}; /* AIX XL C 1.02.0.0 rejects this. It does not let you subtract one const X* pointer from another in an arm of an if-expression whose if-part is not a constant expression */ const char *g = "string"; pcpcc = &g + (g ? g-g : 0); /* HPUX 7.0 cc rejects these. */ ++pcpcc; ppc = (char**) pcpcc; pcpcc = (char const *const *) ppc; { /* SCO 3.2v4 cc rejects this. */ char *t; char const *s = 0 ? (char *) 0 : (char const *) 0; *t++ = 0; if (s) return 0; } { /* Someone thinks the Sun supposedly-ANSI compiler will reject this. */ int x[] = {25, 17}; const int *foo = &x[0]; ++foo; } { /* Sun SC1.0 ANSI compiler rejects this -- but not the above. */ typedef const int *iptr; iptr p = 0; ++p; } { /* AIX XL C 1.02.0.0 rejects this saying "k.c", line 2.27: 1506-025 (S) Operand must be a modifiable lvalue. */ struct s { int j; const int *ap[3]; }; struct s *b; b->j = 5; } { /* ULTRIX-32 V3.1 (Rev 9) vcc rejects this */ const int foo = 10; if (!foo) return 0; } return !cs[0] && !zero.x; #endif ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (ac_try="$ac_compile" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_compile") 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 (exit $ac_status); } && { test -z "$ac_c_werror_flag" || test ! -s conftest.err } && test -s conftest.$ac_objext; then ac_cv_c_const=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_c_const=no fi rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext fi { echo "$as_me:$LINENO: result: $ac_cv_c_const" >&5 echo "${ECHO_T}$ac_cv_c_const" >&6; } if test $ac_cv_c_const = no; then cat >>confdefs.h <<\_ACEOF #define const _ACEOF fi { echo "$as_me:$LINENO: checking for inline" >&5 echo $ECHO_N "checking for inline... $ECHO_C" >&6; } if test "${ac_cv_c_inline+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_cv_c_inline=no for ac_kw in inline __inline__ __inline; do cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #ifndef __cplusplus typedef int foo_t; static $ac_kw foo_t static_foo () {return 0; } $ac_kw foo_t foo () {return 0; } #endif _ACEOF rm -f conftest.$ac_objext if { (ac_try="$ac_compile" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_compile") 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { test -z "$ac_c_werror_flag" || test ! 
-s conftest.err } && test -s conftest.$ac_objext; then ac_cv_c_inline=$ac_kw else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 fi rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext test "$ac_cv_c_inline" != no && break done fi { echo "$as_me:$LINENO: result: $ac_cv_c_inline" >&5 echo "${ECHO_T}$ac_cv_c_inline" >&6; } case $ac_cv_c_inline in inline | yes) ;; *) case $ac_cv_c_inline in no) ac_val=;; *) ac_val=$ac_cv_c_inline;; esac cat >>confdefs.h <<_ACEOF #ifndef __cplusplus #define inline $ac_val #endif _ACEOF ;; esac ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu { echo "$as_me:$LINENO: checking how to run the C preprocessor" >&5 echo $ECHO_N "checking how to run the C preprocessor... $ECHO_C" >&6; } # On Suns, sometimes $CPP names a directory. if test -n "$CPP" && test -d "$CPP"; then CPP= fi if test -z "$CPP"; then if test "${ac_cv_prog_CPP+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else # Double quotes because CPP needs to be expanded for CPP in "$CC -E" "$CC -E -traditional-cpp" "/lib/cpp" do ac_preproc_ok=false for ac_c_preproc_warn_flag in '' yes do # Use a header file that comes with gcc, so configuring glibc # with a fresh cross-compiler works. # Prefer <limits.h> to <assert.h> if __STDC__ is defined, since # <limits.h> exists even on freestanding compilers. # On the NeXT, cc -E runs the code through the compiler's parser, # not just through cpp. "Syntax error" is here to catch this case. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h.
*/ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif Syntax error _ACEOF if { (ac_try="$ac_cpp conftest.$ac_ext" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_cpp conftest.$ac_ext") 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } >/dev/null && { test -z "$ac_c_preproc_warn_flag$ac_c_werror_flag" || test ! -s conftest.err }; then : else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 # Broken: fails on valid input. continue fi rm -f conftest.err conftest.$ac_ext # OK, works on sane cases. Now check whether nonexistent headers # can be detected and how. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <ac_nonexistent.h> _ACEOF if { (ac_try="$ac_cpp conftest.$ac_ext" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_cpp conftest.$ac_ext") 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } >/dev/null && { test -z "$ac_c_preproc_warn_flag$ac_c_werror_flag" || test ! -s conftest.err }; then # Broken: success on invalid input. continue else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 # Passes both tests. ac_preproc_ok=: break fi rm -f conftest.err conftest.$ac_ext done # Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped.
rm -f conftest.err conftest.$ac_ext if $ac_preproc_ok; then break fi done ac_cv_prog_CPP=$CPP fi CPP=$ac_cv_prog_CPP else ac_cv_prog_CPP=$CPP fi { echo "$as_me:$LINENO: result: $CPP" >&5 echo "${ECHO_T}$CPP" >&6; } ac_preproc_ok=false for ac_c_preproc_warn_flag in '' yes do # Use a header file that comes with gcc, so configuring glibc # with a fresh cross-compiler works. # Prefer <limits.h> to <assert.h> if __STDC__ is defined, since # <limits.h> exists even on freestanding compilers. # On the NeXT, cc -E runs the code through the compiler's parser, # not just through cpp. "Syntax error" is here to catch this case. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #ifdef __STDC__ # include <limits.h> #else # include <assert.h> #endif Syntax error _ACEOF if { (ac_try="$ac_cpp conftest.$ac_ext" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_cpp conftest.$ac_ext") 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } >/dev/null && { test -z "$ac_c_preproc_warn_flag$ac_c_werror_flag" || test ! -s conftest.err }; then : else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 # Broken: fails on valid input. continue fi rm -f conftest.err conftest.$ac_ext # OK, works on sane cases. Now check whether nonexistent headers # can be detected and how. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <ac_nonexistent.h> _ACEOF if { (ac_try="$ac_cpp conftest.$ac_ext" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_cpp conftest.$ac_ext") 2>conftest.er1 ac_status=$?
grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } >/dev/null && { test -z "$ac_c_preproc_warn_flag$ac_c_werror_flag" || test ! -s conftest.err }; then # Broken: success on invalid input. continue else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 # Passes both tests. ac_preproc_ok=: break fi rm -f conftest.err conftest.$ac_ext done # Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped. rm -f conftest.err conftest.$ac_ext if $ac_preproc_ok; then : else { { echo "$as_me:$LINENO: error: C preprocessor \"$CPP\" fails sanity check See \`config.log' for more details." >&5 echo "$as_me: error: C preprocessor \"$CPP\" fails sanity check See \`config.log' for more details." >&2;} { (exit 1); exit 1; }; } fi ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu { echo "$as_me:$LINENO: checking for grep that handles long lines and -e" >&5 echo $ECHO_N "checking for grep that handles long lines and -e... $ECHO_C" >&6; } if test "${ac_cv_path_GREP+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else # Extract the first word of "grep ggrep" to use in msg output if test -z "$GREP"; then set dummy grep ggrep; ac_prog_name=$2 if test "${ac_cv_path_GREP+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_path_GREP_found=false # Loop through the user's path and test for each of PROGNAME-LIST as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH$PATH_SEPARATOR/usr/xpg4/bin do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_prog in grep ggrep; do for ac_exec_ext in '' $ac_executable_extensions; do ac_path_GREP="$as_dir/$ac_prog$ac_exec_ext" { test -f "$ac_path_GREP" && $as_test_x "$ac_path_GREP"; } || continue # Check for GNU ac_path_GREP and select it if it is found. # Check for GNU $ac_path_GREP case `"$ac_path_GREP" --version 2>&1` in *GNU*) ac_cv_path_GREP="$ac_path_GREP" ac_path_GREP_found=:;; *) ac_count=0 echo $ECHO_N "0123456789$ECHO_C" >"conftest.in" while : do cat "conftest.in" "conftest.in" >"conftest.tmp" mv "conftest.tmp" "conftest.in" cp "conftest.in" "conftest.nl" echo 'GREP' >> "conftest.nl" "$ac_path_GREP" -e 'GREP$' -e '-(cannot match)-' < "conftest.nl" >"conftest.out" 2>/dev/null || break diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break ac_count=`expr $ac_count + 1` if test $ac_count -gt ${ac_path_GREP_max-0}; then # Best one so far, save it but keep looking for a better one ac_cv_path_GREP="$ac_path_GREP" ac_path_GREP_max=$ac_count fi # 10*(2^10) chars as input seems more than enough test $ac_count -gt 10 && break done rm -f conftest.in conftest.tmp conftest.nl conftest.out;; esac $ac_path_GREP_found && break 3 done done done IFS=$as_save_IFS fi GREP="$ac_cv_path_GREP" if test -z "$GREP"; then { { echo "$as_me:$LINENO: error: no acceptable $ac_prog_name could be found in $PATH$PATH_SEPARATOR/usr/xpg4/bin" >&5 echo "$as_me: error: no acceptable $ac_prog_name could be found in $PATH$PATH_SEPARATOR/usr/xpg4/bin" >&2;} { (exit 1); exit 1; }; } fi else ac_cv_path_GREP=$GREP fi fi { echo "$as_me:$LINENO: result: $ac_cv_path_GREP" >&5 echo "${ECHO_T}$ac_cv_path_GREP" >&6; } GREP="$ac_cv_path_GREP" { echo "$as_me:$LINENO: checking for egrep" >&5 echo $ECHO_N "checking for egrep... 
$ECHO_C" >&6; } if test "${ac_cv_path_EGREP+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if echo a | $GREP -E '(a|b)' >/dev/null 2>&1 then ac_cv_path_EGREP="$GREP -E" else # Extract the first word of "egrep" to use in msg output if test -z "$EGREP"; then set dummy egrep; ac_prog_name=$2 if test "${ac_cv_path_EGREP+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_path_EGREP_found=false # Loop through the user's path and test for each of PROGNAME-LIST as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH$PATH_SEPARATOR/usr/xpg4/bin do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. for ac_prog in egrep; do for ac_exec_ext in '' $ac_executable_extensions; do ac_path_EGREP="$as_dir/$ac_prog$ac_exec_ext" { test -f "$ac_path_EGREP" && $as_test_x "$ac_path_EGREP"; } || continue # Check for GNU ac_path_EGREP and select it if it is found. # Check for GNU $ac_path_EGREP case `"$ac_path_EGREP" --version 2>&1` in *GNU*) ac_cv_path_EGREP="$ac_path_EGREP" ac_path_EGREP_found=:;; *) ac_count=0 echo $ECHO_N "0123456789$ECHO_C" >"conftest.in" while : do cat "conftest.in" "conftest.in" >"conftest.tmp" mv "conftest.tmp" "conftest.in" cp "conftest.in" "conftest.nl" echo 'EGREP' >> "conftest.nl" "$ac_path_EGREP" 'EGREP$' < "conftest.nl" >"conftest.out" 2>/dev/null || break diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break ac_count=`expr $ac_count + 1` if test $ac_count -gt ${ac_path_EGREP_max-0}; then # Best one so far, save it but keep looking for a better one ac_cv_path_EGREP="$ac_path_EGREP" ac_path_EGREP_max=$ac_count fi # 10*(2^10) chars as input seems more than enough test $ac_count -gt 10 && break done rm -f conftest.in conftest.tmp conftest.nl conftest.out;; esac $ac_path_EGREP_found && break 3 done done done IFS=$as_save_IFS fi EGREP="$ac_cv_path_EGREP" if test -z "$EGREP"; then { { echo "$as_me:$LINENO: error: no acceptable $ac_prog_name could be found in $PATH$PATH_SEPARATOR/usr/xpg4/bin" >&5 echo "$as_me: error: no acceptable 
$ac_prog_name could be found in $PATH$PATH_SEPARATOR/usr/xpg4/bin" >&2;} { (exit 1); exit 1; }; } fi else ac_cv_path_EGREP=$EGREP fi fi fi { echo "$as_me:$LINENO: result: $ac_cv_path_EGREP" >&5 echo "${ECHO_T}$ac_cv_path_EGREP" >&6; } EGREP="$ac_cv_path_EGREP" { echo "$as_me:$LINENO: checking for ANSI C header files" >&5 echo $ECHO_N "checking for ANSI C header files... $ECHO_C" >&6; } if test "${ac_cv_header_stdc+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <stdlib.h> #include <stdarg.h> #include <string.h> #include <float.h> int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (ac_try="$ac_compile" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_compile") 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { test -z "$ac_c_werror_flag" || test ! -s conftest.err } && test -s conftest.$ac_objext; then ac_cv_header_stdc=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_cv_header_stdc=no fi rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext if test $ac_cv_header_stdc = yes; then # SunOS 4.x string.h does not declare mem*, contrary to ANSI. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <string.h> _ACEOF if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | $EGREP "memchr" >/dev/null 2>&1; then : else ac_cv_header_stdc=no fi rm -f conftest* fi if test $ac_cv_header_stdc = yes; then # ISC 2.0.2 stdlib.h does not declare free, contrary to ANSI. cat >conftest.$ac_ext <<_ACEOF /* confdefs.h.
*/ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <stdlib.h> _ACEOF if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | $EGREP "free" >/dev/null 2>&1; then : else ac_cv_header_stdc=no fi rm -f conftest* fi if test $ac_cv_header_stdc = yes; then # /bin/cc in Irix-4.0.5 gets non-ANSI ctype macros unless using -ansi. if test "$cross_compiling" = yes; then : else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <ctype.h> #include <stdlib.h> #if ((' ' & 0x0FF) == 0x020) # define ISLOWER(c) ('a' <= (c) && (c) <= 'z') # define TOUPPER(c) (ISLOWER(c) ? 'A' + ((c) - 'a') : (c)) #else # define ISLOWER(c) \ (('a' <= (c) && (c) <= 'i') \ || ('j' <= (c) && (c) <= 'r') \ || ('s' <= (c) && (c) <= 'z')) # define TOUPPER(c) (ISLOWER(c) ? ((c) | 0x40) : (c)) #endif #define XOR(e, f) (((e) && !(f)) || (!(e) && (f))) int main () { int i; for (i = 0; i < 256; i++) if (XOR (islower (i), ISLOWER (i)) || toupper (i) != TOUPPER (i)) return 2; return 0; } _ACEOF rm -f conftest$ac_exeext if { (ac_try="$ac_link" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_link") 2>&5 ac_status=$? echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { ac_try='./conftest$ac_exeext' { (case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_try") 2>&5 ac_status=$? echo "$as_me:$LINENO: \$?
= $ac_status" >&5 (exit $ac_status); }; }; then : else echo "$as_me: program exited with status $ac_status" >&5 echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ( exit $ac_status ) ac_cv_header_stdc=no fi rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext conftest.$ac_objext conftest.$ac_ext fi fi fi { echo "$as_me:$LINENO: result: $ac_cv_header_stdc" >&5 echo "${ECHO_T}$ac_cv_header_stdc" >&6; } if test $ac_cv_header_stdc = yes; then cat >>confdefs.h <<\_ACEOF #define STDC_HEADERS 1 _ACEOF fi # CHECKS FOR stdlib.h stdarg.h string.h and float.h # On IRIX 5.3, sys/types and inttypes.h are conflicting. for ac_header in sys/types.h sys/stat.h stdlib.h string.h memory.h strings.h \ inttypes.h stdint.h unistd.h do as_ac_Header=`echo "ac_cv_header_$ac_header" | $as_tr_sh` { echo "$as_me:$LINENO: checking for $ac_header" >&5 echo $ECHO_N "checking for $ac_header... $ECHO_C" >&6; } if { as_var=$as_ac_Header; eval "test \"\${$as_var+set}\" = set"; }; then echo $ECHO_N "(cached) $ECHO_C" >&6 else cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ $ac_includes_default #include <$ac_header> _ACEOF rm -f conftest.$ac_objext if { (ac_try="$ac_compile" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_compile") 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { test -z "$ac_c_werror_flag" || test ! 
-s conftest.err } && test -s conftest.$ac_objext; then eval "$as_ac_Header=yes" else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 eval "$as_ac_Header=no" fi rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext fi ac_res=`eval echo '${'$as_ac_Header'}'` { echo "$as_me:$LINENO: result: $ac_res" >&5 echo "${ECHO_T}$ac_res" >&6; } if test `eval echo '${'$as_ac_Header'}'` = yes; then cat >>confdefs.h <<_ACEOF #define `echo "HAVE_$ac_header" | $as_tr_cpp` 1 _ACEOF fi done for ac_header in limits.h errno.h math.h strings.h ctype.h \ sys/stat.h sys/types.h time.h unistd.h do as_ac_Header=`echo "ac_cv_header_$ac_header" | $as_tr_sh` if { as_var=$as_ac_Header; eval "test \"\${$as_var+set}\" = set"; }; then { echo "$as_me:$LINENO: checking for $ac_header" >&5 echo $ECHO_N "checking for $ac_header... $ECHO_C" >&6; } if { as_var=$as_ac_Header; eval "test \"\${$as_var+set}\" = set"; }; then echo $ECHO_N "(cached) $ECHO_C" >&6 fi ac_res=`eval echo '${'$as_ac_Header'}'` { echo "$as_me:$LINENO: result: $ac_res" >&5 echo "${ECHO_T}$ac_res" >&6; } else # Is the header compilable? { echo "$as_me:$LINENO: checking $ac_header usability" >&5 echo $ECHO_N "checking $ac_header usability... $ECHO_C" >&6; } cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ $ac_includes_default #include <$ac_header> _ACEOF rm -f conftest.$ac_objext if { (ac_try="$ac_compile" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_compile") 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { test -z "$ac_c_werror_flag" || test ! 
-s conftest.err } && test -s conftest.$ac_objext; then ac_header_compiler=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_header_compiler=no fi rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext { echo "$as_me:$LINENO: result: $ac_header_compiler" >&5 echo "${ECHO_T}$ac_header_compiler" >&6; } # Is the header present? { echo "$as_me:$LINENO: checking $ac_header presence" >&5 echo $ECHO_N "checking $ac_header presence... $ECHO_C" >&6; } cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <$ac_header> _ACEOF if { (ac_try="$ac_cpp conftest.$ac_ext" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_cpp conftest.$ac_ext") 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } >/dev/null && { test -z "$ac_c_preproc_warn_flag$ac_c_werror_flag" || test ! -s conftest.err }; then ac_header_preproc=yes else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_header_preproc=no fi rm -f conftest.err conftest.$ac_ext { echo "$as_me:$LINENO: result: $ac_header_preproc" >&5 echo "${ECHO_T}$ac_header_preproc" >&6; } # So? What about this header? case $ac_header_compiler:$ac_header_preproc:$ac_c_preproc_warn_flag in yes:no: ) { echo "$as_me:$LINENO: WARNING: $ac_header: accepted by the compiler, rejected by the preprocessor!" >&5 echo "$as_me: WARNING: $ac_header: accepted by the compiler, rejected by the preprocessor!" 
>&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: proceeding with the compiler's result" >&5 echo "$as_me: WARNING: $ac_header: proceeding with the compiler's result" >&2;} ac_header_preproc=yes ;; no:yes:* ) { echo "$as_me:$LINENO: WARNING: $ac_header: present but cannot be compiled" >&5 echo "$as_me: WARNING: $ac_header: present but cannot be compiled" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: check for missing prerequisite headers?" >&5 echo "$as_me: WARNING: $ac_header: check for missing prerequisite headers?" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: see the Autoconf documentation" >&5 echo "$as_me: WARNING: $ac_header: see the Autoconf documentation" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: section \"Present But Cannot Be Compiled\"" >&5 echo "$as_me: WARNING: $ac_header: section \"Present But Cannot Be Compiled\"" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: proceeding with the preprocessor's result" >&5 echo "$as_me: WARNING: $ac_header: proceeding with the preprocessor's result" >&2;} { echo "$as_me:$LINENO: WARNING: $ac_header: in the future, the compiler will take precedence" >&5 echo "$as_me: WARNING: $ac_header: in the future, the compiler will take precedence" >&2;} ;; esac { echo "$as_me:$LINENO: checking for $ac_header" >&5 echo $ECHO_N "checking for $ac_header... $ECHO_C" >&6; } if { as_var=$as_ac_Header; eval "test \"\${$as_var+set}\" = set"; }; then echo $ECHO_N "(cached) $ECHO_C" >&6 else eval "$as_ac_Header=\$ac_header_preproc" fi ac_res=`eval echo '${'$as_ac_Header'}'` { echo "$as_me:$LINENO: result: $ac_res" >&5 echo "${ECHO_T}$ac_res" >&6; } fi if test `eval echo '${'$as_ac_Header'}'` = yes; then cat >>confdefs.h <<_ACEOF #define `echo "HAVE_$ac_header" | $as_tr_cpp` 1 _ACEOF fi done # SET host (REQUIRES config.guess AND config.sub) # Make sure we can run config.sub. 
$SHELL "$ac_aux_dir/config.sub" sun4 >/dev/null 2>&1 || { { echo "$as_me:$LINENO: error: cannot run $SHELL $ac_aux_dir/config.sub" >&5 echo "$as_me: error: cannot run $SHELL $ac_aux_dir/config.sub" >&2;} { (exit 1); exit 1; }; } { echo "$as_me:$LINENO: checking build system type" >&5 echo $ECHO_N "checking build system type... $ECHO_C" >&6; } if test "${ac_cv_build+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else ac_build_alias=$build_alias test "x$ac_build_alias" = x && ac_build_alias=`$SHELL "$ac_aux_dir/config.guess"` test "x$ac_build_alias" = x && { { echo "$as_me:$LINENO: error: cannot guess build type; you must specify one" >&5 echo "$as_me: error: cannot guess build type; you must specify one" >&2;} { (exit 1); exit 1; }; } ac_cv_build=`$SHELL "$ac_aux_dir/config.sub" $ac_build_alias` || { { echo "$as_me:$LINENO: error: $SHELL $ac_aux_dir/config.sub $ac_build_alias failed" >&5 echo "$as_me: error: $SHELL $ac_aux_dir/config.sub $ac_build_alias failed" >&2;} { (exit 1); exit 1; }; } fi { echo "$as_me:$LINENO: result: $ac_cv_build" >&5 echo "${ECHO_T}$ac_cv_build" >&6; } case $ac_cv_build in *-*-*) ;; *) { { echo "$as_me:$LINENO: error: invalid value of canonical build" >&5 echo "$as_me: error: invalid value of canonical build" >&2;} { (exit 1); exit 1; }; };; esac build=$ac_cv_build ac_save_IFS=$IFS; IFS='-' set x $ac_cv_build shift build_cpu=$1 build_vendor=$2 shift; shift # Remember, the first character of IFS is used to create $*, # except with old shells: build_os=$* IFS=$ac_save_IFS case $build_os in *\ *) build_os=`echo "$build_os" | sed 's/ /-/g'`;; esac { echo "$as_me:$LINENO: checking host system type" >&5 echo $ECHO_N "checking host system type... 
$ECHO_C" >&6; } if test "${ac_cv_host+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else if test "x$host_alias" = x; then ac_cv_host=$ac_cv_build else ac_cv_host=`$SHELL "$ac_aux_dir/config.sub" $host_alias` || { { echo "$as_me:$LINENO: error: $SHELL $ac_aux_dir/config.sub $host_alias failed" >&5 echo "$as_me: error: $SHELL $ac_aux_dir/config.sub $host_alias failed" >&2;} { (exit 1); exit 1; }; } fi fi { echo "$as_me:$LINENO: result: $ac_cv_host" >&5 echo "${ECHO_T}$ac_cv_host" >&6; } case $ac_cv_host in *-*-*) ;; *) { { echo "$as_me:$LINENO: error: invalid value of canonical host" >&5 echo "$as_me: error: invalid value of canonical host" >&2;} { (exit 1); exit 1; }; };; esac host=$ac_cv_host ac_save_IFS=$IFS; IFS='-' set x $ac_cv_host shift host_cpu=$1 host_vendor=$2 shift; shift # Remember, the first character of IFS is used to create $*, # except with old shells: host_os=$* IFS=$ac_save_IFS case $host_os in *\ *) host_os=`echo "$host_os" | sed 's/ /-/g'`;; esac echo "Set host to $host" # AVOID autoconf WARNINGS ABOUT --datarootdir custom_guint64_format="llu" # TRY TO GET TO BUILD ON ALL PLATFORMS BY DEFAULT echo ""$host case $host in *-apple-darwin*) echo "Customising build for OSX" C4_AR_INIT="csq" C4_AR_APPEND="sq" ;; *-dec-osf*) echo "Customising build for OSF" C4_AR="/bin/ar" C4_AR_INIT="czr" C4_AR_APPEND="zr" # use the TRU64 C compiler CC="cc" GCC= # No longer true (prevents GCC specific CFLAGS being set) CFLAGS="-O3 -arch ev6 -fast" ;; *-irix*) echo "Customising build for IRIX" C4_AR_INIT="cq" C4_AR_APPEND="q" ;; *-solaris*) echo "Customising build for Solaris" C4_AR_INIT="cq" C4_AR_APPEND="q" ;; x86_64-*) echo "Customising build for X86_64" custom_guint64_format="lu" ;; *) echo "*** Unknown build platform: $host ***" ;; esac # SET SOURCE_ROOT_DIR (REQUIRED FOR viterbi.h) source_root_dir=`pwd` echo "Set source_root_dir to $source_root_dir" echo "----------------------------------------------------------------" # CHECK FOR socklen_t # THIS IS TO FIX 
PROBLEMS WHERE socklen_t ISN'T DEFINED (eg. ON OSF) # { echo "$as_me:$LINENO: checking for socklen_t" >&5 echo $ECHO_N "checking for socklen_t... $ECHO_C" >&6; } cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <sys/types.h> #include <sys/socket.h> socklen_t x; int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (ac_try="$ac_compile" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_compile") 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { test -z "$ac_c_werror_flag" || test ! -s conftest.err } && test -s conftest.$ac_objext; then { echo "$as_me:$LINENO: result: yes" >&5 echo "${ECHO_T}yes" >&6; } else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 cat >conftest.$ac_ext <<_ACEOF /* confdefs.h. */ _ACEOF cat confdefs.h >>conftest.$ac_ext cat >>conftest.$ac_ext <<_ACEOF /* end confdefs.h. */ #include <sys/types.h> #include <sys/socket.h> int accept(int, struct sockaddr *, size_t *); int main () { ; return 0; } _ACEOF rm -f conftest.$ac_objext if { (ac_try="$ac_compile" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval "echo \"\$as_me:$LINENO: $ac_try_echo\"") >&5 (eval "$ac_compile") 2>conftest.er1 ac_status=$? grep -v '^ *+' conftest.er1 >conftest.err rm -f conftest.er1 cat conftest.err >&5 echo "$as_me:$LINENO: \$? = $ac_status" >&5 (exit $ac_status); } && { test -z "$ac_c_werror_flag" || test !
-s conftest.err } && test -s conftest.$ac_objext; then { echo "$as_me:$LINENO: result: size_t" >&5 echo "${ECHO_T}size_t" >&6; } cat >>confdefs.h <<\_ACEOF #define socklen_t size_t _ACEOF else echo "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 { echo "$as_me:$LINENO: result: int" >&5 echo "${ECHO_T}int" >&6; } cat >>confdefs.h <<\_ACEOF #define socklen_t int _ACEOF fi rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext fi rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext # ALLOW EITHER GLIB-1 OR GLIB-2 TO BE USED # # IT SEEMS TRICKY TO ALLOW COMPILATION AGAINST BOTH GLIB-1 AND GLIB-2 # THAT WILL WORK FOR ONE LIBRARY WHEN THE OTHER IS ABSENT ... # # ... WHAT FOLLOWS IS FAR FROM IDEAL, BUT SEEMS TO WORK. # ANY SUGGESTIONS FOR A BETTER WAY OF DOING THIS WILL BE WELCOME. # # Check whether --enable-glib2 was given. if test "${enable_glib2+set}" = set; then enableval=$enable_glib2; enable_glib2="$enableval" else enable_glib2=yes fi if test "$enable_glib2" = yes; then # AM_PATH_GLIB_2_0(2.0.0, # [LIBS="$LIBS $GLIB_LIBS" CFLAGS="$CFLAGS $GLIB_CFLAGS"], # AC_MSG_ERROR(Cannot find GLIB2: Is pkg-config in path?)) # PKG_CHECK_MODULES(GLIB, [glib-2.0], [:], [:]) # Extract the first word of "pkg-config", so it can be a program name with args. set dummy pkg-config; ac_word=$2 { echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6; } if test "${ac_cv_path_PKG_CONFIG+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else case $PKG_CONFIG in [\\/]* | ?:[\\/]*) ac_cv_path_PKG_CONFIG="$PKG_CONFIG" # Let the user override the test with a path. ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_path_PKG_CONFIG="$as_dir/$ac_word$ac_exec_ext" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS test -z "$ac_cv_path_PKG_CONFIG" && ac_cv_path_PKG_CONFIG="no" ;; esac fi PKG_CONFIG=$ac_cv_path_PKG_CONFIG if test -n "$PKG_CONFIG"; then { echo "$as_me:$LINENO: result: $PKG_CONFIG" >&5 echo "${ECHO_T}$PKG_CONFIG" >&6; } else { echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6; } fi if test "$PKG_CONFIG" = no; then echo "ERROR: Could not find pkg-config ... is glib-2 installed ???" exit 1 fi echo "Using GLIB-2" glib_cflags=`pkg-config --cflags glib-2.0` glib_libs=`pkg-config --libs glib-2.0` CFLAGS="$CFLAGS $glib_cflags" LIBS="$LIBS $glib_libs" elif test "$enable_glib2" = no; then echo "Using GLIB-1" # AM_PATH_GLIB([1.2.0], # [LIBS="$LIBS $GLIB_LIBS" CFLAGS="$CFLAGS $GLIB_CFLAGS"], # AC_MSG_ERROR(Cannot find GLIB1: Is glib-config in path?)) # Extract the first word of "glib-config", so it can be a program name with args. set dummy glib-config; ac_word=$2 { echo "$as_me:$LINENO: checking for $ac_word" >&5 echo $ECHO_N "checking for $ac_word... $ECHO_C" >&6; } if test "${ac_cv_path_GLIB_CONFIG+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else case $GLIB_CONFIG in [\\/]* | ?:[\\/]*) ac_cv_path_GLIB_CONFIG="$GLIB_CONFIG" # Let the user override the test with a path. ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. 
for ac_exec_ext in '' $ac_executable_extensions; do if { test -f "$as_dir/$ac_word$ac_exec_ext" && $as_test_x "$as_dir/$ac_word$ac_exec_ext"; }; then ac_cv_path_GLIB_CONFIG="$as_dir/$ac_word$ac_exec_ext" echo "$as_me:$LINENO: found $as_dir/$ac_word$ac_exec_ext" >&5 break 2 fi done done IFS=$as_save_IFS test -z "$ac_cv_path_GLIB_CONFIG" && ac_cv_path_GLIB_CONFIG="no" ;; esac fi GLIB_CONFIG=$ac_cv_path_GLIB_CONFIG if test -n "$GLIB_CONFIG"; then { echo "$as_me:$LINENO: result: $GLIB_CONFIG" >&5 echo "${ECHO_T}$GLIB_CONFIG" >&6; } else { echo "$as_me:$LINENO: result: no" >&5 echo "${ECHO_T}no" >&6; } fi if test "$GLIB_CONFIG" = no; then echo "ERROR: Could not find glib-config ... is glib-1 installed ???" exit 1 fi glib_cflags=`glib-config --cflags glib` glib_libs=`glib-config --libs glib` CFLAGS="$CFLAGS $glib_cflags" LIBS="$LIBS $glib_libs" else echo "error: must be yes or no: --enable-glib2:$enable_glib2" exit 1 fi # ALLOW ASSERTIONS TO BE TURNED ON OR OFF # THIS ALSO ACTIVATES DEBUGGING OR OPTIMISATION COMPILE OPTIONS # Check whether --enable-assert was given. if test "${enable_assert+set}" = set; then enableval=$enable_assert; enable_assert="$enableval" else enable_assert=no fi if test "$enable_assert" = yes; then echo "Turning ASSERTIONS on" CFLAGS="$CFLAGS -g" elif test "$enable_assert" = no; then CFLAGS="$CFLAGS -DG_DISABLE_ASSERT" echo "Turning assertions off" if test "$GCC" = "yes"; then # Not currently using -fomit-frame-pointer as clashes with -pg # CFLAGS="$CFLAGS -O3 -fomit-frame-pointer -finline-functions" CFLAGS="$CFLAGS -O3 -finline-functions" fi else echo "error: must be yes or no: --enable-assert:$enable_assert" exit 1 fi # ALLOW PARANOID CODE CHECKING TO BE TURNED ON OR OFF # Check whether --enable-paranoia was given. 
if test "${enable_paranoia+set}" = set; then
  enableval=$enable_paranoia; enable_paranoia="$enableval"
else
  enable_paranoia=no
fi

if test "$enable_paranoia" = yes; then
    echo "Turning PARANOID options on"
    if test "$GCC" = "yes"; then
        CFLAGS="$CFLAGS -Wall -Werror \
                -Wstrict-prototypes -Wmissing-prototypes\
                -Wmissing-declarations\
                -D_POSIX_SOURCE -D_GNU_SOURCE"
    fi
elif test "$enable_paranoia" = no; then
    echo "Turning paranoid options off"
else
    echo "error: must be yes or no: --enable-paranoia:$enable_paranoia"
    exit 1
fi

# ALLOW PROFILING WITH GPROF
# Check whether --enable-gprof was given.
if test "${enable_gprof+set}" = set; then
  enableval=$enable_gprof; enable_gprof="$enableval"
else
  enable_gprof=no
fi

if test "$enable_gprof" = yes; then
    echo "Building executables for profiling with GPROF"
    echo "Statically linking for gprof"
    echo " Run: then run: gprof gmon.out"
    CFLAGS="$CFLAGS -O0 -static -pg"
    LDFLAGS="$LDFLAGS -g -pg -nodefaultlibs"
    LIBS="$LIBS -lc_p -lgcc -lm"
elif test "$enable_gprof" = no; then
    echo "Not using options for profiling with gprof"
else
    echo "error: must be yes or no: --enable-gprof:$enable_gprof"
    exit 1
fi

# ALLOW COVERAGE TESTING WITH GCOV
# Check whether --enable-gcov was given.
if test "${enable_gcov+set}" = set; then
  enableval=$enable_gcov; enable_gcov="$enableval"
else
  enable_gcov=no
fi

if test "$enable_gcov" = yes; then
    echo "Building executables for analysis with GCOV"
    echo " Run then run: gcov .c"
    echo " then view: .c.gcov"
    CFLAGS="$CFLAGS -fprofile-arcs -ftest-coverage"
elif test "$enable_gcov" = no; then
    echo "Not using options for analysis with gcov"
else
    echo "error: must be yes or no: --enable-gcov:$enable_gcov"
    exit 1
fi

# FORCE LARGEFILE SUPPORT FOR 32 BIT SYSTEMS
# Check whether --enable-largefile was given.
if test "${enable_largefile+set}" = set; then
  enableval=$enable_largefile; enable_largefile="$enableval"
else
  enable_largefile=yes
fi

if test "$enable_largefile" = yes; then
    echo "Adding support for LARGE FILES (>2Gb) for a 32 bit system"
    CFLAGS="$CFLAGS -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64"
elif test "$enable_largefile" = no; then
    echo "Not forcing largefile support for 32 bit systems"
else
    echo "error: must be yes or no:"
    echo " --enable-largefile:$enable_largefile"
    exit 1
fi

# ALLOW USE OF COMPILED MODELS
# Check whether --enable-compiledmodels was given.
if test "${enable_compiledmodels+set}" = set; then
  enableval=$enable_compiledmodels; enable_compiledmodels="$enableval"
else
  enable_compiledmodels=yes
fi

if test "$enable_compiledmodels" = yes; then
    echo "Using COMPILED C4 MODELS"
    codegen_extra_ldadd="viterbi.o scheduler.o c4_model_archive.a"
    codegen_extra_sources="c4_model_archive.h"
elif test "$enable_compiledmodels" = no; then
    echo "Not using compiled C4 models"
    # model_extra_programs="bootstrapper"
    codegen_extra_sources=""
    codegen_extra_ldadd="`pwd`/src/c4/viterbi.o \
                         `pwd`/src/sdp/scheduler.o"
else
    echo "error: must be yes or no:"
    echo " --enable-compiledmodels:$enable_compiledmodels"
    exit 1
fi

# AC_SUBST(model_extra_programs)

# ALLOW INSTALLATION OF UTILITIES
# Check whether --enable-utilities was given.
if test "${enable_utilities+set}" = set; then
  enableval=$enable_utilities; enable_utilities="$enableval"
else
  enable_utilities=yes
fi

if test "$enable_utilities" = yes; then
    echo "Installing UTILITIES"
    installed_util_list=" \
        fastaclean fastaclip fastachecksum fastacomposition \
        fastadiff fastaexplode fastafetch fastahardmask \
        fastaindex fastalength fastanrdb fastaoverlap \
        fastareformat fastaremove fastarevcomp fastasoftmask \
        fastasort fastasplit fastasubseq fastatranslate \
        fastavalidcds fasta2esd fastaannotatecdna esd2esi"
elif test "$enable_utilities" = no; then
    echo "Not installing utilities"
    installed_util_list=""
else
    echo "error: must be yes or no:"
    echo " --enable-utilities:$enable_utilities"
    exit 1
fi

# ALLOW DISABLING OF PTHREADS
# Check whether --enable-pthreads was given.
if test "${enable_pthreads+set}" = set; then
  enableval=$enable_pthreads; enable_pthreads="$enableval"
else
  enable_pthreads=yes
fi

if test "$enable_pthreads" = yes; then
    echo "Using PTHREADS"
    CFLAGS="$CFLAGS -DUSE_PTHREADS"
    # for g_thread_init()
    g_thread_init_ldflags=`pkg-config --libs gthread-2.0`
    LDFLAGS="$LDFLAGS $g_thread_init_ldflags -lpthread"
elif test "$enable_pthreads" = no; then
    echo "Not using pthreads"
else
    echo "error: must be yes or no:"
    echo " --enable-pthreads:$enable_pthreads"
    exit 1
fi

echo "----------------------------------------------------------------"

ac_config_files="$ac_config_files Makefile doc/Makefile doc/man/Makefile doc/man/man1/Makefile src/Makefile src/general/Makefile src/sequence/Makefile src/database/Makefile src/struct/Makefile src/comparison/Makefile src/c4/Makefile src/bsdp/Makefile src/sdp/Makefile src/hub/Makefile src/model/Makefile src/program/Makefile src/util/Makefile test/Makefile test/util/Makefile test/ipcress/Makefile test/exonerate/Makefile test/data/Makefile test/data/cdna/Makefile test/data/protein/Makefile"

cat >confcache <<\_ACEOF
# This file is a shell script that caches the results of configure
# tests run on this system so they
can be shared between configure # scripts and configure runs, see configure's option --config-cache. # It is not useful on other systems. If it contains results you don't # want to keep, you may remove or edit it. # # config.status only pays attention to the cache file if you give it # the --recheck option to rerun configure. # # `ac_cv_env_foo' variables (set or unset) will be overridden when # loading this file, other *unset* `ac_cv_foo' will be assigned the # following values. _ACEOF # The following way of writing the cache mishandles newlines in values, # but we know of no workaround that is simple, portable, and efficient. # So, we kill variables containing newlines. # Ultrix sh set writes to stderr and can't be redirected directly, # and sets the high bit in the cache file unless we assign to the vars. ( for ac_var in `(set) 2>&1 | sed -n 's/^\([a-zA-Z_][a-zA-Z0-9_]*\)=.*/\1/p'`; do eval ac_val=\$$ac_var case $ac_val in #( *${as_nl}*) case $ac_var in #( *_cv_*) { echo "$as_me:$LINENO: WARNING: Cache variable $ac_var contains a newline." >&5 echo "$as_me: WARNING: Cache variable $ac_var contains a newline." >&2;} ;; esac case $ac_var in #( _ | IFS | as_nl) ;; #( *) $as_unset $ac_var ;; esac ;; esac done (set) 2>&1 | case $as_nl`(ac_space=' '; set) 2>&1` in #( *${as_nl}ac_space=\ *) # `set' does not quote correctly, so add quotes (double-quote # substitution turns \\\\ into \\, and sed turns \\ into \). sed -n \ "s/'/'\\\\''/g; s/^\\([_$as_cr_alnum]*_cv_[_$as_cr_alnum]*\\)=\\(.*\\)/\\1='\\2'/p" ;; #( *) # `set' quotes correctly as required by POSIX, so do not add quotes. 
sed -n "/^[_$as_cr_alnum]*_cv_[_$as_cr_alnum]*=/p" ;; esac | sort ) | sed ' /^ac_cv_env_/b end t clear :clear s/^\([^=]*\)=\(.*[{}].*\)$/test "${\1+set}" = set || &/ t end s/^\([^=]*\)=\(.*\)$/\1=${\1=\2}/ :end' >>confcache if diff "$cache_file" confcache >/dev/null 2>&1; then :; else if test -w "$cache_file"; then test "x$cache_file" != "x/dev/null" && { echo "$as_me:$LINENO: updating cache $cache_file" >&5 echo "$as_me: updating cache $cache_file" >&6;} cat confcache >$cache_file else { echo "$as_me:$LINENO: not updating unwritable cache $cache_file" >&5 echo "$as_me: not updating unwritable cache $cache_file" >&6;} fi fi rm -f confcache test "x$prefix" = xNONE && prefix=$ac_default_prefix # Let make expand exec_prefix. test "x$exec_prefix" = xNONE && exec_prefix='${prefix}' # Transform confdefs.h into DEFS. # Protect against shell expansion while executing Makefile rules. # Protect against Makefile macro expansion. # # If the first sed substitution is executed (which looks for macros that # take arguments), then branch to the quote section. Otherwise, # look for a macro that doesn't take arguments. ac_script=' t clear :clear s/^[ ]*#[ ]*define[ ][ ]*\([^ (][^ (]*([^)]*)\)[ ]*\(.*\)/-D\1=\2/g t quote s/^[ ]*#[ ]*define[ ][ ]*\([^ ][^ ]*\)[ ]*\(.*\)/-D\1=\2/g t quote b any :quote s/[ `~#$^&*(){}\\|;'\''"<>?]/\\&/g s/\[/\\&/g s/\]/\\&/g s/\$/$$/g H :any ${ g s/^\n// s/\n/ /g p } ' DEFS=`sed -n "$ac_script" confdefs.h` ac_libobjs= ac_ltlibobjs= for ac_i in : $LIBOBJS; do test "x$ac_i" = x: && continue # 1. Remove the extension, and $U if already installed. ac_script='s/\$U\././;s/\.o$//;s/\.obj$//' ac_i=`echo "$ac_i" | sed "$ac_script"` # 2. Prepend LIBOBJDIR. When used with automake>=1.10 LIBOBJDIR # will be set to the directory where LIBOBJS objects are built. 
ac_libobjs="$ac_libobjs \${LIBOBJDIR}$ac_i\$U.$ac_objext" ac_ltlibobjs="$ac_ltlibobjs \${LIBOBJDIR}$ac_i"'$U.lo' done LIBOBJS=$ac_libobjs LTLIBOBJS=$ac_ltlibobjs : ${CONFIG_STATUS=./config.status} ac_clean_files_save=$ac_clean_files ac_clean_files="$ac_clean_files $CONFIG_STATUS" { echo "$as_me:$LINENO: creating $CONFIG_STATUS" >&5 echo "$as_me: creating $CONFIG_STATUS" >&6;} cat >$CONFIG_STATUS <<_ACEOF #! $SHELL # Generated by $as_me. # Run this file to recreate the current configuration. # Compiler output produced by configure, useful for debugging # configure, is in config.log if it exists. debug=false ac_cs_recheck=false ac_cs_silent=false SHELL=\${CONFIG_SHELL-$SHELL} _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF ## --------------------- ## ## M4sh Initialization. ## ## --------------------- ## # Be more Bourne compatible DUALCASE=1; export DUALCASE # for MKS sh if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then emulate sh NULLCMD=: # Zsh 3.x and 4.x performs word splitting on ${1+"$@"}, which # is contrary to our usage. Disable this feature. alias -g '${1+"$@"}'='"$@"' setopt NO_GLOB_SUBST else case `(set -o) 2>/dev/null` in *posix*) set -o posix ;; esac fi # PATH needs CR # Avoid depending upon Character Ranges. as_cr_letters='abcdefghijklmnopqrstuvwxyz' as_cr_LETTERS='ABCDEFGHIJKLMNOPQRSTUVWXYZ' as_cr_Letters=$as_cr_letters$as_cr_LETTERS as_cr_digits='0123456789' as_cr_alnum=$as_cr_Letters$as_cr_digits # The user is always right. if test "${PATH_SEPARATOR+set}" != set; then echo "#! /bin/sh" >conf$$.sh echo "exit 0" >>conf$$.sh chmod +x conf$$.sh if (PATH="/nonexistent;."; conf$$.sh) >/dev/null 2>&1; then PATH_SEPARATOR=';' else PATH_SEPARATOR=: fi rm -f conf$$.sh fi # Support unset when possible. if ( (MAIL=60; unset MAIL) || exit) >/dev/null 2>&1; then as_unset=unset else as_unset=false fi # IFS # We need space, tab and new line, in precisely that order. Quoting is # there to prevent editors from complaining about space-tab. 
# (If _AS_PATH_WALK were called with IFS unset, it would disable word # splitting by setting IFS to empty value.) as_nl=' ' IFS=" "" $as_nl" # Find who we are. Look in the path if we contain no directory separator. case $0 in *[\\/]* ) as_myself=$0 ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS test -z "$as_dir" && as_dir=. test -r "$as_dir/$0" && as_myself=$as_dir/$0 && break done IFS=$as_save_IFS ;; esac # We did not find ourselves, most probably we were run as `sh COMMAND' # in which case we are not to be found in the path. if test "x$as_myself" = x; then as_myself=$0 fi if test ! -f "$as_myself"; then echo "$as_myself: error: cannot find myself; rerun with an absolute file name" >&2 { (exit 1); exit 1; } fi # Work around bugs in pre-3.0 UWIN ksh. for as_var in ENV MAIL MAILPATH do ($as_unset $as_var) >/dev/null 2>&1 && $as_unset $as_var done PS1='$ ' PS2='> ' PS4='+ ' # NLS nuisances. for as_var in \ LANG LANGUAGE LC_ADDRESS LC_ALL LC_COLLATE LC_CTYPE LC_IDENTIFICATION \ LC_MEASUREMENT LC_MESSAGES LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER \ LC_TELEPHONE LC_TIME do if (set +x; test -z "`(eval $as_var=C; export $as_var) 2>&1`"); then eval $as_var=C; export $as_var else ($as_unset $as_var) >/dev/null 2>&1 && $as_unset $as_var fi done # Required to use basename. if expr a : '\(a\)' >/dev/null 2>&1 && test "X`expr 00001 : '.*\(...\)'`" = X001; then as_expr=expr else as_expr=false fi if (basename -- /) >/dev/null 2>&1 && test "X`basename -- / 2>&1`" = "X/"; then as_basename=basename else as_basename=false fi # Name of the executable. as_me=`$as_basename -- "$0" || $as_expr X/"$0" : '.*/\([^/][^/]*\)/*$' \| \ X"$0" : 'X\(//\)$' \| \ X"$0" : 'X\(/\)' \| . 2>/dev/null || echo X/"$0" | sed '/^.*\/\([^/][^/]*\)\/*$/{ s//\1/ q } /^X\/\(\/\/\)$/{ s//\1/ q } /^X\/\(\/\).*/{ s//\1/ q } s/.*/./; q'` # CDPATH. 
$as_unset CDPATH as_lineno_1=$LINENO as_lineno_2=$LINENO test "x$as_lineno_1" != "x$as_lineno_2" && test "x`expr $as_lineno_1 + 1`" = "x$as_lineno_2" || { # Create $as_me.lineno as a copy of $as_myself, but with $LINENO # uniformly replaced by the line number. The first 'sed' inserts a # line-number line after each line using $LINENO; the second 'sed' # does the real work. The second script uses 'N' to pair each # line-number line with the line containing $LINENO, and appends # trailing '-' during substitution so that $LINENO is not a special # case at line end. # (Raja R Harinath suggested sed '=', and Paul Eggert wrote the # scripts with optimization help from Paolo Bonzini. Blame Lee # E. McMahon (1931-1989) for sed's syntax. :-) sed -n ' p /[$]LINENO/= ' <$as_myself | sed ' s/[$]LINENO.*/&-/ t lineno b :lineno N :loop s/[$]LINENO\([^'$as_cr_alnum'_].*\n\)\(.*\)/\2\1\2/ t loop s/-\n.*// ' >$as_me.lineno && chmod +x "$as_me.lineno" || { echo "$as_me: error: cannot create $as_me.lineno; rerun with a POSIX shell" >&2 { (exit 1); exit 1; }; } # Don't try to exec as it changes $[0], causing all sort of problems # (the dirname of $[0] is not the place where we might find the # original and so on. Autoconf is especially sensitive to this). . "./$as_me.lineno" # Exit status is that of the last command. exit } if (as_dir=`dirname -- /` && test "X$as_dir" = X/) >/dev/null 2>&1; then as_dirname=dirname else as_dirname=false fi ECHO_C= ECHO_N= ECHO_T= case `echo -n x` in -n*) case `echo 'x\c'` in *c*) ECHO_T=' ';; # ECHO_T is single tab character. *) ECHO_C='\c';; esac;; *) ECHO_N='-n';; esac if expr a : '\(a\)' >/dev/null 2>&1 && test "X`expr 00001 : '.*\(...\)'`" = X001; then as_expr=expr else as_expr=false fi rm -f conf$$ conf$$.exe conf$$.file if test -d conf$$.dir; then rm -f conf$$.dir/conf$$.file else rm -f conf$$.dir mkdir conf$$.dir fi echo >conf$$.file if ln -s conf$$.file conf$$ 2>/dev/null; then as_ln_s='ln -s' # ... 
but there are two gotchas: # 1) On MSYS, both `ln -s file dir' and `ln file dir' fail. # 2) DJGPP < 2.04 has no symlinks; `ln -s' creates a wrapper executable. # In both cases, we have to default to `cp -p'. ln -s conf$$.file conf$$.dir 2>/dev/null && test ! -f conf$$.exe || as_ln_s='cp -p' elif ln conf$$.file conf$$ 2>/dev/null; then as_ln_s=ln else as_ln_s='cp -p' fi rm -f conf$$ conf$$.exe conf$$.dir/conf$$.file conf$$.file rmdir conf$$.dir 2>/dev/null if mkdir -p . 2>/dev/null; then as_mkdir_p=: else test -d ./-p && rmdir ./-p as_mkdir_p=false fi if test -x / >/dev/null 2>&1; then as_test_x='test -x' else if ls -dL / >/dev/null 2>&1; then as_ls_L_option=L else as_ls_L_option= fi as_test_x=' eval sh -c '\'' if test -d "$1"; then test -d "$1/."; else case $1 in -*)set "./$1";; esac; case `ls -ld'$as_ls_L_option' "$1" 2>/dev/null` in ???[sx]*):;;*)false;;esac;fi '\'' sh ' fi as_executable_p=$as_test_x # Sed expression to map a string onto a valid CPP name. as_tr_cpp="eval sed 'y%*$as_cr_letters%P$as_cr_LETTERS%;s%[^_$as_cr_alnum]%_%g'" # Sed expression to map a string onto a valid variable name. as_tr_sh="eval sed 'y%*+%pp%;s%[^_$as_cr_alnum]%_%g'" exec 6>&1 # Save the log message, to keep $[0] and so on meaningful, and to # report actual input values of CONFIG_FILES etc. instead of their # values after options handling. ac_log=" This file was extended by $as_me, which was generated by GNU Autoconf 2.61. Invocation command line was CONFIG_FILES = $CONFIG_FILES CONFIG_HEADERS = $CONFIG_HEADERS CONFIG_LINKS = $CONFIG_LINKS CONFIG_COMMANDS = $CONFIG_COMMANDS $ $0 $@ on `(hostname || uname -n) 2>/dev/null | sed 1q` " _ACEOF cat >>$CONFIG_STATUS <<_ACEOF # Files that config.status was made for. config_files="$ac_config_files" _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF ac_cs_usage="\ \`$as_me' instantiates files from templates according to the current configuration. Usage: $0 [OPTIONS] [FILE]... 
-h, --help print this help, then exit -V, --version print version number and configuration settings, then exit -q, --quiet do not print progress messages -d, --debug don't remove temporary files --recheck update $as_me by reconfiguring in the same conditions --file=FILE[:TEMPLATE] instantiate the configuration file FILE Configuration files: $config_files Report bugs to ." _ACEOF cat >>$CONFIG_STATUS <<_ACEOF ac_cs_version="\\ config.status configured by $0, generated by GNU Autoconf 2.61, with options \\"`echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`\\" Copyright (C) 2006 Free Software Foundation, Inc. This config.status script is free software; the Free Software Foundation gives unlimited permission to copy, distribute and modify it." ac_pwd='$ac_pwd' srcdir='$srcdir' INSTALL='$INSTALL' _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF # If no file are specified by the user, then we need to provide default # value. By we need to know if files were specified by the user. ac_need_defaults=: while test $# != 0 do case $1 in --*=*) ac_option=`expr "X$1" : 'X\([^=]*\)='` ac_optarg=`expr "X$1" : 'X[^=]*=\(.*\)'` ac_shift=: ;; *) ac_option=$1 ac_optarg=$2 ac_shift=shift ;; esac case $ac_option in # Handling of the options. -recheck | --recheck | --rechec | --reche | --rech | --rec | --re | --r) ac_cs_recheck=: ;; --version | --versio | --versi | --vers | --ver | --ve | --v | -V ) echo "$ac_cs_version"; exit ;; --debug | --debu | --deb | --de | --d | -d ) debug=: ;; --file | --fil | --fi | --f ) $ac_shift CONFIG_FILES="$CONFIG_FILES $ac_optarg" ac_need_defaults=false;; --he | --h | --help | --hel | -h ) echo "$ac_cs_usage"; exit ;; -q | -quiet | --quiet | --quie | --qui | --qu | --q \ | -silent | --silent | --silen | --sile | --sil | --si | --s) ac_cs_silent=: ;; # This is an error. -*) { echo "$as_me: error: unrecognized option: $1 Try \`$0 --help' for more information." 
>&2 { (exit 1); exit 1; }; } ;; *) ac_config_targets="$ac_config_targets $1" ac_need_defaults=false ;; esac shift done ac_configure_extra_args= if $ac_cs_silent; then exec 6>/dev/null ac_configure_extra_args="$ac_configure_extra_args --silent" fi _ACEOF cat >>$CONFIG_STATUS <<_ACEOF if \$ac_cs_recheck; then echo "running CONFIG_SHELL=$SHELL $SHELL $0 "$ac_configure_args \$ac_configure_extra_args " --no-create --no-recursion" >&6 CONFIG_SHELL=$SHELL export CONFIG_SHELL exec $SHELL "$0"$ac_configure_args \$ac_configure_extra_args --no-create --no-recursion fi _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF exec 5>>config.log { echo sed 'h;s/./-/g;s/^.../## /;s/...$/ ##/;p;x;p;x' <<_ASBOX ## Running $as_me. ## _ASBOX echo "$ac_log" } >&5 _ACEOF cat >>$CONFIG_STATUS <<_ACEOF _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF # Handling of arguments. for ac_config_target in $ac_config_targets do case $ac_config_target in "Makefile") CONFIG_FILES="$CONFIG_FILES Makefile" ;; "doc/Makefile") CONFIG_FILES="$CONFIG_FILES doc/Makefile" ;; "doc/man/Makefile") CONFIG_FILES="$CONFIG_FILES doc/man/Makefile" ;; "doc/man/man1/Makefile") CONFIG_FILES="$CONFIG_FILES doc/man/man1/Makefile" ;; "src/Makefile") CONFIG_FILES="$CONFIG_FILES src/Makefile" ;; "src/general/Makefile") CONFIG_FILES="$CONFIG_FILES src/general/Makefile" ;; "src/sequence/Makefile") CONFIG_FILES="$CONFIG_FILES src/sequence/Makefile" ;; "src/database/Makefile") CONFIG_FILES="$CONFIG_FILES src/database/Makefile" ;; "src/struct/Makefile") CONFIG_FILES="$CONFIG_FILES src/struct/Makefile" ;; "src/comparison/Makefile") CONFIG_FILES="$CONFIG_FILES src/comparison/Makefile" ;; "src/c4/Makefile") CONFIG_FILES="$CONFIG_FILES src/c4/Makefile" ;; "src/bsdp/Makefile") CONFIG_FILES="$CONFIG_FILES src/bsdp/Makefile" ;; "src/sdp/Makefile") CONFIG_FILES="$CONFIG_FILES src/sdp/Makefile" ;; "src/hub/Makefile") CONFIG_FILES="$CONFIG_FILES src/hub/Makefile" ;; "src/model/Makefile") CONFIG_FILES="$CONFIG_FILES src/model/Makefile" ;; "src/program/Makefile") 
CONFIG_FILES="$CONFIG_FILES src/program/Makefile" ;; "src/util/Makefile") CONFIG_FILES="$CONFIG_FILES src/util/Makefile" ;; "test/Makefile") CONFIG_FILES="$CONFIG_FILES test/Makefile" ;; "test/util/Makefile") CONFIG_FILES="$CONFIG_FILES test/util/Makefile" ;; "test/ipcress/Makefile") CONFIG_FILES="$CONFIG_FILES test/ipcress/Makefile" ;; "test/exonerate/Makefile") CONFIG_FILES="$CONFIG_FILES test/exonerate/Makefile" ;; "test/data/Makefile") CONFIG_FILES="$CONFIG_FILES test/data/Makefile" ;; "test/data/cdna/Makefile") CONFIG_FILES="$CONFIG_FILES test/data/cdna/Makefile" ;; "test/data/protein/Makefile") CONFIG_FILES="$CONFIG_FILES test/data/protein/Makefile" ;; *) { { echo "$as_me:$LINENO: error: invalid argument: $ac_config_target" >&5 echo "$as_me: error: invalid argument: $ac_config_target" >&2;} { (exit 1); exit 1; }; };; esac done # If the user did not use the arguments to specify the items to instantiate, # then the envvar interface is used. Set only those that are not. # We use the long form for the default assignment because of an extremely # bizarre bug on SunOS 4.1.3. if $ac_need_defaults; then test "${CONFIG_FILES+set}" = set || CONFIG_FILES=$config_files fi # Have a temporary directory for convenience. Make it in the build tree # simply because there is no reason against having it here, and in addition, # creating and moving files from /tmp can sometimes cause problems. # Hook for its removal unless debugging. # Note that there is a small window in which the directory will not be cleaned: # after its creation but before its name has been assigned to `$tmp'. $debug || { tmp= trap 'exit_status=$? { test -z "$tmp" || test ! -d "$tmp" || rm -fr "$tmp"; } && exit $exit_status ' 0 trap '{ (exit 1); exit 1; }' 1 2 13 15 } # Create a (secure) tmp directory for tmp files. 
{ tmp=`(umask 077 && mktemp -d "./confXXXXXX") 2>/dev/null` && test -n "$tmp" && test -d "$tmp" } || { tmp=./conf$$-$RANDOM (umask 077 && mkdir "$tmp") } || { echo "$me: cannot create a temporary directory in ." >&2 { (exit 1); exit 1; } } # # Set up the sed scripts for CONFIG_FILES section. # # No need to generate the scripts if there are no CONFIG_FILES. # This happens for instance when ./config.status config.h if test -n "$CONFIG_FILES"; then _ACEOF ac_delim='%!_!# ' for ac_last_try in false false false false false :; do cat >conf$$subs.sed <<_ACEOF SHELL!$SHELL$ac_delim PATH_SEPARATOR!$PATH_SEPARATOR$ac_delim PACKAGE_NAME!$PACKAGE_NAME$ac_delim PACKAGE_TARNAME!$PACKAGE_TARNAME$ac_delim PACKAGE_VERSION!$PACKAGE_VERSION$ac_delim PACKAGE_STRING!$PACKAGE_STRING$ac_delim PACKAGE_BUGREPORT!$PACKAGE_BUGREPORT$ac_delim exec_prefix!$exec_prefix$ac_delim prefix!$prefix$ac_delim program_transform_name!$program_transform_name$ac_delim bindir!$bindir$ac_delim sbindir!$sbindir$ac_delim libexecdir!$libexecdir$ac_delim datarootdir!$datarootdir$ac_delim datadir!$datadir$ac_delim sysconfdir!$sysconfdir$ac_delim sharedstatedir!$sharedstatedir$ac_delim localstatedir!$localstatedir$ac_delim includedir!$includedir$ac_delim oldincludedir!$oldincludedir$ac_delim docdir!$docdir$ac_delim infodir!$infodir$ac_delim htmldir!$htmldir$ac_delim dvidir!$dvidir$ac_delim pdfdir!$pdfdir$ac_delim psdir!$psdir$ac_delim libdir!$libdir$ac_delim localedir!$localedir$ac_delim mandir!$mandir$ac_delim DEFS!$DEFS$ac_delim ECHO_C!$ECHO_C$ac_delim ECHO_N!$ECHO_N$ac_delim ECHO_T!$ECHO_T$ac_delim LIBS!$LIBS$ac_delim build_alias!$build_alias$ac_delim host_alias!$host_alias$ac_delim target_alias!$target_alias$ac_delim INSTALL_PROGRAM!$INSTALL_PROGRAM$ac_delim INSTALL_SCRIPT!$INSTALL_SCRIPT$ac_delim INSTALL_DATA!$INSTALL_DATA$ac_delim PACKAGE!$PACKAGE$ac_delim VERSION!$VERSION$ac_delim ACLOCAL!$ACLOCAL$ac_delim AUTOCONF!$AUTOCONF$ac_delim AUTOMAKE!$AUTOMAKE$ac_delim AUTOHEADER!$AUTOHEADER$ac_delim 
MAKEINFO!$MAKEINFO$ac_delim SET_MAKE!$SET_MAKE$ac_delim CC!$CC$ac_delim CFLAGS!$CFLAGS$ac_delim LDFLAGS!$LDFLAGS$ac_delim CPPFLAGS!$CPPFLAGS$ac_delim ac_ct_CC!$ac_ct_CC$ac_delim EXEEXT!$EXEEXT$ac_delim OBJEXT!$OBJEXT$ac_delim CPP!$CPP$ac_delim GREP!$GREP$ac_delim EGREP!$EGREP$ac_delim build!$build$ac_delim build_cpu!$build_cpu$ac_delim build_vendor!$build_vendor$ac_delim build_os!$build_os$ac_delim host!$host$ac_delim host_cpu!$host_cpu$ac_delim host_vendor!$host_vendor$ac_delim host_os!$host_os$ac_delim custom_guint64_format!$custom_guint64_format$ac_delim source_root_dir!$source_root_dir$ac_delim PKG_CONFIG!$PKG_CONFIG$ac_delim GLIB_CONFIG!$GLIB_CONFIG$ac_delim glib_libs!$glib_libs$ac_delim glib_cflags!$glib_cflags$ac_delim codegen_extra_sources!$codegen_extra_sources$ac_delim codegen_extra_ldadd!$codegen_extra_ldadd$ac_delim installed_util_list!$installed_util_list$ac_delim use_pthreads!$use_pthreads$ac_delim LIBOBJS!$LIBOBJS$ac_delim LTLIBOBJS!$LTLIBOBJS$ac_delim _ACEOF if test `sed -n "s/.*$ac_delim\$/X/p" conf$$subs.sed | grep -c X` = 78; then break elif $ac_last_try; then { { echo "$as_me:$LINENO: error: could not make $CONFIG_STATUS" >&5 echo "$as_me: error: could not make $CONFIG_STATUS" >&2;} { (exit 1); exit 1; }; } else ac_delim="$ac_delim!$ac_delim _$ac_delim!! 
" fi done ac_eof=`sed -n '/^CEOF[0-9]*$/s/CEOF/0/p' conf$$subs.sed` if test -n "$ac_eof"; then ac_eof=`echo "$ac_eof" | sort -nru | sed 1q` ac_eof=`expr $ac_eof + 1` fi cat >>$CONFIG_STATUS <<_ACEOF cat >"\$tmp/subs-1.sed" <<\CEOF$ac_eof /@[a-zA-Z_][a-zA-Z_0-9]*@/!b end _ACEOF sed ' s/[,\\&]/\\&/g; s/@/@|#_!!_#|/g s/^/s,@/; s/!/@,|#_!!_#|/ :n t n s/'"$ac_delim"'$/,g/; t s/$/\\/; p N; s/^.*\n//; s/[,\\&]/\\&/g; s/@/@|#_!!_#|/g; b n ' >>$CONFIG_STATUS >$CONFIG_STATUS <<_ACEOF :end s/|#_!!_#|//g CEOF$ac_eof _ACEOF # VPATH may cause trouble with some makes, so we remove $(srcdir), # ${srcdir} and @srcdir@ from VPATH if srcdir is ".", strip leading and # trailing colons and then remove the whole line if VPATH becomes empty # (actually we leave an empty line to preserve line numbers). if test "x$srcdir" = x.; then ac_vpsub='/^[ ]*VPATH[ ]*=/{ s/:*\$(srcdir):*/:/ s/:*\${srcdir}:*/:/ s/:*@srcdir@:*/:/ s/^\([^=]*=[ ]*\):*/\1/ s/:*$// s/^[^=]*=[ ]*$// }' fi cat >>$CONFIG_STATUS <<\_ACEOF fi # test -n "$CONFIG_FILES" for ac_tag in :F $CONFIG_FILES do case $ac_tag in :[FHLC]) ac_mode=$ac_tag; continue;; esac case $ac_mode$ac_tag in :[FHL]*:*);; :L* | :C*:*) { { echo "$as_me:$LINENO: error: Invalid tag $ac_tag." >&5 echo "$as_me: error: Invalid tag $ac_tag." >&2;} { (exit 1); exit 1; }; };; :[FH]-) ac_tag=-:-;; :[FH]*) ac_tag=$ac_tag:$ac_tag.in;; esac ac_save_IFS=$IFS IFS=: set x $ac_tag IFS=$ac_save_IFS shift ac_file=$1 shift case $ac_mode in :L) ac_source=$1;; :[FH]) ac_file_inputs= for ac_f do case $ac_f in -) ac_f="$tmp/stdin";; *) # Look for the file first in the build tree, then in the source tree # (if the path is not absolute). The absolute path cannot be DOS-style, # because $ac_f cannot contain `:'. 
test -f "$ac_f" || case $ac_f in [\\/$]*) false;; *) test -f "$srcdir/$ac_f" && ac_f="$srcdir/$ac_f";; esac || { { echo "$as_me:$LINENO: error: cannot find input file: $ac_f" >&5 echo "$as_me: error: cannot find input file: $ac_f" >&2;} { (exit 1); exit 1; }; };; esac ac_file_inputs="$ac_file_inputs $ac_f" done # Let's still pretend it is `configure' which instantiates (i.e., don't # use $as_me), people would be surprised to read: # /* config.h. Generated by config.status. */ configure_input="Generated from "`IFS=: echo $* | sed 's|^[^:]*/||;s|:[^:]*/|, |g'`" by configure." if test x"$ac_file" != x-; then configure_input="$ac_file. $configure_input" { echo "$as_me:$LINENO: creating $ac_file" >&5 echo "$as_me: creating $ac_file" >&6;} fi case $ac_tag in *:-:* | *:-) cat >"$tmp/stdin";; esac ;; esac ac_dir=`$as_dirname -- "$ac_file" || $as_expr X"$ac_file" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$ac_file" : 'X\(//\)[^/]' \| \ X"$ac_file" : 'X\(//\)$' \| \ X"$ac_file" : 'X\(/\)' \| . 2>/dev/null || echo X"$ac_file" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/ q } /^X\(\/\/\)[^/].*/{ s//\1/ q } /^X\(\/\/\)$/{ s//\1/ q } /^X\(\/\).*/{ s//\1/ q } s/.*/./; q'` { as_dir="$ac_dir" case $as_dir in #( -*) as_dir=./$as_dir;; esac test -d "$as_dir" || { $as_mkdir_p && mkdir -p "$as_dir"; } || { as_dirs= while :; do case $as_dir in #( *\'*) as_qdir=`echo "$as_dir" | sed "s/'/'\\\\\\\\''/g"`;; #( *) as_qdir=$as_dir;; esac as_dirs="'$as_qdir' $as_dirs" as_dir=`$as_dirname -- "$as_dir" || $as_expr X"$as_dir" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$as_dir" : 'X\(//\)[^/]' \| \ X"$as_dir" : 'X\(//\)$' \| \ X"$as_dir" : 'X\(/\)' \| . 
2>/dev/null || echo X"$as_dir" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/ q } /^X\(\/\/\)[^/].*/{ s//\1/ q } /^X\(\/\/\)$/{ s//\1/ q } /^X\(\/\).*/{ s//\1/ q } s/.*/./; q'` test -d "$as_dir" && break done test -z "$as_dirs" || eval "mkdir $as_dirs" } || test -d "$as_dir" || { { echo "$as_me:$LINENO: error: cannot create directory $as_dir" >&5 echo "$as_me: error: cannot create directory $as_dir" >&2;} { (exit 1); exit 1; }; }; } ac_builddir=. case "$ac_dir" in .) ac_dir_suffix= ac_top_builddir_sub=. ac_top_build_prefix= ;; *) ac_dir_suffix=/`echo "$ac_dir" | sed 's,^\.[\\/],,'` # A ".." for each directory in $ac_dir_suffix. ac_top_builddir_sub=`echo "$ac_dir_suffix" | sed 's,/[^\\/]*,/..,g;s,/,,'` case $ac_top_builddir_sub in "") ac_top_builddir_sub=. ac_top_build_prefix= ;; *) ac_top_build_prefix=$ac_top_builddir_sub/ ;; esac ;; esac ac_abs_top_builddir=$ac_pwd ac_abs_builddir=$ac_pwd$ac_dir_suffix # for backward compatibility: ac_top_builddir=$ac_top_build_prefix case $srcdir in .) # We are building in place. ac_srcdir=. ac_top_srcdir=$ac_top_builddir_sub ac_abs_top_srcdir=$ac_pwd ;; [\\/]* | ?:[\\/]* ) # Absolute name. ac_srcdir=$srcdir$ac_dir_suffix; ac_top_srcdir=$srcdir ac_abs_top_srcdir=$srcdir ;; *) # Relative name. ac_srcdir=$ac_top_build_prefix$srcdir$ac_dir_suffix ac_top_srcdir=$ac_top_build_prefix$srcdir ac_abs_top_srcdir=$ac_pwd/$srcdir ;; esac ac_abs_srcdir=$ac_abs_top_srcdir$ac_dir_suffix case $ac_mode in :F) # # CONFIG_FILE # case $INSTALL in [\\/$]* | ?:[\\/]* ) ac_INSTALL=$INSTALL ;; *) ac_INSTALL=$ac_top_build_prefix$INSTALL ;; esac _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF # If the template does not know about datarootdir, expand it. # FIXME: This hack should be removed a few years after 2.60. 
ac_datarootdir_hack=; ac_datarootdir_seen= case `sed -n '/datarootdir/ { p q } /@datadir@/p /@docdir@/p /@infodir@/p /@localedir@/p /@mandir@/p ' $ac_file_inputs` in *datarootdir*) ac_datarootdir_seen=yes;; *@datadir@*|*@docdir@*|*@infodir@*|*@localedir@*|*@mandir@*) { echo "$as_me:$LINENO: WARNING: $ac_file_inputs seems to ignore the --datarootdir setting" >&5 echo "$as_me: WARNING: $ac_file_inputs seems to ignore the --datarootdir setting" >&2;} _ACEOF cat >>$CONFIG_STATUS <<_ACEOF ac_datarootdir_hack=' s&@datadir@&$datadir&g s&@docdir@&$docdir&g s&@infodir@&$infodir&g s&@localedir@&$localedir&g s&@mandir@&$mandir&g s&\\\${datarootdir}&$datarootdir&g' ;; esac _ACEOF # Neutralize VPATH when `$srcdir' = `.'. # Shell code in configure.ac might set extrasub. # FIXME: do we really want to maintain this feature? cat >>$CONFIG_STATUS <<_ACEOF sed "$ac_vpsub $extrasub _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF :t /@[a-zA-Z_][a-zA-Z_0-9]*@/!b s&@configure_input@&$configure_input&;t t s&@top_builddir@&$ac_top_builddir_sub&;t t s&@srcdir@&$ac_srcdir&;t t s&@abs_srcdir@&$ac_abs_srcdir&;t t s&@top_srcdir@&$ac_top_srcdir&;t t s&@abs_top_srcdir@&$ac_abs_top_srcdir&;t t s&@builddir@&$ac_builddir&;t t s&@abs_builddir@&$ac_abs_builddir&;t t s&@abs_top_builddir@&$ac_abs_top_builddir&;t t s&@INSTALL@&$ac_INSTALL&;t t $ac_datarootdir_hack " $ac_file_inputs | sed -f "$tmp/subs-1.sed" >$tmp/out test -z "$ac_datarootdir_hack$ac_datarootdir_seen" && { ac_out=`sed -n '/\${datarootdir}/p' "$tmp/out"`; test -n "$ac_out"; } && { ac_out=`sed -n '/^[ ]*datarootdir[ ]*:*=/p' "$tmp/out"`; test -z "$ac_out"; } && { echo "$as_me:$LINENO: WARNING: $ac_file contains a reference to the variable \`datarootdir' which seems to be undefined. Please make sure it is defined." >&5 echo "$as_me: WARNING: $ac_file contains a reference to the variable \`datarootdir' which seems to be undefined. Please make sure it is defined." 
>&2;} rm -f "$tmp/stdin" case $ac_file in -) cat "$tmp/out"; rm -f "$tmp/out";; *) rm -f "$ac_file"; mv "$tmp/out" $ac_file;; esac ;; esac done # for ac_tag { (exit 0); exit 0; } _ACEOF chmod +x $CONFIG_STATUS ac_clean_files=$ac_clean_files_save # configure is writing to config.log, and then calls config.status. # config.status does its own redirection, appending to config.log. # Unfortunately, on DOS this fails, as config.log is still kept open # by configure, so config.status won't be able to write to it; its # output is simply discarded. So we exec the FD to /dev/null, # effectively closing config.log, so it can be properly (re)opened and # appended to by config.status. When coming back to configure, we # need to make the FD available again. if test "$no_create" != yes; then ac_cs_success=: ac_config_status_args= test "$silent" = yes && ac_config_status_args="$ac_config_status_args --quiet" exec 5>/dev/null $SHELL $CONFIG_STATUS $ac_config_status_args || ac_cs_success=false exec 5>>config.log # Use ||, not &&, to avoid exiting from the if with $? = 1, which # would make configure fail if this is the last instruction. $ac_cs_success || { (exit 1); exit 1; } fi exonerate-2.4.0/configure.in0000644000175000017500000002553611222703626012755 00000000000000# configure.in for the exonerate package # # Guy St.C. Slater. 
16:11:2000
#
# ( Borrowed some config tricks from gtk+, gnet, xscreensaver, mutt )

AC_INIT(src/sequence/sequence.c)
AM_INIT_AUTOMAKE(exonerate,2.4.0)

AC_PROG_CC
AC_PROG_INSTALL
AC_PROG_MAKE_SET
# AM_PROG_LIBTOOL

# OTHER STUFF
AC_C_CONST
AC_C_INLINE
AC_HEADER_STDC # CHECKS FOR stdlib.h stdarg.h string.h and float.h
AC_CHECK_HEADERS(limits.h errno.h math.h strings.h ctype.h \
                 sys/stat.h sys/types.h time.h unistd.h)

# SET host (REQUIRES config.guess AND config.sub)
AC_CANONICAL_HOST
AC_SUBST(host)
echo "Set host to $host"

# AVOID autoconf WARNINGS ABOUT --datarootdir
AC_SUBST(datarootdir)

custom_guint64_format="llu"

# TRY TO GET TO BUILD ON ALL PLATFORMS BY DEFAULT
echo ""$host
case $host in
    *-apple-darwin*)
        echo "Customising build for OSX"
        C4_AR_INIT="csq"
        C4_AR_APPEND="sq"
        ;;
    *-dec-osf*)
        echo "Customising build for OSF"
        C4_AR="/bin/ar"
        C4_AR_INIT="czr"
        C4_AR_APPEND="zr"
        # use the TRU64 C compiler
        CC="cc"
        GCC= # No longer true (prevents GCC specific CFLAGS being set)
        CFLAGS="-O3 -arch ev6 -fast"
        ;;
    *-irix*)
        echo "Customising build for IRIX"
        C4_AR_INIT="cq"
        C4_AR_APPEND="q"
        ;;
    *-solaris*)
        echo "Customising build for Solaris"
        C4_AR_INIT="cq"
        C4_AR_APPEND="q"
        ;;
    x86_64-*)
        echo "Customising build for X86_64"
        custom_guint64_format="lu"
        ;;
    *)
        echo "*** Unknown build platform: $host ***"
        ;;
esac

AC_SUBST(custom_guint64_format)

# SET SOURCE_ROOT_DIR (REQUIRED FOR viterbi.h)
source_root_dir=`pwd`
AC_SUBST(source_root_dir)
echo "Set source_root_dir to $source_root_dir"

echo "----------------------------------------------------------------"

# CHECK FOR socklen_t
# THIS IS TO FIX PROBLEMS WHERE socklen_t ISN'T DEFINED (eg. ON OSF)
#
AC_MSG_CHECKING(for socklen_t)
AC_TRY_COMPILE([#include <sys/types.h>
#include <sys/socket.h>
socklen_t x;],
[],
[AC_MSG_RESULT(yes)],
[AC_TRY_COMPILE([#include <sys/types.h>
#include <sys/socket.h>
int accept(int, struct sockaddr *, size_t *);],
[],
[AC_MSG_RESULT(size_t)
AC_DEFINE(socklen_t, size_t)], [
AC_MSG_RESULT(int)
AC_DEFINE(socklen_t, int)])])

# ALLOW EITHER GLIB-1 OR GLIB-2 TO BE USED
#
# IT SEEMS TRICKY TO ALLOW COMPILATION AGAINST BOTH GLIB-1 AND GLIB-2
# THAT WILL WORK FOR ONE LIBRARY WHEN THE OTHER IS ABSENT ...
#
# ... WHAT FOLLOWS IS FAR FROM IDEAL, BUT SEEMS TO WORK.
# ANY SUGGESTIONS FOR A BETTER WAY OF DOING THIS WILL BE WELCOME.
#
AC_ARG_ENABLE(glib2,
    [  --enable-glib2          Use glib2 library
  --disable-glib2         Do not use glib2 (use glib1 instead)],
    [enable_glib2="$enableval"],[enable_glib2=yes])

if test "$enable_glib2" = yes; then
    # AM_PATH_GLIB_2_0(2.0.0,
    #     [LIBS="$LIBS $GLIB_LIBS" CFLAGS="$CFLAGS $GLIB_CFLAGS"],
    #     AC_MSG_ERROR(Cannot find GLIB2: Is pkg-config in path?))
    # PKG_CHECK_MODULES(GLIB, [glib-2.0], [:], [:])
    AC_PATH_PROG(PKG_CONFIG, pkg-config, no)
    if test "$PKG_CONFIG" = no; then
        echo "ERROR: Could not find pkg-config ... is glib-2 installed ???"
        exit 1
    fi
    echo "Using GLIB-2"
    glib_cflags=`pkg-config --cflags glib-2.0`
    glib_libs=`pkg-config --libs glib-2.0`
    CFLAGS="$CFLAGS $glib_cflags"
    LIBS="$LIBS $glib_libs"
elif test "$enable_glib2" = no; then
    echo "Using GLIB-1"
    # AM_PATH_GLIB([1.2.0],
    #     [LIBS="$LIBS $GLIB_LIBS" CFLAGS="$CFLAGS $GLIB_CFLAGS"],
    #     AC_MSG_ERROR(Cannot find GLIB1: Is glib-config in path?))
    AC_PATH_PROG(GLIB_CONFIG, glib-config, no)
    if test "$GLIB_CONFIG" = no; then
        echo "ERROR: Could not find glib-config ... is glib-1 installed ???"
        exit 1
    fi
    glib_cflags=`glib-config --cflags glib`
    glib_libs=`glib-config --libs glib`
    CFLAGS="$CFLAGS $glib_cflags"
    LIBS="$LIBS $glib_libs"
else
    echo "error: must be yes or no: --enable-glib2:[$enable_glib2]"
    exit 1
fi

AC_SUBST(glib_libs)
AC_SUBST(glib_cflags)

# ALLOW ASSERTIONS TO BE TURNED ON OR OFF
# THIS ALSO ACTIVATES DEBUGGING OR OPTIMISATION COMPILE OPTIONS
AC_ARG_ENABLE(assert,
    [  --enable-assert         Use code for assertion tests
  --disable-assert        Do not use assertion code],
    [enable_assert="$enableval"],[enable_assert=no])

if test "$enable_assert" = yes; then
    echo "Turning ASSERTIONS on"
    CFLAGS="$CFLAGS -g"
elif test "$enable_assert" = no; then
    CFLAGS="$CFLAGS -DG_DISABLE_ASSERT"
    echo "Turning assertions off"
    if test "$GCC" = "yes"; then
        # Not currently using -fomit-frame-pointer as clashes with -pg
        # CFLAGS="$CFLAGS -O3 -fomit-frame-pointer -finline-functions"
        CFLAGS="$CFLAGS -O3 -finline-functions"
    fi
else
    echo "error: must be yes or no: --enable-assert:[$enable_assert]"
    exit 1
fi

# ALLOW PARANOID CODE CHECKING TO BE TURNED ON OR OFF
AC_ARG_ENABLE(paranoia,
    [  --enable-paranoia       Use paranoid compilation options
  --disable-paranoia      Do not use paranoid compilation options],
    [enable_paranoia="$enableval"],[enable_paranoia=no])

if test "$enable_paranoia" = yes; then
    echo "Turning PARANOID options on"
    if test "$GCC" = "yes"; then
        CFLAGS="$CFLAGS -Wall -Werror \
                -Wstrict-prototypes -Wmissing-prototypes\
                -Wmissing-declarations\
                -D_POSIX_SOURCE -D_GNU_SOURCE"
    fi
elif test "$enable_paranoia" = no; then
    echo "Turning paranoid options off"
else
    echo "error: must be yes or no: --enable-paranoia:[$enable_paranoia]"
    exit 1
fi

# ALLOW PROFILING WITH GPROF
AC_ARG_ENABLE(gprof,
    [  --enable-gprof          Generate executables suitable for gprof profiling
  --disable-gprof         Do not create executables for gprof profiling],
    [enable_gprof="$enableval"],[enable_gprof=no])

if test "$enable_gprof" = yes; then
    echo "Building executables for profiling with GPROF"
    echo "Statically linking for gprof"
    echo " Run: then run: gprof gmon.out"
    CFLAGS="$CFLAGS -O0 -static -pg"
    LDFLAGS="$LDFLAGS -g -pg -nodefaultlibs"
    LIBS="$LIBS -lc_p -lgcc -lm"
elif test "$enable_gprof" = no; then
    echo "Not using options for profiling with gprof"
else
    echo "error: must be yes or no: --enable-gprof:[$enable_gprof]"
    exit 1
fi

# ALLOW COVERAGE TESTING WITH GCOV
AC_ARG_ENABLE(gcov,
    [  --enable-gcov           Generate executables suitable for gcov analysis
  --disable-gcov          Do not create executables for gcov analysis],
    [enable_gcov="$enableval"],[enable_gcov=no])

if test "$enable_gcov" = yes; then
    echo "Building executables for analysis with GCOV"
    echo " Run then run: gcov .c"
    echo " then view: .c.gcov"
    CFLAGS="$CFLAGS -fprofile-arcs -ftest-coverage"
elif test "$enable_gcov" = no; then
    echo "Not using options for analysis with gcov"
else
    echo "error: must be yes or no: --enable-gcov:[$enable_gcov]"
    exit 1
fi

# FORCE LARGEFILE SUPPORT FOR 32 BIT SYSTEMS
AC_ARG_ENABLE(largefile,
    [  --enable-largefile      Enable largefile support on 32 bit systems
  --disable-largefile     Do not enable largefile support on 32 bit systems],
    [enable_largefile="$enableval"],[enable_largefile=yes])

if test "$enable_largefile" = yes; then
    echo "Adding support for LARGE FILES (>2Gb) for a 32 bit system"
    CFLAGS="$CFLAGS -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64"
elif test "$enable_largefile" = no; then
    echo "Not forcing largefile support for 32 bit systems"
else
    echo "error: must be yes or no:"
    echo " --enable-largefile:[$enable_largefile]"
    exit 1
fi

# ALLOW USE OF COMPILED MODELS
AC_ARG_ENABLE(compiledmodels,
    [  --enable-compiledmodels Use compiled C4 models
  --disable-compiledmodels Do not use compiled C4 models],
    [enable_compiledmodels="$enableval"],[enable_compiledmodels=yes])

if test "$enable_compiledmodels" = yes; then
    echo "Using COMPILED C4 MODELS"
    codegen_extra_ldadd="viterbi.o scheduler.o c4_model_archive.a"
    codegen_extra_sources="c4_model_archive.h"
elif test "$enable_compiledmodels" = no; then
    echo "Not using compiled C4 models"
    # model_extra_programs="bootstrapper"
    codegen_extra_sources=""
    codegen_extra_ldadd="`pwd`/src/c4/viterbi.o \
                         `pwd`/src/sdp/scheduler.o"
else
    echo "error: must be yes or no:"
    echo " --enable-compiledmodels:[$enable_compiledmodels]"
    exit 1
fi

# AC_SUBST(model_extra_programs)
AC_SUBST(codegen_extra_sources)
AC_SUBST(codegen_extra_ldadd)

# ALLOW INSTALLATION OF UTILITIES
AC_ARG_ENABLE(utilities,
    [  --enable-utilities      Install all utilities
  --disable-utilities     Do not install utilities],
    [enable_utilities="$enableval"],[enable_utilities=yes])

if test "$enable_utilities" = yes; then
    echo "Installing UTILITIES"
    installed_util_list=" \
        fastaclean fastaclip fastachecksum fastacomposition \
        fastadiff fastaexplode fastafetch fastahardmask \
        fastaindex fastalength fastanrdb fastaoverlap \
        fastareformat fastaremove fastarevcomp fastasoftmask \
        fastasort fastasplit fastasubseq fastatranslate \
        fastavalidcds fasta2esd fastaannotatecdna esd2esi"
elif test "$enable_utilities" = no; then
    echo "Not installing utilities"
    installed_util_list=""
else
    echo "error: must be yes or no:"
    echo " --enable-utilities:[$enable_utilities]"
    exit 1
fi

AC_SUBST(installed_util_list)

# ALLOW DISABLING OF PTHREADS
AC_ARG_ENABLE(pthreads,
    [  --enable-pthreads       Use pthreads for exonerate-server
  --disable-pthreads      Do not use pthreads for exonerate-server],
    [enable_pthreads="$enableval"],[enable_pthreads=yes])

if test "$enable_pthreads" = yes; then
    echo "Using PTHREADS"
    CFLAGS="$CFLAGS -DUSE_PTHREADS"
    # for g_thread_init()
    g_thread_init_ldflags=`pkg-config --libs gthread-2.0`
    LDFLAGS="$LDFLAGS $g_thread_init_ldflags -lpthread"
elif test "$enable_pthreads" = no; then
    echo "Not using pthreads"
else
    echo "error: must be yes or no:"
    echo " --enable-pthreads:[$enable_pthreads]"
    exit 1
fi

AC_SUBST(use_pthreads)

echo "----------------------------------------------------------------"

AC_OUTPUT(Makefile \
          doc/Makefile \
          doc/man/Makefile \
          doc/man/man1/Makefile \
          src/Makefile \
          src/general/Makefile \
          src/sequence/Makefile \
src/database/Makefile \ src/struct/Makefile \ src/comparison/Makefile \ src/c4/Makefile \ src/bsdp/Makefile \ src/sdp/Makefile \ src/hub/Makefile \ src/model/Makefile \ src/program/Makefile \ src/util/Makefile \ test/Makefile \ test/util/Makefile \ test/ipcress/Makefile \ test/exonerate/Makefile \ test/data/Makefile \ test/data/cdna/Makefile \ test/data/protein/Makefile ) exonerate-2.4.0/install-sh0000744000175000017500000001273610371635214012445 00000000000000#!/bin/sh # # install - install a program, script, or datafile # This comes from X11R5 (mit/util/scripts/install.sh). # # Copyright 1991 by the Massachusetts Institute of Technology # # Permission to use, copy, modify, distribute, and sell this software and its # documentation for any purpose is hereby granted without fee, provided that # the above copyright notice appear in all copies and that both that # copyright notice and this permission notice appear in supporting # documentation, and that the name of M.I.T. not be used in advertising or # publicity pertaining to distribution of the software without specific, # written prior permission. M.I.T. makes no representations about the # suitability of this software for any purpose. It is provided "as is" # without express or implied warranty. # # Calling this script install-sh is preferred over install.sh, to prevent # `make' implicit rules from creating a file called install from it # when there is no Makefile. # # This script is compatible with the BSD install script, but was written # from scratch. It can only install one file at a time, a restriction # shared with many OS's install programs. # set DOITPROG to echo to test this script # Don't use :- since 4.3BSD and earlier shells don't like it. doit="${DOITPROG-}" # put in absolute paths if you don't have them in your path; or use env. vars. 
mvprog="${MVPROG-mv}"
cpprog="${CPPROG-cp}"
chmodprog="${CHMODPROG-chmod}"
chownprog="${CHOWNPROG-chown}"
chgrpprog="${CHGRPPROG-chgrp}"
stripprog="${STRIPPROG-strip}"
rmprog="${RMPROG-rm}"
mkdirprog="${MKDIRPROG-mkdir}"

transformbasename=""
# initialise the name that the -t= option and the rename step actually use
transformarg=""
instcmd="$mvprog"
chmodcmd="$chmodprog 0755"
chowncmd=""
chgrpcmd=""
stripcmd=""
rmcmd="$rmprog -f"
mvcmd="$mvprog"
src=""
dst=""
dir_arg=""

while [ x"$1" != x ]; do
    case $1 in
        -c) instcmd="$cpprog"
            shift
            continue;;

        -d) dir_arg=true
            shift
            continue;;

        -m) chmodcmd="$chmodprog $2"
            shift
            shift
            continue;;

        -o) chowncmd="$chownprog $2"
            shift
            shift
            continue;;

        -g) chgrpcmd="$chgrpprog $2"
            shift
            shift
            continue;;

        -s) stripcmd="$stripprog"
            shift
            continue;;

        -t=*) transformarg=`echo $1 | sed 's/-t=//'`
            shift
            continue;;

        -b=*) transformbasename=`echo $1 | sed 's/-b=//'`
            shift
            continue;;

        *)  if [ x"$src" = x ]
            then
                src=$1
            else
                # this colon is to work around a 386BSD /bin/sh bug
                :
                dst=$1
            fi
            shift
            continue;;
    esac
done

if [ x"$src" = x ]
then
    echo "install: no input file specified"
    exit 1
else
    true
fi

if [ x"$dir_arg" != x ]; then
    dst=$src
    src=""

    if [ -d $dst ]; then
        instcmd=:
        chmodcmd=""
    else
        instcmd=mkdir
    fi
else

# Waiting for this to be detected by the "$instcmd $src $dsttmp" command
# might cause directories to be created, which would be especially bad
# if $src (and thus $dsttmp) contains '*'.

    if [ -f $src -o -d $src ]
    then
        true
    else
        echo "install: $src does not exist"
        exit 1
    fi

    if [ x"$dst" = x ]
    then
        echo "install: no destination specified"
        exit 1
    else
        true
    fi

# If destination is a directory, append the input filename; if your system
# does not like double slashes in filenames, you may need to add some logic

    if [ -d $dst ]
    then
        dst="$dst"/`basename $src`
    else
        true
    fi
fi

## this sed command emulates the dirname command
dstdir=`echo $dst | sed -e 's,[^/]*$,,;s,/$,,;s,^$,.,'`

# Make sure that the destination directory exists.
# this part is taken from Noah Friedman's mkinstalldirs script # Skip lots of stat calls in the usual case. if [ ! -d "$dstdir" ]; then defaultIFS=' ' IFS="${IFS-${defaultIFS}}" oIFS="${IFS}" # Some sh's can't handle IFS=/ for some reason. IFS='%' set - `echo ${dstdir} | sed -e 's@/@%@g' -e 's@^%@/@'` IFS="${oIFS}" pathcomp='' while [ $# -ne 0 ] ; do pathcomp="${pathcomp}${1}" shift if [ ! -d "${pathcomp}" ] ; then $mkdirprog "${pathcomp}" else true fi pathcomp="${pathcomp}/" done fi if [ x"$dir_arg" != x ] then $doit $instcmd $dst && if [ x"$chowncmd" != x ]; then $doit $chowncmd $dst; else true ; fi && if [ x"$chgrpcmd" != x ]; then $doit $chgrpcmd $dst; else true ; fi && if [ x"$stripcmd" != x ]; then $doit $stripcmd $dst; else true ; fi && if [ x"$chmodcmd" != x ]; then $doit $chmodcmd $dst; else true ; fi else # If we're going to rename the final executable, determine the name now. if [ x"$transformarg" = x ] then dstfile=`basename $dst` else dstfile=`basename $dst $transformbasename | sed $transformarg`$transformbasename fi # don't allow the sed command to completely eliminate the filename if [ x"$dstfile" = x ] then dstfile=`basename $dst` else true fi # Make a temp file name in the proper directory. dsttmp=$dstdir/#inst.$$# # Move or copy the file name to the temp name $doit $instcmd $src $dsttmp && trap "rm -f ${dsttmp}" 0 && # and set any options; do chmod last to preserve setuid bits # If any of these fail, we abort the whole thing. If we want to # ignore errors from any of these, just make sure not to ignore # errors from the above "$doit $instcmd $src $dsttmp" command. if [ x"$chowncmd" != x ]; then $doit $chowncmd $dsttmp; else true;fi && if [ x"$chgrpcmd" != x ]; then $doit $chgrpcmd $dsttmp; else true;fi && if [ x"$stripcmd" != x ]; then $doit $stripcmd $dsttmp; else true;fi && if [ x"$chmodcmd" != x ]; then $doit $chmodcmd $dsttmp; else true;fi && # Now rename the file to the real destination. 
$doit $rmcmd -f $dstdir/$dstfile && $doit $mvcmd $dsttmp $dstdir/$dstfile fi && exit 0 exonerate-2.4.0/missing0000744000175000017500000001421310371635214012030 00000000000000#! /bin/sh # Common stub for a few missing GNU programs while installing. # Copyright (C) 1996, 1997 Free Software Foundation, Inc. # Franc,ois Pinard , 1996. # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2, or (at your option) # any later version. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA # 02111-1307, USA. if test $# -eq 0; then echo 1>&2 "Try \`$0 --help' for more information" exit 1 fi case "$1" in -h|--h|--he|--hel|--help) echo "\ $0 [OPTION]... PROGRAM [ARGUMENT]... Handle \`PROGRAM [ARGUMENT]...' for when PROGRAM is missing, or return an error status if there is no known handling for PROGRAM. 
Options: -h, --help display this help and exit -v, --version output version information and exit Supported PROGRAM values: aclocal touch file \`aclocal.m4' autoconf touch file \`configure' autoheader touch file \`config.h.in' automake touch all \`Makefile.in' files bison create \`y.tab.[ch]', if possible, from existing .[ch] flex create \`lex.yy.c', if possible, from existing .c lex create \`lex.yy.c', if possible, from existing .c makeinfo touch the output file yacc create \`y.tab.[ch]', if possible, from existing .[ch]" ;; -v|--v|--ve|--ver|--vers|--versi|--versio|--version) echo "missing - GNU libit 0.0" ;; -*) echo 1>&2 "$0: Unknown \`$1' option" echo 1>&2 "Try \`$0 --help' for more information" exit 1 ;; aclocal) echo 1>&2 "\ WARNING: \`$1' is missing on your system. You should only need it if you modified \`acinclude.m4' or \`configure.in'. You might want to install the \`Automake' and \`Perl' packages. Grab them from any GNU archive site." touch aclocal.m4 ;; autoconf) echo 1>&2 "\ WARNING: \`$1' is missing on your system. You should only need it if you modified \`configure.in'. You might want to install the \`Autoconf' and \`GNU m4' packages. Grab them from any GNU archive site." touch configure ;; autoheader) echo 1>&2 "\ WARNING: \`$1' is missing on your system. You should only need it if you modified \`acconfig.h' or \`configure.in'. You might want to install the \`Autoconf' and \`GNU m4' packages. Grab them from any GNU archive site." files=`sed -n 's/^[ ]*A[CM]_CONFIG_HEADER(\([^)]*\)).*/\1/p' configure.in` test -z "$files" && files="config.h" touch_files= for f in $files; do case "$f" in *:*) touch_files="$touch_files "`echo "$f" | sed -e 's/^[^:]*://' -e 's/:.*//'`;; *) touch_files="$touch_files $f.in";; esac done touch $touch_files ;; automake) echo 1>&2 "\ WARNING: \`$1' is missing on your system. You should only need it if you modified \`Makefile.am', \`acinclude.m4' or \`configure.in'. 
You might want to install the \`Automake' and \`Perl' packages. Grab them from any GNU archive site." find . -type f -name Makefile.am -print | sed 's/\.am$/.in/' | while read f; do touch "$f"; done ;; bison|yacc) echo 1>&2 "\ WARNING: \`$1' is missing on your system. You should only need it if you modified a \`.y' file. You may need the \`Bison' package in order for those modifications to take effect. You can get \`Bison' from any GNU archive site." rm -f y.tab.c y.tab.h if [ $# -ne 1 ]; then eval LASTARG="\${$#}" case "$LASTARG" in *.y) SRCFILE=`echo "$LASTARG" | sed 's/y$/c/'` if [ -f "$SRCFILE" ]; then cp "$SRCFILE" y.tab.c fi SRCFILE=`echo "$LASTARG" | sed 's/y$/h/'` if [ -f "$SRCFILE" ]; then cp "$SRCFILE" y.tab.h fi ;; esac fi if [ ! -f y.tab.h ]; then echo >y.tab.h fi if [ ! -f y.tab.c ]; then echo 'main() { return 0; }' >y.tab.c fi ;; lex|flex) echo 1>&2 "\ WARNING: \`$1' is missing on your system. You should only need it if you modified a \`.l' file. You may need the \`Flex' package in order for those modifications to take effect. You can get \`Flex' from any GNU archive site." rm -f lex.yy.c if [ $# -ne 1 ]; then eval LASTARG="\${$#}" case "$LASTARG" in *.l) SRCFILE=`echo "$LASTARG" | sed 's/l$/c/'` if [ -f "$SRCFILE" ]; then cp "$SRCFILE" lex.yy.c fi ;; esac fi if [ ! -f lex.yy.c ]; then echo 'main() { return 0; }' >lex.yy.c fi ;; makeinfo) echo 1>&2 "\ WARNING: \`$1' is missing on your system. You should only need it if you modified a \`.texi' or \`.texinfo' file, or any other file indirectly affecting the aspect of the manual. The spurious call might also be the consequence of using a buggy \`make' (AIX, DU, IRIX). You might want to install the \`Texinfo' package or the \`GNU make' package. Grab either from any GNU archive site." 
file=`echo "$*" | sed -n 's/.*-o \([^ ]*\).*/\1/p'` if test -z "$file"; then file=`echo "$*" | sed 's/.* \([^ ]*\) *$/\1/'` file=`sed -n '/^@setfilename/ { s/.* \([^ ]*\) *$/\1/; p; q; }' $file` fi touch $file ;; *) echo 1>&2 "\ WARNING: \`$1' is needed, and you do not seem to have it handy on your system. You might have modified some files without having the proper tools for further handling them. Check the \`README' file, it often tells you about the needed prerequirements for installing this package. You may also peek at any GNU archive site, in case some other package would contain this missing \`$1' program." exit 1 ;; esac exit 0 exonerate-2.4.0/mkinstalldirs0000744000175000017500000000132110371635214013233 00000000000000#! /bin/sh # mkinstalldirs --- make directory hierarchy # Author: Noah Friedman # Created: 1993-05-16 # Public domain # $Id: mkinstalldirs,v 1.1 2002/07/23 14:57:00 gs2 Exp $ errstatus=0 for file do set fnord `echo ":$file" | sed -ne 's/^:\//#/;s/^://;s/\// /g;s/^#/\//;p'` shift pathcomp= for d do pathcomp="$pathcomp$d" case "$pathcomp" in -* ) pathcomp=./$pathcomp ;; esac if test ! -d "$pathcomp"; then echo "mkdir $pathcomp" mkdir "$pathcomp" || lasterr=$? if test ! -d "$pathcomp"; then errstatus=$lasterr fi fi pathcomp="$pathcomp/" done done exit $errstatus # mkinstalldirs ends here exonerate-2.4.0/doc/0000777000175000017500000000000011222703703011256 500000000000000exonerate-2.4.0/doc/Makefile.am0000644000175000017500000000006410371635214013233 00000000000000 SUBDIRS = man MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/doc/Makefile.in0000644000175000017500000002076611222703644013256 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. 
# This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = .. ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ SUBDIRS = man MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best all: all-redirect 
.SUFFIXES: $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps doc/Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status # This directory's subdirectories are mostly independent; you can cd # into them and run `make' without going through this Makefile. # To change the values of `make' variables: instead of editing Makefiles, # (1) if the variable is set in `config.status', edit `config.status' # (which will cause the Makefiles to be regenerated when you run `make'); # (2) otherwise, pass the desired values on the `make' command line. @SET_MAKE@ all-recursive install-data-recursive install-exec-recursive \ installdirs-recursive install-recursive uninstall-recursive \ check-recursive installcheck-recursive info-recursive dvi-recursive: @set fnord $(MAKEFLAGS); amf=$$2; \ dot_seen=no; \ target=`echo $@ | sed s/-recursive//`; \ list='$(SUBDIRS)'; for subdir in $$list; do \ echo "Making $$target in $$subdir"; \ if test "$$subdir" = "."; then \ dot_seen=yes; \ local_target="$$target-am"; \ else \ local_target="$$target"; \ fi; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \ || case "$$amf" in *=*) exit 1;; *k*) fail=yes;; *) exit 1;; esac; \ done; \ if test "$$dot_seen" = "no"; then \ $(MAKE) $(AM_MAKEFLAGS) "$$target-am" || exit 1; \ fi; test -z "$$fail" mostlyclean-recursive clean-recursive distclean-recursive \ maintainer-clean-recursive: @set fnord $(MAKEFLAGS); amf=$$2; \ dot_seen=no; \ rev=''; list='$(SUBDIRS)'; for subdir in $$list; do \ rev="$$subdir $$rev"; \ test "$$subdir" != "." || dot_seen=yes; \ done; \ test "$$dot_seen" = "no" && rev=". 
$$rev"; \ target=`echo $@ | sed s/-recursive//`; \ for subdir in $$rev; do \ echo "Making $$target in $$subdir"; \ if test "$$subdir" = "."; then \ local_target="$$target-am"; \ else \ local_target="$$target"; \ fi; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \ || case "$$amf" in *=*) exit 1;; *k*) fail=yes;; *) exit 1;; esac; \ done && test -z "$$fail" tags-recursive: list='$(SUBDIRS)'; for subdir in $$list; do \ test "$$subdir" = . || (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) tags); \ done tags: TAGS ID: $(HEADERS) $(SOURCES) $(LISP) list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ here=`pwd` && cd $(srcdir) \ && mkid -f$$here/ID $$unique $(LISP) TAGS: tags-recursive $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) $(LISP) tags=; \ here=`pwd`; \ list='$(SUBDIRS)'; for subdir in $$list; do \ if test "$$subdir" = .; then :; else \ test -f $$subdir/TAGS && tags="$$tags -i $$here/$$subdir/TAGS"; \ fi; \ done; \ list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(ETAGS_ARGS)$$unique$(LISP)$$tags" \ || (cd $(srcdir) && etags -o $$here/TAGS $(ETAGS_ARGS) $$tags $$unique $(LISP)) mostlyclean-tags: clean-tags: distclean-tags: -rm -f TAGS ID maintainer-clean-tags: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = doc distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done for subdir in $(SUBDIRS); do \ if test "$$subdir" = .; then :; else \ test -d $(distdir)/$$subdir \ || mkdir $(distdir)/$$subdir \ || exit 1; \ chmod 777 $(distdir)/$$subdir; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) top_distdir=../$(top_distdir) 
distdir=../$(distdir)/$$subdir distdir) \ || exit 1; \ fi; \ done info-am: info: info-recursive dvi-am: dvi: dvi-recursive check-am: all-am check: check-recursive installcheck-am: installcheck: installcheck-recursive install-exec-am: install-exec: install-exec-recursive install-data-am: install-data: install-data-recursive install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-recursive uninstall-am: uninstall: uninstall-recursive all-am: Makefile all-redirect: all-recursive install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: installdirs-recursive installdirs-am: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-tags mostlyclean-generic mostlyclean: mostlyclean-recursive clean-am: clean-tags clean-generic mostlyclean-am clean: clean-recursive distclean-am: distclean-tags distclean-generic clean-am distclean: distclean-recursive maintainer-clean-am: maintainer-clean-tags maintainer-clean-generic \ distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." 
maintainer-clean: maintainer-clean-recursive .PHONY: install-data-recursive uninstall-data-recursive \ install-exec-recursive uninstall-exec-recursive installdirs-recursive \ uninstalldirs-recursive all-recursive check-recursive \ installcheck-recursive info-recursive dvi-recursive \ mostlyclean-recursive distclean-recursive clean-recursive \ maintainer-clean-recursive tags tags-recursive mostlyclean-tags \ distclean-tags clean-tags maintainer-clean-tags distdir info-am info \ dvi-am dvi check check-am installcheck-am installcheck install-exec-am \ install-exec install-data-am install-data install-am install \ uninstall-am uninstall all-redirect all-am all installdirs-am \ installdirs mostlyclean-generic distclean-generic clean-generic \ maintainer-clean-generic clean mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: exonerate-2.4.0/doc/man/0000777000175000017500000000000011222703703012031 500000000000000exonerate-2.4.0/doc/man/Makefile.am0000644000175000017500000000006510371635214014007 00000000000000 SUBDIRS = man1 MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/doc/man/Makefile.in0000644000175000017500000002100211222703644014011 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. 
SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../.. ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ SUBDIRS = man1 MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best all: all-redirect .SUFFIXES: $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps doc/man/Makefile Makefile: $(srcdir)/Makefile.in 
$(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status # This directory's subdirectories are mostly independent; you can cd # into them and run `make' without going through this Makefile. # To change the values of `make' variables: instead of editing Makefiles, # (1) if the variable is set in `config.status', edit `config.status' # (which will cause the Makefiles to be regenerated when you run `make'); # (2) otherwise, pass the desired values on the `make' command line. @SET_MAKE@ all-recursive install-data-recursive install-exec-recursive \ installdirs-recursive install-recursive uninstall-recursive \ check-recursive installcheck-recursive info-recursive dvi-recursive: @set fnord $(MAKEFLAGS); amf=$$2; \ dot_seen=no; \ target=`echo $@ | sed s/-recursive//`; \ list='$(SUBDIRS)'; for subdir in $$list; do \ echo "Making $$target in $$subdir"; \ if test "$$subdir" = "."; then \ dot_seen=yes; \ local_target="$$target-am"; \ else \ local_target="$$target"; \ fi; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \ || case "$$amf" in *=*) exit 1;; *k*) fail=yes;; *) exit 1;; esac; \ done; \ if test "$$dot_seen" = "no"; then \ $(MAKE) $(AM_MAKEFLAGS) "$$target-am" || exit 1; \ fi; test -z "$$fail" mostlyclean-recursive clean-recursive distclean-recursive \ maintainer-clean-recursive: @set fnord $(MAKEFLAGS); amf=$$2; \ dot_seen=no; \ rev=''; list='$(SUBDIRS)'; for subdir in $$list; do \ rev="$$subdir $$rev"; \ test "$$subdir" != "." || dot_seen=yes; \ done; \ test "$$dot_seen" = "no" && rev=". 
$$rev"; \ target=`echo $@ | sed s/-recursive//`; \ for subdir in $$rev; do \ echo "Making $$target in $$subdir"; \ if test "$$subdir" = "."; then \ local_target="$$target-am"; \ else \ local_target="$$target"; \ fi; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \ || case "$$amf" in *=*) exit 1;; *k*) fail=yes;; *) exit 1;; esac; \ done && test -z "$$fail" tags-recursive: list='$(SUBDIRS)'; for subdir in $$list; do \ test "$$subdir" = . || (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) tags); \ done tags: TAGS ID: $(HEADERS) $(SOURCES) $(LISP) list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ here=`pwd` && cd $(srcdir) \ && mkid -f$$here/ID $$unique $(LISP) TAGS: tags-recursive $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) $(LISP) tags=; \ here=`pwd`; \ list='$(SUBDIRS)'; for subdir in $$list; do \ if test "$$subdir" = .; then :; else \ test -f $$subdir/TAGS && tags="$$tags -i $$here/$$subdir/TAGS"; \ fi; \ done; \ list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(ETAGS_ARGS)$$unique$(LISP)$$tags" \ || (cd $(srcdir) && etags -o $$here/TAGS $(ETAGS_ARGS) $$tags $$unique $(LISP)) mostlyclean-tags: clean-tags: distclean-tags: -rm -f TAGS ID maintainer-clean-tags: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = doc/man distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done for subdir in $(SUBDIRS); do \ if test "$$subdir" = .; then :; else \ test -d $(distdir)/$$subdir \ || mkdir $(distdir)/$$subdir \ || exit 1; \ chmod 777 $(distdir)/$$subdir; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) top_distdir=../$(top_distdir) 
distdir=../$(distdir)/$$subdir distdir) \ || exit 1; \ fi; \ done info-am: info: info-recursive dvi-am: dvi: dvi-recursive check-am: all-am check: check-recursive installcheck-am: installcheck: installcheck-recursive install-exec-am: install-exec: install-exec-recursive install-data-am: install-data: install-data-recursive install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-recursive uninstall-am: uninstall: uninstall-recursive all-am: Makefile all-redirect: all-recursive install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: installdirs-recursive installdirs-am: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-tags mostlyclean-generic mostlyclean: mostlyclean-recursive clean-am: clean-tags clean-generic mostlyclean-am clean: clean-recursive distclean-am: distclean-tags distclean-generic clean-am distclean: distclean-recursive maintainer-clean-am: maintainer-clean-tags maintainer-clean-generic \ distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." 
maintainer-clean: maintainer-clean-recursive .PHONY: install-data-recursive uninstall-data-recursive \ install-exec-recursive uninstall-exec-recursive installdirs-recursive \ uninstalldirs-recursive all-recursive check-recursive \ installcheck-recursive info-recursive dvi-recursive \ mostlyclean-recursive distclean-recursive clean-recursive \ maintainer-clean-recursive tags tags-recursive mostlyclean-tags \ distclean-tags clean-tags maintainer-clean-tags distdir info-am info \ dvi-am dvi check check-am installcheck-am installcheck install-exec-am \ install-exec install-data-am install-data install-am install \ uninstall-am uninstall all-redirect all-am all installdirs-am \ installdirs mostlyclean-generic distclean-generic clean-generic \ maintainer-clean-generic clean mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: exonerate-2.4.0/doc/man/man1/0000777000175000017500000000000011222703703012665 500000000000000exonerate-2.4.0/doc/man/man1/Makefile.am0000644000175000017500000000020010754046546014643 00000000000000 man_MANS = exonerate.1 ipcress.1 exonerate-server.1 fastautils.1 EXTRA_DIST = $(man_MANS) MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/doc/man/man1/Makefile.in0000644000175000017500000001402611222703645014656 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. 
SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../../.. ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ man_MANS = exonerate.1 ipcress.1 exonerate-server.1 fastautils.1 EXTRA_DIST = $(man_MANS) MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = man1dir = $(mandir)/man1 MANS = $(man_MANS) NROFF = nroff DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best all: all-redirect .SUFFIXES: $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in 
$(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps doc/man/man1/Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status install-man1: $(mkinstalldirs) $(DESTDIR)$(man1dir) @list='$(man1_MANS)'; \ l2='$(man_MANS)'; for i in $$l2; do \ case "$$i" in \ *.1*) list="$$list $$i" ;; \ esac; \ done; \ for i in $$list; do \ if test -f $(srcdir)/$$i; then file=$(srcdir)/$$i; \ else file=$$i; fi; \ ext=`echo $$i | sed -e 's/^.*\\.//'`; \ inst=`echo $$i | sed -e 's/\\.[0-9a-z]*$$//'`; \ inst=`echo $$inst | sed '$(transform)'`.$$ext; \ echo " $(INSTALL_DATA) $$file $(DESTDIR)$(man1dir)/$$inst"; \ $(INSTALL_DATA) $$file $(DESTDIR)$(man1dir)/$$inst; \ done uninstall-man1: @list='$(man1_MANS)'; \ l2='$(man_MANS)'; for i in $$l2; do \ case "$$i" in \ *.1*) list="$$list $$i" ;; \ esac; \ done; \ for i in $$list; do \ ext=`echo $$i | sed -e 's/^.*\\.//'`; \ inst=`echo $$i | sed -e 's/\\.[0-9a-z]*$$//'`; \ inst=`echo $$inst | sed '$(transform)'`.$$ext; \ echo " rm -f $(DESTDIR)$(man1dir)/$$inst"; \ rm -f $(DESTDIR)$(man1dir)/$$inst; \ done install-man: $(MANS) @$(NORMAL_INSTALL) $(MAKE) $(AM_MAKEFLAGS) install-man1 uninstall-man: @$(NORMAL_UNINSTALL) $(MAKE) $(AM_MAKEFLAGS) uninstall-man1 tags: TAGS TAGS: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = doc/man/man1 distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done info-am: info: info-am dvi-am: dvi: dvi-am check-am: all-am check: check-am installcheck-am: installcheck: installcheck-am install-exec-am: install-exec: install-exec-am install-data-am: install-man install-data: install-data-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am 
install: install-am uninstall-am: uninstall-man uninstall: uninstall-am all-am: Makefile $(MANS) all-redirect: all-am install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: $(mkinstalldirs) $(DESTDIR)$(mandir)/man1 mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-generic mostlyclean: mostlyclean-am clean-am: clean-generic mostlyclean-am clean: clean-am distclean-am: distclean-generic clean-am distclean: distclean-am maintainer-clean-am: maintainer-clean-generic distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." maintainer-clean: maintainer-clean-am .PHONY: install-man1 uninstall-man1 install-man uninstall-man tags \ distdir info-am info dvi-am dvi check check-am installcheck-am \ installcheck install-exec-am install-exec install-data-am install-data \ install-am install uninstall-am uninstall all-redirect all-am all \ installdirs mostlyclean-generic distclean-generic clean-generic \ maintainer-clean-generic clean mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: exonerate-2.4.0/doc/man/man1/exonerate.10000644000175000017500000011145511222677036014675 00000000000000.\" Exonerate man Page .TH exonerate 1 "November 2002" exonerate "sequence comparison tool" .SH NAME .\" exonerate \- a generic tool for sequence comparison .\" .SH SYNOPSIS .B exonerate [ options ] .\" .SH DESCRIPTION .BR exonerate is a general tool for sequence comparison. It uses the .B C4 dynamic programming library. It is designed to be both general and fast. 
It can produce either gapped or ungapped alignments, according to a variety of different alignment models. The C4 library allows sequence alignment using a reduced space full dynamic programming implementation, but also allows automated generation of heuristics from the alignment models, using bounded sparse dynamic programming, so that these alignments may also be rapidly generated. Alignments generated using these heuristics will represent a valid path through the alignment model, yet (unlike the exhaustive alignments), the results are not guaranteed to be optimal. .\" .RE .SH CONVENTIONS .T A number of conventions (and idiosyncrasies) are used within exonerate. An understanding of them facilitates interpretation of the output. .\" .TP .B Coordinates An in-between coordinate system is used, where the positions are counted between the symbols, rather than on the symbols. This numbering scheme starts from zero. This numbering is shown below for the sequence "ACGT": .\" .SP .nf A C G T 0 1 2 3 4 .SP .fi .\" Hence the subsequence "CG" would have start=1, end=3, and length=2. .\" This coordinate system is used internally in exonerate, and for all the output formats produced with the exception of the "human readable" alignment display and the GFF output where convention and standards dictate otherwise. .\" .TP .B Reverse Complements When an alignment is reported on the reverse complement of a sequence, the coordinates are simply given on the reverse complement copy of the sequence. Hence positions on the sequences are never negative. .\" Generally, the forward strand is indicated by '+', the reverse strand by '-', and an unknown or not-applicable strand (as in the case of a protein sequence) is indicated by '.' .\" .TP .B Alignment Scores Currently, only the raw alignment scores are displayed. This score is just the sum of the transition scores used in the dynamic programming.
For example, in the case of a Smith-Waterman alignment, this will be the sum of the substitution matrix scores and the gap penalties. .\" .RE .SH GENERAL OPTIONS .T Most arguments have short and long forms. The long forms are more likely to be stable over time, and hence should be used in scripts which call exonerate. .\" .TP .B "\-h | \--shorthelp" Show help. This will display a concise summary of the available options, defaults and values currently set. .\" .TP .B "--help" This shows all the help options including the defaults, the value currently set, and the environment variable which may be used to set each parameter. There will be an indication of which options are mandatory. Mandatory options have no default, and must have a value supplied for exonerate to run. If mandatory options are used in order, their flags may be skipped from the command line (see examples below). Unlike this man page, the information from this option will always be up to date with the latest version of the program. .\" .TP .B "\-v | \--version" Display the version number. Also displays other information such as the build date and glib version used. .\" .RE .SH SEQUENCE INPUT OPTIONS Pairwise comparisons will be performed between all query sequences and all target sequences. .\" Generally, for the best performance, shorter sequences (eg. ESTs, shotgun reads, proteins) should be used as the query sequences, and longer sequences (eg. genomic sequences) should be used as the target sequences. .TP .B "-q | \--query " Specify the query sequences required. These must be in a FASTA format file. Single or multiple query sequences may be supplied. Additionally multiple copies of the fasta file may be supplied following a --query flag, or by using multiple --query flags. .TP .B "-t | \--target" Specify the target sequences required. These must also be in a FASTA format file. As with the query sequences, single or multiple target sequences and files may be supplied.
The target filename may be replaced by a server name and port number in the form of .B hostname:port when using .B exonerate-server. See the man page for .B exonerate-server for more information on running exonerate in client:server mode. .B NEW(v2.4.0): multiple servers may now be used. These will be queried in parallel if you have set the .B --cores option. .B NEW(v2.4.0): If an input file is not a FASTA format file, it is assumed to contain a list of other fasta files, directories or servers (one per line). .\" .TP .B "-Q | \--querytype" Specify the query type to use. If this is not supplied, the query type is assumed to be DNA when the first sequence in the file contains more than 85% [ACGTN] bases. Otherwise, it is assumed to be peptide. This option forces the query type as some nucleotide and peptide sequences can fall either side of this threshold. .\" .TP .B "-T | \--targettype" Specify the target type to use. The same as .B --querytype (above), except that it applies to the target. .\" Specifying the sequence type will avoid the overhead of having to read the first sequence in the database twice (which may be significant with chromosome-sized sequences) .\" .TP .B "\--querychunkid" .TP .B "\--querychunktotal" .TP .B "\--targetchunkid" .TP .B "\--targetchunktotal" These options facilitate running exonerate on compute farms, and avoid having to split up sequence databases into small chunks to run on different nodes. If, for example, you wished to split the target database into three parts, you would run three exonerate jobs on different nodes including the options: .PP .RS .PD 0 .TP --targetchunkid 1 --targetchunktotal 3 .TP --targetchunkid 2 --targetchunktotal 3 .TP --targetchunkid 3 --targetchunktotal 3 .PP NB. The granularity offered by this option only goes down to a single sequence, so when there are more chunks than sequences in the database, some processes will do nothing.
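.PP
As a minimal sketch of the three-way split described above, the shell function below prints one exonerate command line per target chunk. It only prints the commands (a dry run) and does not start any jobs. The file names query.fa and target.fa and the function name chunk_commands are placeholders chosen for illustration, and how each command is sent to a node depends on the local job scheduler.
.nf
```shell
# Placeholder input files (assumptions, not part of exonerate).
QUERY=query.fa
TARGET=target.fa

# Print one exonerate command line for each target chunk, numbered 1..total.
chunk_commands() {
    total=$1
    id=1
    while [ "$id" -le "$total" ]; do
        echo "exonerate --query $QUERY --target $TARGET --targetchunkid $id --targetchunktotal $total"
        id=$((id + 1))
    done
}

# Dry run for a database split into three chunks.
chunk_commands 3
```
.fi
Each printed line can then be submitted as a separate job. Asking for more chunks than there are target sequences would leave some of those jobs with nothing to do, as noted above.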
.RE .\" .TP .B "\-V | \--verbose" Be verbose - show information about what is going on during the analysis. The default is 1 (little information); the higher the number given, the more information is printed. To silence all the default output from exonerate, use --verbose 0 --showalignment no --showvulgar no .\" .SH ANALYSIS OPTIONS .\" .TP .B "\-E | \--exhaustive" Specify whether or not exhaustive alignment should be used. By default, this is FALSE, and alignment heuristics will be used. If it is set to TRUE, an exhaustive alignment will be calculated. This requires quadratic time, and will be much, much slower, but will provide the optimal result for the given model. .\" .TP .B "\-B | \--bigseq" Perform alignment of large (multi-megabase) sequences. This is very memory efficient and fast when both sequences are chromosome-sized, but does not currently permit the use of a word neighbourhood (ie. exactly matching seeds only). .\" .TP .B "\--revcomp" Include comparison of the reverse complement of the query and target where possible. By default, this option is enabled, but when you know the gene is definitely on the forward strand of the query and target, disabling this option can halve the time taken to compute alignments. .\" .TP .B "\--forcescan" Force the FSM to scan the query sequence rather than the target. This option is useful, for example, if you have a single piece of genomic sequence and you wish to compare it to the whole of dbEST. By scanning the database, rather than the query, the analysis will be completed much more quickly, as the overheads of multiple query FSM construction, multiple target reading and splice site predictions will be removed. By default, exonerate will guess the optimal strategy based on database sequence sizes. .\" .TP .B "\--saturatethreshold" When set to zero, this option does nothing.
Otherwise, once more than this number of words (in addition to the expected number of words by chance) have matched a position on the query, the position on the query will be 'numbed' (ignore further matches) for the current pairwise comparison. .\" .TP .B "\--customserver" When using exonerate in client:server mode with a non-standard server, this option allows you to send a custom command to the server. This command is sent by the client (exonerate) before any other commands, and is provided as a way of passing parameters or other commands specific to the custom server. See the .B exonerate-server man page for more information on running exonerate in client:server mode. .\" .TP .B "\--cores" The number of cores/CPUs/threads that should be used. On a multi-core or multi-CPU machine, increasing this number allows alignment computations to run in parallel on separate CPUs/cores. NB. Generally, it is better to parallelise the analysis by splitting it up into separate jobs, but this option may prove useful for problems such as interactive single-gene queries. .\" .SH FASTA DATABASE OPTIONS .TP .B "\--fastasuffix" If any of the inputs given with .B --query or .B --target are directories, then exonerate will recursively descend these directories, reading all files ending with this suffix as fasta format input. .\" .SH GAPPED ALIGNMENT OPTIONS .TP .B "\-m | \--model" Specify the alignment model to use. The models currently supported are: .\" .PP .RS .PD 0 .\" .TP .B ungapped The simplest type of model, used by default. An appropriate model will be selected automatically for the type of input sequences provided. .\" .TP .B ungapped:trans This ungapped model includes translation of all frames of both the query and target sequences. This is similar to an ungapped tblastx type search. .\" .TP .B affine:global This performs gapped global alignment, similar to the Needleman-Wunsch algorithm, except with affine gaps.
Global alignment requires that both the sequences in their entirety are included in the alignment. .\" .TP .B affine:bestfit This performs a best fit or best location alignment of the query onto the target sequence. The entire query sequence will be included in the alignment, but only the best location for its alignment on the target sequence. .\" .TP .B affine:local This is local alignment with affine gaps, similar to the Smith-Waterman-Gotoh algorithm. It is a general-purpose alignment algorithm. As this is local alignment, any subsequence of the query and target sequence may appear in the alignment. .\" .TP .B affine:overlap This type of alignment finds the best overlap between the query and target. The overlap alignment must include the start of the query or target and the end of the query or the target sequence, to align sequences which overlap at the ends, or in the mid-section of a longer sequence. This is the type of alignment frequently used in assembly algorithms. .\" .TP .B est2genome This model is similar to the affine:local model, but it also includes intron modelling on the target sequence to allow alignment of spliced to unspliced coding sequences for both forward and reversed genes. This is similar to the alignment models used in programs such as EST_GENOME and sim4. .\" .TP .B ner NERs are non-equivalenced regions - large regions in both the query and target which are not aligned. This model can be used for protein alignments where strongly conserved helix regions will be aligned, but weakly conserved loop regions are not. Similarly, this model could be used to look for co-linearly conserved regions in comparison of genomic sequences. .\" .TP .B protein2dna This model compares a protein sequence to a DNA sequence, incorporating all the appropriate gaps and frameshifts. .\" .TP .B protein2dna:bestfit This is a bestfit version of the protein2dna model, with which the entire protein is included in the alignment.
It is currently only available when using exhaustive alignment. .\" .TP .B protein2genome This model allows alignment of a protein sequence to genomic DNA. This is similar to the protein2dna model, with the addition of modelling of introns and intron phases. This model is similar to those used by genewise. .\" .TP .B protein2genome:bestfit This is a bestfit version of the protein2genome model, with which the entire protein is included in the alignment. It is currently only available when using exhaustive alignment. .\" .TP .B coding2coding This model is similar to the ungapped:trans model, except that gaps and frameshifts are allowed. It is similar to a gapped tblastx search. .\" .TP .B coding2genome This is similar to the est2genome model, except that the query sequence is translated during comparison, allowing a more sensitive comparison. .\" .TP .B cdna2genome This combines properties of the est2genome and coding2genome models, to allow modelling of a whole cDNA where a central coding region can be flanked by non-coding UTRs. When the CDS start and end are known, they may be specified using the --annotation option (see below) to permit only the correct coding region to appear in the alignment. .\" .TP .B genome2genome This model is similar to the coding2coding model, except introns are modelled on both sequences. (not working well yet) .\" .PD .RE .\" .TP .\" The short names u, u:t, a:g, a:b, a:l, a:o, e2g, ner, p2d, p2d:b, p2g, p2g:b, c2c, c2g, cd2g and g2g can also be used for specifying models. .TP .B "\-s | \--score" This is the overall score threshold. Alignments will not be reported below this threshold. For heuristic alignments, the higher this threshold, the less time the analysis will take. .\" .TP .B "\--percent" Report only alignments scoring at least this percentage of the maximal score for each query. eg. use .B --percent 90 to report alignments with 90% of the maximal score obtainable for that query.
This option is useful not only because it reduces the spurious matches in the output, but because it generates query-specific thresholds (unlike .B --score ) for a set of queries of differing lengths, and will also speed up the search considerably. .B NB. with this option, it is possible to have a cDNA match its corresponding gene exactly, yet still score less than 100%, due to the addition of the intron penalty scores, hence this option must be used with caution. .\" .TP .B "\--showalignment" Show the alignments in a human readable form. .\" .TP .B "\--showsugar" Display "sugar" output for ungapped alignments. Sugar is Simple UnGapped Alignment Report, which displays ungapped alignments one-per-line. The sugar line starts with the string "sugar:" for easy extraction from the output, and is followed by the following 9 fields in the order below: .\" .PP .RS .PD 0 .\" .TP 16 .B query_id Query identifier .\" .TP 16 .B query_start Query position at alignment start .\" .TP 16 .B query_end Query position at alignment end .\" .TP 16 .B query_strand Strand of query matched .\" .TP 16 .B target_id | .\" .TP 16 .B target_start | the same 4 fields .\" .TP 16 .B target_end | for the target sequence .\" .TP 16 .B target_strand | .\" .TP 16 .B score The raw alignment score .\" .PD .RE .\" .TP .B "\--showcigar" Show the alignments in "cigar" format. Cigar is a Compact Idiosyncratic Gapped Alignment Report, which displays gapped alignments one-per-line. The format starts with the same 9 fields as sugar output (see above), and is followed by a series of <operation, length> pairs where operation is one of match, insert or delete, and the length describes the number of times this operation is repeated. .\" .TP .B "\--showvulgar" Shows the alignments in "vulgar" format. Vulgar is Verbose Useful Labelled Gapped Alignment Report. This format also starts with the same 9 fields as sugar output (see above), and is followed by a series of <label, query_length, target_length> triplets.
The label may be one of the following: .\" .PP .RS .PD 0 .\" .TP .B M Match .\" .TP .B C Codon .\" .TP .B G Gap .\" .TP .B N Non-equivalenced region .\" .TP .B 5 5' splice site .\" .TP .B 3 3' splice site .\" .TP .B I Intron .\" .TP .B S Split codon .\" .TP .B F Frameshift .\" .PD .RE .LP .\" .TP .B "\--showquerygff" Report GFF output for features on the query sequence. See http://www.sanger.ac.uk/Software/formats/GFF for more information. .\" .TP .B "\--showtargetgff" Report GFF output for features on the target sequence. .\" .TP .B "\--ryo" Roll-your-own output format. This allows specification of a printf-esque format line which is used to specify which information to include in the output, and how it is to be shown. The format field may contain the following fields: .PP .RS .PD 0 .TP .B %[qt][idlsSt] For either {query,target}, report the {id,definition,length,sequence,Strand,type} Sequences are reported in a fasta-format like block (no headers). .TP .B %[qt]a[bels] For either {query,target} region which occurs .B in the alignment, report the {begin,end,length,sequence} .TP .B %[qt]c[bels] For either {query,target} region which occurs in the .B coding sequence in the alignment, report the {begin,end,length,sequence} .TP .B %s The raw score .TP .B %r The rank (in results from a bestn search) .TP .B %m Model name .TP .B %e[tism] Equivalenced {total,id,similarity,mismatches} (ie. %em == (%et - %ei)) .TP .B %p[isS] Percent {id,similarity,Self} over the equivalenced portions of the alignment. (ie. %pi == 100*(%ei / %et)). Percent Self is the score over the equivalenced portions of the alignment as a percentage of the self comparison score of the query sequence. .TP .B %g Gene orientation ('+' = forward, '-' = reverse, '.' 
= unknown)
.TP
.B %S
Sugar block (the 9 fields used in sugar output (see above))
.TP
.B %C
Cigar block (the fields of a cigar line after the sugar portion)
.TP
.B %V
Vulgar block (the fields of a vulgar line after the sugar portion)
.TP
.B %%
Expands to a percentage sign (%)
.TP
.B \en
Newline
.TP
.B \et
Tab
.TP
.B \e\e
Expands to a backslash (\e)
.TP
.B \e{
Open curly brace
.TP
.B \e}
Close curly brace
.TP
.B {
Begin per-transition output section
.TP
.B }
End per-transition output section
.TP
.B %P[qt][sabe]
Per-transition output for {query,target} {sequence,advance,begin,end}
.TP
.B %P[nsl]
Per-transition output for {name,score,label}
.PD
.RE
.PP
This option is very useful and flexible.
For example, to report all the sections of query sequences
which feature in alignments in fasta format, use:
.PP
--ryo \fB">%qi %qd\en%qas\en"\fP
.P
To output all the symbols and scores in an alignment,
try something like:
.PP
--ryo \fB"%V{%Pqs %Pts %Ps\en}"\fP
.LP
.\"
.TP
.B "\-n | \--bestn"
Report the best N results for each query.
(Only results scoring better than the score threshold will be reported).
This option reduces the amount of output generated,
and also allows exonerate to speed up the search.
.\"
.TP
.B "\-S | \--subopt"
This option allows for the reporting of (Waterman-Eggert style)
suboptimal alignments.
(It is on by default.)
All suboptimal (ie. non-intersecting) alignments will be reported
for each pair of sequences scoring at least the threshold provided by
.B --score.
When this option is used with exhaustive alignments,
several full quadratic time passes will be required,
so the running time will be considerably increased.
.\"
.TP
.B "\-g | \--gappedextension"
Causes a gapped extension stage to be performed ie. dynamic programming
is applied in arbitrarily shaped and dynamically sized regions
surrounding HSP seeds.
The extension threshold is controlled by the --extensionthreshold option.
Although sometimes slower than BSDP, gapped extension improves
sensitivity with weak, gap-rich alignments such as during
cross-species comparison.
.B NB.
This option is now the default.
Set it to false to revert to the old BSDP type alignments.
This option may be slower than BSDP for some large scale analyses
with simple alignment models.
.\"
.TP
.B "\--refine"
Force exonerate to refine alignments generated by heuristics
using dynamic programming over larger regions.
This takes more time, but improves the quality of the final alignments.
The strategies available for refinement are:
.\"
.PP
.RS
.PD 0
.\"
.TP
.B none
The default - no refinement is used.
.\"
.TP
.B full
An exhaustive alignment is calculated from the pair of sequences
in their entirety.
.\"
.TP
.B region
DP is applied just to the region of the sequences
covered by the heuristic alignment.
.\"
.PD
.RE
.LP
.\"
.TP
.B "\--refineboundary"
Specify an extra boundary to be included in the region
subject to alignment during refinement by region.
.\"
.RE
.SH VITERBI ALGORITHM OPTIONS
.TP
.B "\-D | \--dpmemory"
The exhaustive alignment traceback routines use a Hughey-style
reduced memory technique.
This option specifies how much memory will be used for this.
Generally, the more memory is permitted here,
the faster the alignments will be produced.
.\"
.SH CODE GENERATION OPTIONS
.TP
.B "\-C | \--compiled"
This option allows disabling of generated code for dynamic programming.
It is mainly used during development of exonerate.
When set to FALSE, an "interpreted" version of the dynamic programming
implementation is used, which is much slower.
.\"
.SH HEURISTIC OPTIONS
.\"
.PD 0
.P
.B "\--terminalrangeint"
.P
.B "\--terminalrangeext"
.P
.B "\--joinrangeint"
.P
.B "\--joinrangeext"
.P
.B "\--spanrangeint"
.P
.TP
.B "\--spanrangeext"
These options are used to specify the size of the sub-alignment regions
to which DP is applied around the ends of the HSPs.
This can be at the HSP ends (terminal range), between HSPs (join range),
or between HSPs which may be connected by a large region such as
an intron or non-equivalenced region (span range).
These ranges can be specified for a number of matches
back onto the HSP (internal range) or out from the HSP (external range).
.PD
.SH SEEDED DYNAMIC PROGRAMMING OPTIONS
.TP
.B "\-x | \--extensionthreshold"
This is the amount by which the score will be allowed to degrade
during SDP.
This is the equivalent of the hspdropoff penalties, except it is applied
during dynamic programming, not HSP extension.
Decreasing this parameter will increase the speed of the SDP,
and increasing it will increase the sensitivity.
.\"
.TP
.B "\--singlepass "
By default the suboptimal SDP alignments are reported by
a singlepass algorithm, but may miss some suboptimal alignments
that are close together.
This option can be used to force the use of a multipass
suboptimal alignment algorithm for SDP, resulting in higher quality
suboptimal alignments.
.\"
.SH BSDP OPTIONS
.\"
.TP
.B "\--joinfilter"
(experimental)
Only consider this number of SARs for joining HSPs together.
The SARs with the highest potential for appearing in a high-scoring
alignment are considered.
This option is useful for limiting time and memory usage when searching
unmasked data with repetitive sequences, but should not be set too low,
as valid matches may be ignored.
Something like
.B --joinfilter 32
seems to work well.
.\"
.SH SEQUENCE OPTIONS
.\"
.TP
.B "\--annotation"
Specify basic sequence annotation information.
This is most useful with the cdna2genome model,
but will work with other models.
The annotation file contains four fields per line:
.PP
.RS
.PD 0
.TP
.I id strand cds_start cds_length
.TP
Here is a simple example of such a file for 4 cDNAs:
.PP
.TP
dhh.human.cdna + 308 1191
.P
dhh.mouse.cdna + 250 1191
.P
csn7a.human.cdna + 178 828
.P
csn7a.mouse.cdna + 126 828
.PP
.P
These annotation lines will also work when only the first two fields are used.
This can be used when specifying which strand of a specific sequence
should be included in a comparison.
.\"
.RE
.SH SYMBOL COMPARISON OPTIONS
.TP
.B "\--softmaskquery"
Indicate that the query is softmasked.
See description below for
.B --softmasktarget
.\"
.TP
.B "\--softmasktarget"
Indicate that the target is softmasked.
In a softmasked sequence file, instead of masking regions by Ns or Xs
they are masked by putting those regions in lower case
(and with unmasked regions in upper case).
This option allows the masking to be ignored by some parts
of the program, combining the speed of searching masked data
with sensitivity of searching unmasked data.
The utility
.B fastasoftmask
which is supplied with exonerate can be used for producing
softmasked sequence from conventionally masked sequence.
.\"
.TP
.B "\-d | \--dnasubmat"
Specify the substitution matrix to be used for DNA comparison.
This should be a path to a substitution matrix in the same format
as that which is used by blast.
.\"
.TP
.B "\-p | \--proteinsubmat"
Specify the substitution matrix to be used for protein comparison.
(Both DNA and protein substitution matrices are required
for some types of analysis).
.\"
The use of the special names,
.B nucleic, blosum62, pam250, edit
or
.B identity
will cause built-in substitution matrices to be used.
.\"
.RE
.SH ALIGNMENT SEEDING OPTIONS
.TP
.B "\-M | \--fsmmemory"
Specify the amount of memory to use for the FSM in heuristic analyses.
exonerate multiplexes the query to accelerate large-throughput
database queries.
This figure should always be less than the physical memory
on the machine, but when searching large databases, generally,
the more memory it is allowed to use, the faster it will go.
.\"
.TP
.B "\--forcefsm"
Force the use of more compact finite state machines for analyses
involving big sequences and large word neighbourhoods.
By default, exonerate will pick a sensible strategy,
so this option will rarely need to be set.
.\" .TP .B "\--wordjump" The jump between query words used to yield the word neighbourhood. If set to 1, every word is used, if set to 2, every other word is used, and if set to the wordlength, only non-overlapping words will be used. This option reduces the memory requirements when using very large query sequences, and makes the search run faster, but it also damages search sensitivity when high values are set. .\" .TP .B "\--wordambiguity" This option may be used to allow alignment seeds containing IUPAC ambiguity symbols. The limit is the maximum number of ambiguous words allowed at a single position. If this limit is reached then the position is not used for alignment seeding. Using this option may slow down a search. For large datasets, it is recommended to use .B esd2esi --wordambiguity instead, as then the speed overhead is only incurred during indexing, rather than during the database searching itself. NB. This option only works for IUPAC symbols in the target sequence. Query words containing IUPAC symbols are (currently) excluded from seeding. .\" .SH AFFINE MODEL OPTIONS .TP .B "\-o | \--gapopen" This is the gap open penalty. .\" .TP .B "\-e | \--gapextend" This is the gap extension penalty. .\" .TP .B "\--codongapopen" This is the codon gap open penalty. .\" .TP .B "\--codongapextend" This is the codon gap extension penalty. .\" .SH NER OPTIONS .TP .B "\--minner" Minimum NER length allowed. .\" .TP .B "\--maxner" Maximum NER length allowed. NB. this option only affects heuristic alignments. .\" .TP .B "\--neropen" Penalty for opening a non-equivalenced region. .\" .SH INTRON MODELLING OPTIONS .TP .B "\--minintron" Minimum intron length limit. NB. this option only affects heuristic alignments. This is not a hard limit - it only affects size of introns which are sought during heuristic alignment. .\" .TP .B "\--maxintron" Maximum intron length limit. 
See notes above for
.B --minintron
.\"
.TP
.B "\-i | \--intronpenalty"
Penalty for introduction of an intron.
.\"
.SH FRAMESHIFT MODELLING OPTIONS
.TP
.B "\-f | \--frameshift"
The penalty for the inclusion of a frameshift in an alignment.
.\"
.SH ALPHABET OPTIONS
.TP
.B "\--useaatla"
Use three-letter abbreviations for AA names.
ie. when displaying alignment "Met" is used instead of " M "
.\"
.SH TRANSLATION OPTIONS
.TP
.B "\--geneticcode"
Specify an alternative genetic code.
The default code (1) is the standard genetic code.
Other genetic codes may be specified in shorthand or longhand form.
In shorthand form, a number between 1 and 23 is used
to specify one of 17 built-in genetic code variants.
These are genetic code variants taken from:
.B http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
These are:
.PP
.RS
.TP
.B 1
The Standard Code
.TP
.B 2
The Vertebrate Mitochondrial Code
.TP
.B 3
The Yeast Mitochondrial Code
.TP
.B 4
The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code
.TP
.B 5
The Invertebrate Mitochondrial Code
.TP
.B 6
The Ciliate, Dasycladacean and Hexamita Nuclear Code
.TP
.B 9
The Echinoderm and Flatworm Mitochondrial Code
.TP
.B 10
The Euplotid Nuclear Code
.TP
.B 11
The Bacterial and Plant Plastid Code
.TP
.B 12
The Alternative Yeast Nuclear Code
.TP
.B 13
The Ascidian Mitochondrial Code
.TP
.B 14
The Alternative Flatworm Mitochondrial Code
.TP
.B 15
Blepharisma Nuclear Code
.TP
.B 16
Chlorophycean Mitochondrial Code
.TP
.B 21
Trematode Mitochondrial Code
.TP
.B 22
Scenedesmus obliquus mitochondrial Code
.TP
.B 23
Thraustochytrium Mitochondrial Code
.PP
In longhand form, a genetic code variant may be provided
as a 64 byte string in TCAG order, eg. the standard genetic code
in this form would be:
FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
.\"
.SH HSP CREATION OPTIONS
.TP
.B "\--hspfilter"
Use aggressive HSP filtering to speed up heuristic searches.
The threshold specifies the number of HSPs centred about a point
in the query which will be stored.
Any lower scoring HSPs will be discarded.
.\"
This is an experimental option to handle speed problems
caused by some sequences.
A value of about 100 seems to work well.
.\"
.TP
.B "\--useworddropoff"
When this is TRUE, the score threshold for admitting words into the
word neighbourhood is set to be the initial word score minus the
word threshold (see below).
This strategy is designed to prevent restricting the word neighbourhood.
.\"
When this is FALSE, the word threshold is taken to be an absolute value.
.\"
.TP
.B "\--seedrepeat"
The seedrepeat parameter sets the number of seeds which must be found
on the same diagonal or reading frame before HSP extension will occur.
Increasing the value for
.B --seedrepeat
will speed up searches, and is usually a better option than using
longer word lengths, particularly when using the
.B exonerate-server
where increasing word lengths requires recomputing the index,
and greatly increases memory requirements.
.\"
.TP
.B "\-w \--dnawordlen"
.TP
.B "\-W \--proteinwordlen"
.TP
.B "\-W \--codonwordlen"
The word length used for DNA, protein or codon words.
When performing DNA vs protein comparisons, the DNA wordlength
will always (automatically) be triple the protein wordlength.
.\"
.TP
.B "\--dnahspdropoff"
.TP
.B "\--proteinhspdropoff"
.TP
.B "\--codonhspdropoff"
The amount by which an HSP score will be allowed to degrade
during HSP extension.
Separate thresholds can be set for dna or protein comparisons.
.\"
.TP
.B "\--dnahspthreshold"
.TP
.B "\--proteinhspthreshold"
.TP
.B "\--codonhspthreshold"
The HSP score thresholds.
An HSP must score at least this much before it will be reported
or be used in preparation of a heuristic alignment.
.TP
.B "\--dnawordlimit "
.TP
.B "\--proteinwordlimit "
.TP
.B "\--codonwordlimit "
The threshold for admitting DNA or protein words
into the word neighbourhood.
The behaviour of this option is altered by the
.B --useworddropoff
option (see above).
.\"
.TP
.B "\--geneseed"
Exclude HSPs from gapped alignment computation which cannot feature
in an alignment containing at least one HSP scoring at least
this threshold.
This option provides considerable speed up for gapped alignment
computation, but may cause some very gap-rich alignments to be missed.
It is useful when aligning similar sequences back onto genome quickly,
eg. try --geneseed 250
.\"
.TP
.B "\--geneseedrepeat"
The geneseedrepeat parameter is like the seedrepeat parameter,
but is only applied when looking for the geneseed hsps.
Using a larger value for
.B --geneseedrepeat
will speed up searches when the
.B --geneseed
parameter is also used.
(experimental, implementation incomplete)
.\"
.SH ALIGNMENT OPTIONS
.TP
.B "\--alignmentwidth"
Width of alignment display.
The default is 80.
.\"
.TP
.B "\--forwardcoordinates"
By default, all coordinates are reported on the forward strand.
Setting this option to false reverts to the old behaviour (pre-0.8.3)
whereby alignments on the reverse complement of a sequence
are reported using coordinates on the reverse complement.
.\"
.SH SUB-ALIGNMENT REGION OPTIONS
.TP
.B "\--quality"
This option excludes HSPs from BSDP when their components
outside of the SARs fall below this quality threshold.
.\"
.SH SPLICE SITE PREDICTION OPTIONS
.TP
.B "\--splice3"
.TP
.B "\--splice5"
Provide a file containing a custom PSSM (position specific score matrix)
for prediction of the intron splice sites.
The file format for splice data is simple:
lines beginning with \'#\' are comments,
a line containing just the word \'splice\' denotes
the position of the splice site, and the other lines show
the observed relative frequencies of the bases flanking
the splice sites in the chosen organism (in ACGT order).

.B Example 5' splice data file:
.SP
.nf
# start of example 5' splice data
# A C G T
28 40 17 14
59 14 13 14
8 5 81 6
splice
0 0 100 0
0 0 0 100
54 2 42 2
74 8 11 8
5 6 85 4
16 18 21 45
# end of example 5' splice data
.SP
.fi

.B Example 3' splice data file:
.SP
.nf
# start of example 3' splice data
# A C G T
10 31 14 44
8 36 14 43
6 34 12 48
6 34 8 52
9 37 9 45
9 38 10 44
8 44 9 40
9 41 8 41
6 44 6 45
6 40 6 48
23 28 26 23
2 79 1 18
100 0 0 0
0 0 100 0
splice
28 14 47 11
# end of example 3' splice data
.SP
.fi
.TP
.B "\--forcegtag"
Only allow splice sites at gt....ag sites
(or ct....ac sites when the gene is reversed)
.\"
With this restriction in place, the splice site prediction scores
are still used and allow tie breaking when there is more than one
possible splice site.
.\"
.SH STRATEGIES FOR SPEED
Keep all data on local disks.

Apply the highest acceptable score thresholds using a combination
of --score, --percent and --bestn.

Repeat mask and dust the genomic (target) sequence.
(Softmask these sequences and use --softmasktarget).

Increase the --fsmmemory option to allow more query multiplexing.

Increase the value for --seedrepeat.

When using an alignment model containing introns,
set --geneseed as high as possible.

If you are compiling exonerate yourself, see the README file
supplied with the source code for details of
compile-time optimisations.
.\"
.SH STRATEGIES FOR SENSITIVITY
Not documented yet.

Increase the word neighbourhood.
Decrease the HSP threshold.
Increase the SAR ranges.
Run exhaustively.
.\"
.\"
.SH ENVIRONMENT
Not documented yet.
.\"
.SH EXAMPLES
.\"
.B "exonerate cdna.fasta genomic.fasta"
.RS
This is the simplest way in which exonerate may be used.
By default, an ungapped alignment model will be used.
.RE
.\"
.B "exonerate --exhaustive y --model est2genome cdna.fasta genomic.masked.fasta"
.RS
Exhaustively align cDNAs to genomic sequence.
This will be much, much slower, but more accurate.
This option causes exonerate to behave like EST_GENOME.
.RE
.\"
.B "exonerate --exhaustive --model affine:local"
.B "query.fasta target.fasta"
.RS
If the affine:local model is used with exhaustive alignment,
you have the Smith-Waterman algorithm.
.RE
.\"
.B "exonerate --exhaustive --model affine:global"
.B "protein.fasta protein.fasta"
.RS
Switch to a global model, and you have Needleman-Wunsch.
.RE
.\"
.B "exonerate --wordthreshold 1 --gapped no --showhsp yes protein.fasta genome.fasta"
.RS
Generate ungapped Protein:DNA alignments
.RE
.\"
.B "exonerate --model coding2coding --score 1000 --bigseq yes --proteinhspthreshold 90 chr21.fa chr22.fa"
.RS
Perform quick-and-dirty translated pairwise alignment
of two very large DNA sequences.
.RE

Many similar combinations should work.
Try them out.
.\"
.RE
.SH VERSION
This documentation accompanies version 2.2.0 of the exonerate package.
.\"
.SH AUTHOR
Guy St.C. Slater.
.L
See the AUTHORS file accompanying the source code
for a list of contributors.
.SH AVAILABILITY
The source code for the exonerate package is available
under the terms of the GNU
.I general public licence.

Please see the file COPYING which was distributed with this package,
or http://www.gnu.org/licenses/gpl.txt for details.

This package has been developed as part of the ensembl project.
Please see http://www.ensembl.org/ for more information.
.SH "SEE ALSO"
.BR exonerate-server (1),
.BR ipcress (1),
.BR blast (1L).
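.SH NOTES
The one-per-line "sugar:" output described under
.B --showsugar
is straightforward to post-process.
The following is a minimal Python sketch, not part of the exonerate
package, which splits one sugar line into the 9 fields listed there.
The names Sugar and parse_sugar and the sample line are hypothetical,
and the score is assumed to be an integer.
.PP
.nf
```python
from typing import NamedTuple, Optional


class Sugar(NamedTuple):
    # The 9 sugar fields, in the order given under --showsugar.
    query_id: str
    query_start: int
    query_end: int
    query_strand: str
    target_id: str
    target_start: int
    target_end: int
    target_strand: str
    score: int


def parse_sugar(line: str) -> Optional[Sugar]:
    """Return the sugar fields, or None if this is not a sugar line."""
    prefix = "sugar:"
    if not line.startswith(prefix):
        return None
    f = line[len(prefix):].split()
    if len(f) != 9:
        raise ValueError("expected 9 sugar fields, got %d" % len(f))
    return Sugar(f[0], int(f[1]), int(f[2]), f[3],
                 f[4], int(f[5]), int(f[6]), f[7], int(f[8]))


if __name__ == "__main__":
    # Made-up sample line, for illustration only.
    print(parse_sugar("sugar: q1 0 120 + chr1 1000 1120 + 600"))
```
.fi
.PP
Lines without the "sugar:" prefix give None, so the function can be
applied to every line of mixed exonerate output.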
.\" iPCRess man Page
.TH ipcress 1 "March 2003" ipcress "PCR simulation tool"
.SH NAME
.\"
ipcress \- In-silico PCR experiment simulation system
.SH SYNOPSIS
.B ipcress [ options ]
.SH DESCRIPTION
.BR ipcress
is the
.IR I n-silico
.IR PCR
.IR E xperiment
.IR S imulation
.IR S ystem.

This is a tool for simulation of PCR experiments.
You supply a file containing primers and a set of sequences,
and it predicts PCR products.
.P
.\"
Ipcress is similar to the e-PCR program from the NCBI,
but is much faster, and does not suffer from problems
identifying matches when there are ambiguity symbols near primer ends.
.P
.\"
If you supply many primer pairs together, ipcress will simulate
the PCR experiments in parallel, allowing genome wide simulation
of large numbers of experiments.
It uses many libraries from the
.B exonerate
sequence comparison tool.
.\"
.RE
.SH INPUT FORMAT
.T
The input for ipcress is a simple white-space delimited file
describing one experiment per line.
Each line contains the following 5 fields:
.\"
.PP
.RS
.PD 0
.\"
.TP 18
.B id
An identifier for this experiment
.\"
.TP 18
.B primer_A
Sequence for the first primer
.\"
.TP 18
.B primer_B
Sequence for the second primer
.\"
.TP 18
.B min_product_len
Minimum product length to report
.\"
.TP 18
.B max_product_len
Maximum product length to report
.\"
.PD
.RE
.P
Here is an example line in this format:
.SP
.nf
ID0001 CATGCATGCATGC CGATGCANGCATGCT 900 1100
.SP
.fi
.\"
.RE
.SH OUTPUT FORMAT
.T
The output format describes one PCR product per-line,
and is prefixed by "ipcress:", followed by the following 10 fields:
.\"
.PP
.RS
.PD 0
.\"
.TP 16
.B sequence_id
The sequence identifier
.\"
.TP 16
.B experiment_id
The PCR experiment id
.\"
.TP 16
.B product_length
The PCR product length
.\"
.TP 16
.B primer_5
The 5' primer (either A or B)
.\"
.TP 16
.B pos_5
Position of the 5' primer
.\"
.TP 16
.B mismatch_5
Number of mismatches on 5' primer
.\"
.TP 16
.B primer_3
|
.\"
.TP 16
.B pos_3
| Same fields for the 3' primer
.\"
.TP 16
.B mismatch_3
|
.\"
.TP 16
.B description
A description of the PCR product
.\"
.PD
.RE
.P
.\"
The description field is one of the following 4 strings:
.\"
.PP
.RS
.PD 0
.\"
.TP 10
.B forward
Normal product, primer A followed by B
.\"
.TP 10
.B revcomp
Normal product, primer B followed by A
.\"
.TP 10
.B single_A
Bad product generated by primer_A only
.\"
.TP 10
.B single_B
Bad product generated by primer_B only
.\"
.PD
.RE
.P
.\"
There is also a human-readable output displayed,
which is not designed for parsing (see: --pretty below).
.\"
.RE
.SH GENERAL OPTIONS
.T
Most arguments have short and long forms.
The long forms are more likely to be stable over time,
and hence should be used in scripts which call ipcress.
.\"
.TP
.B "\-h | \--shorthelp"
Show help.
This will display a concise summary of the available options,
defaults and values currently set.
.\"
.TP
.B "--help"
This shows all the help options including the defaults,
the value currently set, and the environment variable
which may be used to set each parameter.
There will be an indication of which options are mandatory.
Mandatory options have no default, and must have a value supplied
for ipcress to run.
If mandatory options are used in order, their flags may be skipped
from the command line (see examples below).
Unlike this man page, the information from this option will always be
up to date with the latest version of the program.
.\"
.TP
.B "\-v | \--version"
Display the version number.
Also displays other information such as the build date
and glib version used.
.RE
.\"
.SH FILE INPUT OPTIONS
.TP
.B "-i | \--input "
PCR experiment data in the ipcress file format described above.
.TP
.B "-s | \--sequence"
Specify the sequences.
Multiple files may be specified here, which reduces
the FSM building overhead, and makes ipcress run faster
than running the process separately for each file.
.RE
.\"
.SH IPCRESS PARAMETERS
.TP
.B "\-m | \--mismatch "
Specify the number of mismatches allowed per primer.
Allowing mismatches reduces the speed of the program
as a large primer neighbourhood must be constructed,
and fewer experiments can be fitted in memory
prior to each scan of the sequence databases.
.\"
.TP
.B "\-M | \--memory"
Specify the amount of memory the program should use.
The more memory made available to ipcress, the faster it will run,
as more PCR experiments can be conducted in each scan
of the sequence databases.
This does not include memory used during the scan
(for storing partial results and sequences),
so the actual amount of memory used will be slightly higher.
.\"
.TP
.B "\-p | \--pretty"
Display results in a human-readable format, not designed for parsing.
.\"
.TP
.B "\-P | \--products"
Display PCR products as a FASTA format sequence.
.\"
.TP
.B "\-S | \--seed"
Specify the seed length for the wordneighbourhood for the FSM.
If set to zero, the full primer is used.
Shorter words reduce the size of the neighbourhood,
but increase the time taken by ipcress to filter
false positive matches.
.\"
.SH ENVIRONMENT
Not documented yet.
.\"
.SH EXAMPLES
.\"
.B "ipcress test.ipcress sequence.fasta"
.RS
This is the simplest way that ipcress can be used.
.RE
.\"
.B "ipcress dbsts_human.ipcress --sequence ncbi30/*.fasta --mismatch 1"
.RS
Compare an input file against a set of fasta files,
allowing one mismatch in each primer.
.RE
.\"
.RE
.SH VERSION
This documentation accompanies version 2.2.0 of the exonerate package.
.\"
.SH AUTHOR
Guy St.C. Slater.
.L
See the AUTHORS file accompanying the source code
for a list of contributors.
.SH AVAILABILITY
The source code for the exonerate package is available
under the terms of the GNU
.I general public licence.

Please see the file COPYING which was distributed with this package,
or http://www.gnu.org/licenses/gpl.txt for details.

This package has been developed as part of the ensembl project.
Please see http://www.ensembl.org/ for more information.
.SH "SEE ALSO"
.BR exonerate (1),
e-PCR
.\" Exonerate man Page
.TH exonerate-server 1 "January 2008" exonerate-server "sequence comparison server"
.SH NAME
.\"
exonerate-server \- a sequence comparison server for exonerate
.\"
.SH SYNOPSIS
.B exonerate-server [ options ]
.\"
.SH DESCRIPTION
.BR exonerate-server
is a multi-threaded server for the exonerate sequence alignment program.
It uses a set of sequences and a corresponding index file
to allow fast searching of large datasets.
.\"
.RE
.SH OVERVIEW
.T
.\"
Firstly, an
.B .esd
file must be made from the sequence files.
The
.B .esd
file is an Exonerate Sequence Dataset file, and can be used
to group together any set of sequences where each sequence
has a unique identifier.
This is done by using the
.B fasta2esd
utility.
.RS
.TP
.B "fasta2esd genome.fasta genome.esd"
.P
.RE
Next, an
.B .esi
file must be made from the
.B .esd
file.
The
.B .esi
file is an Exonerate Sequence Index file, and contains an index
or set of indices corresponding to a particular dataset.
This is done by using the
.B esd2esi
utility.
.\"
.RS
.TP
.B "esd2esi genome.esd genome.esi"
.P
.RE
Once the
.B .esi
file has been generated, the exonerate-server may be started.
.RS
.TP
.B "exonerate-server genome.esi"
.P
.RE
While the server is running, exonerate may be used to query
the server by replacing the target sequences in the command line
with the name of the server and port number.
The default port number for the exonerate-server is 12886.
.RS
.TP
.B "exonerate query.fasta localhost:12886"
.P
.RE
.RE
.SH OPTIONS
.T
Some of the command line options for the exonerate-server
are the same as for the exonerate client, and these are documented
in the man page for
.B exonerate.
The other options which are specific to
.B exonerate-server
are documented here.
.TP
.B "\--port"
Specify the port on which the server should listen.
By default,
.B exonerate-server
will listen on port 12886, but alternative ports
may be specified with this option.
.\"
.TP
.B "\--input"
Specify the index file to be used when the server is started.
This option is mandatory.
The index file is a
.B .esi
file generated by the
.B esd2esi
utility.
.\"
.TP
.B "\--preload"
By default the indices contained in the
.B .esi
file, and the sequences referenced in the corresponding
.B .esd
file are loaded into memory when the server is started.
This is necessary to achieve fast performance that would otherwise
be hampered by frequent disk accesses.
This option allows the index and sequence preloading to be turned off,
which makes the server run much more slowly, but with faster startup
and a smaller memory footprint.
It is not advised to turn preloading off
unless testing or debugging the server.
.\"
.TP
.B "\--maxconnections"
The server is multithreaded.
This option sets the number of client processes which are allowed
to connect to the server simultaneously.
For good performance, it should not be set to more than
the number of CPUs on the machine on which the server is running.
.\"
.TP
.B "\--verbosity"
Set the verbosity level for the server.
If it is zero, the server will be silent, and the higher the number,
the more messages are reported by the server about what is happening.
.\"
.RE
.SH INTERFACE
.T
This section documents the communication interface
between the client and server.
The interface is documented for people wishing to write
their own custom server to sit behind exonerate - for normal use
of exonerate, it is not necessary to know this.
.\"
.P
The interface works by the client sending simple command lines
and the server sending simple reply lines over a socket.
All the commands and replies are simple lines of ASCII text,
so it is possible to use telnet as a client for testing a server.
.\"
.P
Any command is a single line of text,
but a reply may contain many lines of text.
Each reply line consists of a tag, followed by a colon,
followed by the data for that tag.
.\"
.P
Any reply can include lines with the tag
.B warning:
or
.B error:
.
These
.B warning:
and
.B error:
tags are echoed by the client, and the client will exit
after receiving any
.B error:
reply.
.P
.\"
When the server is returning a multiline reply, the first line
must show the number of lines in the whole reply, using the tag
.B linecount:
For examples, see the replies from the
.B "get hsps"
commands in the example session below.
.P
.\"
The client will only open a single connection to any server,
although a multithreaded server is obviously required
to allow multiple clients to connect simultaneously.
.\"
.P
.SS Commands and replies used in the interface.
.P
.PP
.PD 0
.TP 10
Command:
.B version
.TP 10
Reply:
version
.TP 10
Command:
.B exit
.TP 10
Reply:
( no reply - server closes connection )
.TP 10
Command:
.B dbinfo
.TP 10
Reply:
dbinfo:

The
.B dbinfo
command returns information about the database loaded on the server.
The returned fields are:
.RS
.PD 0
.TP 16
.B either dna or protein
.TP 16
.B either softmasked or unmasked
.TP 16
.B the number of sequences in the database
.TP 16
.B the length of the longest sequence in the database
.TP 16
.B the total length of all the sequences in the database
.RE
.TP 10
Command:
.B lookup
.TP 10
Reply:
lookup:

The lookup command is used to map an external identifier
to an internal identifier.
.TP 10
Command:
.B get info
.TP 10
Reply:
seqinfo: [ ]

The get info command returns information
about a sequence in the database.
The returned fields are:
.RS
.TP 16
.B the sequence length
.TP 16
.B a gcg format checksum (see below)
.TP 16
.B the external id (eg. from fasta header)
.TP 16
.B a description line for the sequence (also from the fasta header), this field is optional and may be omitted.
.RE
.TP 10
Command:
.B get seq
.TP 10
Reply:
seq:

The get seq command returns a whole sequence on one line.
.TP 10
Command:
.B get subseq
.TP 10
Reply:
subseq:

The get subseq command returns part of a sequence.
The start of the sequence is position zero.
eg. get subseq 0 0 10 will return the first 10 bases
of the first sequence in the database.
.TP 10
Command:
.B set query
.TP 10
Reply:
ok:

The set query command is used to send a query sequence to the server.
It returns the length of the sequence and a gcg checksum.
.TP 10
Command:
.B revcomp
.TP 10
Reply:
ok: strand

The revcomp query command makes the server reverse complement the query.
This is to save the bandwidth of sending the query twice.
The revcomp target command is to tell the server
to treat the database as its reverse complement.
The client only sends this command when searching a translated database,
so it need not be implemented for most types of search.
.TP 10
Command:
.B set param
.TP 10
Reply:
ok:
The set param command sends parameters from the exonerate
command line to the server.  These commands can all be ignored
in a basic implementation, but not if optimal performance is required.
.TP 10
Command:
.B get hsps
.TP 10
Reply:
hspset: { }
.TP 10
Or:
hspset: empty
The get hsps command is the main command for getting sets of HSPs.
The server may return multiple hspsets.
The returned fields are:
.RS
.TP 16
.B The internal id of the target sequence for these HSPsets.
.TP 16
.B The hsp query start position
.TP 16
.B The hsp target start position
.TP 16
.B The hsp length
.RE
.RS
The last three fields represent an HSP,
and may be repeated many times on one
.B hspset:
reply line.
.RE
.PD
.PP
.\"
.SS A simple example client server dialog.
.P
.PP
.nf
.SP
% telnet localhost 12886
Trying 127.0.0.1...
Connected to localhost.localdomain.
Escape character is '^]'.
% version
version: exonerate-server 2.0.0
% dbinfo
dbinfo: dna softmasked 100000 1701 38113579
% lookup AA159529.1
lookup: 88065
% get info 88065
seqinfo: 62 2028 AA159529.1 zo72g05.s1 Stratagene pancreas (#937208) Homo sapiens cDNA
% get seq 88065
seq: NAACTCATCNTTTTCTGCTGNATCCTCTTCACCAGTTTGGGGGANGGCCTGCACTTCCANAG
% get subseq 88065 10 20
subseq: TTTTCTGCTGNATCCTCTTC
% set query NAACTCATCNTTTTCTGCTGNATCCTCTTCACCAGTTTGGGGGANGGCCTGCACTTCCANAG
ok: 62 2028
% get hsps
linecount: 15
hspset: 12423 1 349 41
hspset: 44900 1 356 47
hspset: 61781 1 358 41 36 392 26
hspset: 70065 1 349 41 36 383 26
hspset: 88065 1 1 61
hspset: 91032 1 357 41 36 391 26
hspset: 91442 1 350 41 36 384 26
hspset: 92971 1 348 41 36 382 26
hspset: 94311 1 375 41
hspset: 95381 1 346 41 36 380 26
hspset: 96808 10 385 32 36 410 26
hspset: 88449 18 11 22
hspset: 91036 6 6 56
hspset: 93736 36 400 26
% revcomp query
ok: query strand revcomp
% get hsps
linecount: 6
hspset: 12564 0 64 26 20 83 41
hspset: 61780 0 266 61
hspset: 29148 0 116 61
hspset: 25849 15 445 22
hspset: 93938 26 265 34
% exit
Connection closed by foreign host.
.SP
.fi
.\"
.SH ENVIRONMENT
Not documented yet.
.\"
.SH EXAMPLES
.\"
.B 1.
Example of creating a translated index and running
a fast protein2genome search using exonerate-server
.P
.B fasta2esd human.genomic.fasta human.genomic.esd
.br
.B esd2esi --translate yes human.genomic.esd human.genomic.trans.esi
.br
.B exonerate-server --port 1234 human.genomic.trans.esi
.br
.B exonerate pep.fasta localhost:1234 --model p2g --seedrepeat 3 --geneseed 250
.\"
.RE
.SH VERSION
This documentation accompanies version 2.2.0 of the exonerate package.
.\"
.SH AUTHOR
Guy St.C. Slater.
.P
See the AUTHORS file accompanying the source code
for a list of contributors.
.SH AVAILABILITY
The source code for the exonerate package is available
under the terms of the GNU
.I general public licence.
Please see the file COPYING which was distributed with this package,
or http://www.gnu.org/licenses/gpl.txt for details.
This package has been developed as part of the ensembl project.
Please see http://www.ensembl.org/ for more information.
.SH "SEE ALSO"
.BR exonerate (1),
exonerate-2.4.0/doc/man/man1/fastautils.10000644000175000017500000000276611076060131015053 00000000000000
.\" fastautils man page
.TH fastautils 1 "March 2003" fastautils "FASTA format file manipulation utilities"
.SH NAME
.\"
fastautils \- FASTA format file manipulation utilities
.SH SYNOPSIS
.B fastachecksum
[ options ]
.P
.B fastaclean
.P
.B fastaclip
.P
.B fastacomposition
.P
.B fastadiff
.P
.B fastaexplode
.P
.B fastafetch
.P
.B fastahardmask
.P
.B fastaindex
.P
.B fastalength
.P
.B fastanrdb
.P
.B fastaoverlap
.P
.B fastareformat
.P
.B fastaremove
.P
.B fastarevcomp
.P
.B fastasoftmask
.P
.B fastasort
.P
.B fastasplit
.P
.B fastasubseq
.P
.B fastatranslate
.P
.B fastavalidcds
.P
.\"
.SH DESCRIPTION
These are utilities for the manipulation of FASTA format
sequence databases, and are distributed with the exonerate
sequence alignment program.
.P
.\" ...
.\" .RE
.\"
.SH ENVIRONMENT
Not documented yet.
.\"
.SH EXAMPLES
.\"
.B "fastalength sequence.fasta"
.RS
blah blah
.RE
.\"
.SH VERSION
This documentation accompanies version 2.2.0 of the exonerate package.
.\"
.SH AUTHOR
Guy St.C. Slater.
.P
See the AUTHORS file accompanying the source code
for a list of contributors.
.SH AVAILABILITY
The source code for the exonerate package is available
under the terms of the GNU
.I general public licence.
Please see the file COPYING which was distributed with this package,
or http://www.gnu.org/licenses/gpl.txt for details.
This package has been developed as part of the ensembl project.
Please see http://www.ensembl.org/ for more information.
.SH "SEE ALSO" .BR exonerate (1), e-PCR exonerate-2.4.0/src/0000777000175000017500000000000011222703703011300 500000000000000exonerate-2.4.0/src/Makefile.am0000644000175000017500000000021310577771362013266 00000000000000 SUBDIRS = struct general sequence comparison database \ c4 bsdp sdp model hub program util MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/src/Makefile.in0000644000175000017500000002111411222703645013265 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = .. 
ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ SUBDIRS = struct general sequence comparison database c4 bsdp sdp model hub program util MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best all: all-redirect .SUFFIXES: $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps src/Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status # This directory's subdirectories are mostly independent; you can cd # into them and run `make' without going through this Makefile. 
# To change the values of `make' variables: instead of editing Makefiles, # (1) if the variable is set in `config.status', edit `config.status' # (which will cause the Makefiles to be regenerated when you run `make'); # (2) otherwise, pass the desired values on the `make' command line. @SET_MAKE@ all-recursive install-data-recursive install-exec-recursive \ installdirs-recursive install-recursive uninstall-recursive \ check-recursive installcheck-recursive info-recursive dvi-recursive: @set fnord $(MAKEFLAGS); amf=$$2; \ dot_seen=no; \ target=`echo $@ | sed s/-recursive//`; \ list='$(SUBDIRS)'; for subdir in $$list; do \ echo "Making $$target in $$subdir"; \ if test "$$subdir" = "."; then \ dot_seen=yes; \ local_target="$$target-am"; \ else \ local_target="$$target"; \ fi; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \ || case "$$amf" in *=*) exit 1;; *k*) fail=yes;; *) exit 1;; esac; \ done; \ if test "$$dot_seen" = "no"; then \ $(MAKE) $(AM_MAKEFLAGS) "$$target-am" || exit 1; \ fi; test -z "$$fail" mostlyclean-recursive clean-recursive distclean-recursive \ maintainer-clean-recursive: @set fnord $(MAKEFLAGS); amf=$$2; \ dot_seen=no; \ rev=''; list='$(SUBDIRS)'; for subdir in $$list; do \ rev="$$subdir $$rev"; \ test "$$subdir" != "." || dot_seen=yes; \ done; \ test "$$dot_seen" = "no" && rev=". $$rev"; \ target=`echo $@ | sed s/-recursive//`; \ for subdir in $$rev; do \ echo "Making $$target in $$subdir"; \ if test "$$subdir" = "."; then \ local_target="$$target-am"; \ else \ local_target="$$target"; \ fi; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \ || case "$$amf" in *=*) exit 1;; *k*) fail=yes;; *) exit 1;; esac; \ done && test -z "$$fail" tags-recursive: list='$(SUBDIRS)'; for subdir in $$list; do \ test "$$subdir" = . 
|| (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) tags); \ done tags: TAGS ID: $(HEADERS) $(SOURCES) $(LISP) list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ here=`pwd` && cd $(srcdir) \ && mkid -f$$here/ID $$unique $(LISP) TAGS: tags-recursive $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) $(LISP) tags=; \ here=`pwd`; \ list='$(SUBDIRS)'; for subdir in $$list; do \ if test "$$subdir" = .; then :; else \ test -f $$subdir/TAGS && tags="$$tags -i $$here/$$subdir/TAGS"; \ fi; \ done; \ list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(ETAGS_ARGS)$$unique$(LISP)$$tags" \ || (cd $(srcdir) && etags -o $$here/TAGS $(ETAGS_ARGS) $$tags $$unique $(LISP)) mostlyclean-tags: clean-tags: distclean-tags: -rm -f TAGS ID maintainer-clean-tags: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = src distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done for subdir in $(SUBDIRS); do \ if test "$$subdir" = .; then :; else \ test -d $(distdir)/$$subdir \ || mkdir $(distdir)/$$subdir \ || exit 1; \ chmod 777 $(distdir)/$$subdir; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) top_distdir=../$(top_distdir) distdir=../$(distdir)/$$subdir distdir) \ || exit 1; \ fi; \ done info-am: info: info-recursive dvi-am: dvi: dvi-recursive check-am: all-am check: check-recursive installcheck-am: installcheck: installcheck-recursive install-exec-am: install-exec: install-exec-recursive install-data-am: install-data: install-data-recursive install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-recursive uninstall-am: uninstall: 
uninstall-recursive all-am: Makefile all-redirect: all-recursive install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: installdirs-recursive installdirs-am: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-tags mostlyclean-generic mostlyclean: mostlyclean-recursive clean-am: clean-tags clean-generic mostlyclean-am clean: clean-recursive distclean-am: distclean-tags distclean-generic clean-am distclean: distclean-recursive maintainer-clean-am: maintainer-clean-tags maintainer-clean-generic \ distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." maintainer-clean: maintainer-clean-recursive .PHONY: install-data-recursive uninstall-data-recursive \ install-exec-recursive uninstall-exec-recursive installdirs-recursive \ uninstalldirs-recursive all-recursive check-recursive \ installcheck-recursive info-recursive dvi-recursive \ mostlyclean-recursive distclean-recursive clean-recursive \ maintainer-clean-recursive tags tags-recursive mostlyclean-tags \ distclean-tags clean-tags maintainer-clean-tags distdir info-am info \ dvi-am dvi check check-am installcheck-am installcheck install-exec-am \ install-exec install-data-am install-data install-am install \ uninstall-am uninstall all-redirect all-am all installdirs-am \ installdirs mostlyclean-generic distclean-generic clean-generic \ maintainer-clean-generic clean mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. 
.NOEXPORT: exonerate-2.4.0/src/struct/0000777000175000017500000000000011222703703012624 500000000000000exonerate-2.4.0/src/struct/Makefile.am0000644000175000017500000000307511222647560014611 00000000000000 TESTS = fsm.test matrix.test pqueue.test slist.test rangetree.test \ vfsm.test recyclebin.test sparsecache.test dejavu.test \ bitarray.test splaytree.test noitree.test # huffman.test fmindex.test noinst_PROGRAMS = $(TESTS) noinst_HEADERS = fsm.h matrix.h pqueue.h slist.h rangetree.h vfsm.h \ recyclebin.h sparsecache.h dejavu.h bitarray.h \ splaytree.h noitree.h # huffman.h fmindex.h INCLUDES = -DCUSTOM_GUINT64_FORMAT="\"@custom_guint64_format@\"" fsm_test_SOURCES = fsm.test.c fsm.c fsm_test_LDADD = recyclebin.o matrix_test_SOURCES = matrix.test.c matrix.c pqueue_test_SOURCES = pqueue.test.c pqueue.c recyclebin.c slist_test_SOURCES = slist.test.c slist.c recyclebin.c rangetree_test_SOURCES = rangetree.test.c rangetree.c rangetree_test_LDADD = recyclebin.o vfsm_test_SOURCES = vfsm.test.c vfsm.c vfsm_test_LDADD = -lm recyclebin_test_SOURCES = recyclebin.test.c recyclebin.c sparsecache_test_SOURCES = sparsecache.test.c sparsecache.c dejavu_test_SOURCES = dejavu.test.c dejavu.c bitarray_test_SOURCES = bitarray.test.c bitarray.c splaytree_test_SOURCES = splaytree.test.c splaytree.c splaytree_test_LDADD = recyclebin.o noitree_test_SOURCES = noitree.test.c noitree.c noitree_test_LDADD = recyclebin.o splaytree.o # huffman_test_SOURCES = huffman.test.c huffman.c # huffman_test_LDADD = bitarray.o pqueue.o recyclebin.o # fmindex_test_SOURCES = fmindex.test.c fmindex.c # fmindex_test_LDADD = bitarray.o pqueue.o recyclebin.o huffman.o # Files to clear away MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/src/struct/Makefile.in0000644000175000017500000003166711222703645014627 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. 
# This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../.. ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ TESTS = fsm.test matrix.test pqueue.test slist.test rangetree.test vfsm.test 
recyclebin.test sparsecache.test dejavu.test bitarray.test splaytree.test noitree.test # huffman.test fmindex.test noinst_PROGRAMS = $(TESTS) noinst_HEADERS = fsm.h matrix.h pqueue.h slist.h rangetree.h vfsm.h recyclebin.h sparsecache.h dejavu.h bitarray.h splaytree.h noitree.h # huffman.h fmindex.h INCLUDES = -DCUSTOM_GUINT64_FORMAT="\"@custom_guint64_format@\"" fsm_test_SOURCES = fsm.test.c fsm.c fsm_test_LDADD = recyclebin.o matrix_test_SOURCES = matrix.test.c matrix.c pqueue_test_SOURCES = pqueue.test.c pqueue.c recyclebin.c slist_test_SOURCES = slist.test.c slist.c recyclebin.c rangetree_test_SOURCES = rangetree.test.c rangetree.c rangetree_test_LDADD = recyclebin.o vfsm_test_SOURCES = vfsm.test.c vfsm.c vfsm_test_LDADD = -lm recyclebin_test_SOURCES = recyclebin.test.c recyclebin.c sparsecache_test_SOURCES = sparsecache.test.c sparsecache.c dejavu_test_SOURCES = dejavu.test.c dejavu.c bitarray_test_SOURCES = bitarray.test.c bitarray.c splaytree_test_SOURCES = splaytree.test.c splaytree.c splaytree_test_LDADD = recyclebin.o noitree_test_SOURCES = noitree.test.c noitree.c noitree_test_LDADD = recyclebin.o splaytree.o # huffman_test_SOURCES = huffman.test.c huffman.c # huffman_test_LDADD = bitarray.o pqueue.o recyclebin.o # fmindex_test_SOURCES = fmindex.test.c fmindex.c # fmindex_test_LDADD = bitarray.o pqueue.o recyclebin.o huffman.o # Files to clear away MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = PROGRAMS = $(noinst_PROGRAMS) DEFS = @DEFS@ -I. 
-I$(srcdir) CPPFLAGS = @CPPFLAGS@ LDFLAGS = @LDFLAGS@ LIBS = @LIBS@ fsm_test_OBJECTS = fsm.test.o fsm.o fsm_test_DEPENDENCIES = recyclebin.o fsm_test_LDFLAGS = matrix_test_OBJECTS = matrix.test.o matrix.o matrix_test_LDADD = $(LDADD) matrix_test_DEPENDENCIES = matrix_test_LDFLAGS = pqueue_test_OBJECTS = pqueue.test.o pqueue.o recyclebin.o pqueue_test_LDADD = $(LDADD) pqueue_test_DEPENDENCIES = pqueue_test_LDFLAGS = slist_test_OBJECTS = slist.test.o slist.o recyclebin.o slist_test_LDADD = $(LDADD) slist_test_DEPENDENCIES = slist_test_LDFLAGS = rangetree_test_OBJECTS = rangetree.test.o rangetree.o rangetree_test_DEPENDENCIES = recyclebin.o rangetree_test_LDFLAGS = vfsm_test_OBJECTS = vfsm.test.o vfsm.o vfsm_test_DEPENDENCIES = vfsm_test_LDFLAGS = recyclebin_test_OBJECTS = recyclebin.test.o recyclebin.o recyclebin_test_LDADD = $(LDADD) recyclebin_test_DEPENDENCIES = recyclebin_test_LDFLAGS = sparsecache_test_OBJECTS = sparsecache.test.o sparsecache.o sparsecache_test_LDADD = $(LDADD) sparsecache_test_DEPENDENCIES = sparsecache_test_LDFLAGS = dejavu_test_OBJECTS = dejavu.test.o dejavu.o dejavu_test_LDADD = $(LDADD) dejavu_test_DEPENDENCIES = dejavu_test_LDFLAGS = bitarray_test_OBJECTS = bitarray.test.o bitarray.o bitarray_test_LDADD = $(LDADD) bitarray_test_DEPENDENCIES = bitarray_test_LDFLAGS = splaytree_test_OBJECTS = splaytree.test.o splaytree.o splaytree_test_DEPENDENCIES = recyclebin.o splaytree_test_LDFLAGS = noitree_test_OBJECTS = noitree.test.o noitree.o noitree_test_DEPENDENCIES = recyclebin.o splaytree.o noitree_test_LDFLAGS = CFLAGS = @CFLAGS@ COMPILE = $(CC) $(DEFS) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) CCLD = $(CC) LINK = $(CCLD) $(AM_CFLAGS) $(CFLAGS) $(LDFLAGS) -o $@ HEADERS = $(noinst_HEADERS) DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best SOURCES = $(fsm_test_SOURCES) $(matrix_test_SOURCES) $(pqueue_test_SOURCES) $(slist_test_SOURCES) 
$(rangetree_test_SOURCES) $(vfsm_test_SOURCES) $(recyclebin_test_SOURCES) $(sparsecache_test_SOURCES) $(dejavu_test_SOURCES) $(bitarray_test_SOURCES) $(splaytree_test_SOURCES) $(noitree_test_SOURCES) OBJECTS = $(fsm_test_OBJECTS) $(matrix_test_OBJECTS) $(pqueue_test_OBJECTS) $(slist_test_OBJECTS) $(rangetree_test_OBJECTS) $(vfsm_test_OBJECTS) $(recyclebin_test_OBJECTS) $(sparsecache_test_OBJECTS) $(dejavu_test_OBJECTS) $(bitarray_test_OBJECTS) $(splaytree_test_OBJECTS) $(noitree_test_OBJECTS) all: all-redirect .SUFFIXES: .SUFFIXES: .S .c .o .s $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps src/struct/Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status mostlyclean-noinstPROGRAMS: clean-noinstPROGRAMS: -test -z "$(noinst_PROGRAMS)" || rm -f $(noinst_PROGRAMS) distclean-noinstPROGRAMS: maintainer-clean-noinstPROGRAMS: .c.o: $(COMPILE) -c $< .s.o: $(COMPILE) -c $< .S.o: $(COMPILE) -c $< mostlyclean-compile: -rm -f *.o core *.core clean-compile: distclean-compile: -rm -f *.tab.c maintainer-clean-compile: fsm.test: $(fsm_test_OBJECTS) $(fsm_test_DEPENDENCIES) @rm -f fsm.test $(LINK) $(fsm_test_LDFLAGS) $(fsm_test_OBJECTS) $(fsm_test_LDADD) $(LIBS) matrix.test: $(matrix_test_OBJECTS) $(matrix_test_DEPENDENCIES) @rm -f matrix.test $(LINK) $(matrix_test_LDFLAGS) $(matrix_test_OBJECTS) $(matrix_test_LDADD) $(LIBS) pqueue.test: $(pqueue_test_OBJECTS) $(pqueue_test_DEPENDENCIES) @rm -f pqueue.test $(LINK) $(pqueue_test_LDFLAGS) $(pqueue_test_OBJECTS) $(pqueue_test_LDADD) $(LIBS) slist.test: $(slist_test_OBJECTS) $(slist_test_DEPENDENCIES) @rm -f slist.test $(LINK) $(slist_test_LDFLAGS) $(slist_test_OBJECTS) $(slist_test_LDADD) $(LIBS) rangetree.test: $(rangetree_test_OBJECTS) $(rangetree_test_DEPENDENCIES) @rm -f rangetree.test $(LINK) $(rangetree_test_LDFLAGS) $(rangetree_test_OBJECTS) 
$(rangetree_test_LDADD) $(LIBS) vfsm.test: $(vfsm_test_OBJECTS) $(vfsm_test_DEPENDENCIES) @rm -f vfsm.test $(LINK) $(vfsm_test_LDFLAGS) $(vfsm_test_OBJECTS) $(vfsm_test_LDADD) $(LIBS) recyclebin.test: $(recyclebin_test_OBJECTS) $(recyclebin_test_DEPENDENCIES) @rm -f recyclebin.test $(LINK) $(recyclebin_test_LDFLAGS) $(recyclebin_test_OBJECTS) $(recyclebin_test_LDADD) $(LIBS) sparsecache.test: $(sparsecache_test_OBJECTS) $(sparsecache_test_DEPENDENCIES) @rm -f sparsecache.test $(LINK) $(sparsecache_test_LDFLAGS) $(sparsecache_test_OBJECTS) $(sparsecache_test_LDADD) $(LIBS) dejavu.test: $(dejavu_test_OBJECTS) $(dejavu_test_DEPENDENCIES) @rm -f dejavu.test $(LINK) $(dejavu_test_LDFLAGS) $(dejavu_test_OBJECTS) $(dejavu_test_LDADD) $(LIBS) bitarray.test: $(bitarray_test_OBJECTS) $(bitarray_test_DEPENDENCIES) @rm -f bitarray.test $(LINK) $(bitarray_test_LDFLAGS) $(bitarray_test_OBJECTS) $(bitarray_test_LDADD) $(LIBS) splaytree.test: $(splaytree_test_OBJECTS) $(splaytree_test_DEPENDENCIES) @rm -f splaytree.test $(LINK) $(splaytree_test_LDFLAGS) $(splaytree_test_OBJECTS) $(splaytree_test_LDADD) $(LIBS) noitree.test: $(noitree_test_OBJECTS) $(noitree_test_DEPENDENCIES) @rm -f noitree.test $(LINK) $(noitree_test_LDFLAGS) $(noitree_test_OBJECTS) $(noitree_test_LDADD) $(LIBS) tags: TAGS ID: $(HEADERS) $(SOURCES) $(LISP) list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ here=`pwd` && cd $(srcdir) \ && mkid -f$$here/ID $$unique $(LISP) TAGS: $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) $(LISP) tags=; \ here=`pwd`; \ list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(ETAGS_ARGS)$$unique$(LISP)$$tags" \ || (cd $(srcdir) && etags -o $$here/TAGS $(ETAGS_ARGS) $$tags $$unique $(LISP)) mostlyclean-tags: clean-tags: distclean-tags: -rm -f TAGS ID maintainer-clean-tags: distdir = 
$(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = src/struct distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done check-TESTS: $(TESTS) @failed=0; all=0; \ srcdir=$(srcdir); export srcdir; \ for tst in $(TESTS); do \ if test -f $$tst; then dir=.; \ else dir="$(srcdir)"; fi; \ if $(TESTS_ENVIRONMENT) $$dir/$$tst; then \ all=`expr $$all + 1`; \ echo "PASS: $$tst"; \ elif test $$? -ne 77; then \ all=`expr $$all + 1`; \ failed=`expr $$failed + 1`; \ echo "FAIL: $$tst"; \ fi; \ done; \ if test "$$failed" -eq 0; then \ banner="All $$all tests passed"; \ else \ banner="$$failed of $$all tests failed"; \ fi; \ dashes=`echo "$$banner" | sed s/./=/g`; \ echo "$$dashes"; \ echo "$$banner"; \ echo "$$dashes"; \ test "$$failed" -eq 0 info-am: info: info-am dvi-am: dvi: dvi-am check-am: all-am $(MAKE) $(AM_MAKEFLAGS) check-TESTS check: check-am installcheck-am: installcheck: installcheck-am install-exec-am: install-exec: install-exec-am install-data-am: install-data: install-data-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-am uninstall-am: uninstall: uninstall-am all-am: Makefile $(PROGRAMS) $(HEADERS) all-redirect: all-am install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-noinstPROGRAMS mostlyclean-compile \ mostlyclean-tags mostlyclean-generic mostlyclean: mostlyclean-am clean-am: clean-noinstPROGRAMS clean-compile clean-tags clean-generic \ mostlyclean-am clean: clean-am distclean-am: 
distclean-noinstPROGRAMS distclean-compile distclean-tags \ distclean-generic clean-am distclean: distclean-am maintainer-clean-am: maintainer-clean-noinstPROGRAMS \ maintainer-clean-compile maintainer-clean-tags \ maintainer-clean-generic distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." maintainer-clean: maintainer-clean-am .PHONY: mostlyclean-noinstPROGRAMS distclean-noinstPROGRAMS \ clean-noinstPROGRAMS maintainer-clean-noinstPROGRAMS \ mostlyclean-compile distclean-compile clean-compile \ maintainer-clean-compile tags mostlyclean-tags distclean-tags \ clean-tags maintainer-clean-tags distdir check-TESTS info-am info \ dvi-am dvi check check-am installcheck-am installcheck install-exec-am \ install-exec install-data-am install-data install-am install \ uninstall-am uninstall all-redirect all-am all installdirs \ mostlyclean-generic distclean-generic clean-generic \ maintainer-clean-generic clean mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: exonerate-2.4.0/src/struct/fsm.test.c0000644000175000017500000000576611162714342014471 00000000000000/****************************************************************\ * * * Library for FSM-based word matching. * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
*
*                                                                *
\****************************************************************/

#include <string.h> /* For strlen() */
#include "fsm.h"

/* Merge callback: node data holds small integer counts, so sum them. */
static gpointer test_merge_Func(gpointer a, gpointer b,
                                gpointer user_data){
    register gint ia = GPOINTER_TO_INT(a),
                  ib = GPOINTER_TO_INT(b);
    g_assert(a);
    g_assert(b);
    return GINT_TO_POINTER(ia + ib);
}

/* Traversal callback: add the count stored at each match to the total. */
static void test_traverse_Func(guint seq_pos, gpointer node_data,
                               gpointer user_data){
    register gint node_count = GPOINTER_TO_INT(node_data),
                 *total = (gint*)user_data;
    *total += node_count;
    return;
}

int main(void){
    gint count = 0;
    register FSM *f = FSM_create("abcdefghijklmnopqrstuvwxyz",
                                 test_merge_Func, test_merge_Func, NULL);
#define TESTSETSIZE 32
    char *testset[TESTSETSIZE] = { /* C KEYWORDS */
        "auto", "break", "case", "char", "const", "continue",
        "default", "do", "double", "else", "enum", "extern",
        "float", "for", "goto", "if", "int", "long",
        "register", "return", "short", "signed", "sizeof", "static",
        "struct", "switch", "typedef", "union", "unsigned", "void",
        "volatile", "while"
    };
    char *testsen = "-unSIGNed--switCHAR---DOUBLElse--sizeoFOR";
    /* Total=9 2 1 1 1 1 1 1 1 */
    register gint i;
    /* Maps upper and lower case letters to lower case */
    register guchar *tolower_filter = (guchar*)
        "----------------------------------------------------------------"
        "-abcdefghijklmnopqrstuvwxyz------abcdefghijklmnopqrstuvwxyz-----"
        "----------------------------------------------------------------"
        "----------------------------------------------------------------";
    for(i = 0; i < TESTSETSIZE; i++)
        FSM_add(f, testset[i], strlen(testset[i]), GINT_TO_POINTER(1));
    FSM_compile(f);
    FSM_add_traversal_filter(f, tolower_filter);
    FSM_traverse(f, testsen, test_traverse_Func, &count);
    g_print("FSM count is [%d]\n", count);
    g_assert(count == 9);
    FSM_destroy(f);
    return 0;
}

exonerate-2.4.0/src/struct/fsm.c0000644000175000017500000001654311162714341013505 00000000000000
/****************************************************************\
*                                                                *
*  Library for FSM-based word matching.                          *
*                                                                *
*  Guy St.C. Slater..
mailto:guy@ebi.ac.uk  *
*  Copyright (C) 2000-2009.  All Rights Reserved.                *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

/* Simplified FSM code derived from
 * my earlier implementation in the ESTate package:
 * http://www.hgmp.mrc.ac.uk/~gslater/ESTate/
 */

#include "fsm.h"
#include <string.h> /* For memcpy() */

FSM *FSM_create(gchar *alphabet, FSM_Join_Func merge_func,
                FSM_Join_Func combine_func, gpointer user_data){
    register FSM *f = g_new0(FSM, 1);
    register guchar *p;
    g_assert(alphabet);
    g_assert(merge_func);
    g_assert(combine_func);
    /* Mark each alphabet member, then number the marked entries */
    for(p = ((guchar*)alphabet); *p; p++)
        f->index[*p] = 1;
    for(p = f->index+ALPHABETSIZE; p > f->index; p--)
        if(*p)
            *p = ++f->width;
    f->width++; /* Add one for NULL */
    memcpy(f->insertion_filter, f->index, sizeof(gchar)*ALPHABETSIZE);
    memcpy(f->traversal_filter, f->index, sizeof(gchar)*ALPHABETSIZE);
    f->merge_func = merge_func;
    f->combine_func = combine_func;
    f->user_data = user_data;
    f->recycle = RecycleBin_create("FSM", sizeof(FSM_Node)*f->width,
                                   1024);
    f->chunk_count = 1;
    f->root = RecycleBin_alloc_blank(f->recycle);
    f->is_compiled = FALSE;
    return f;
}

gsize FSM_memory_usage(FSM *f){
    return sizeof(FSM)
         + (sizeof(FSM_Node) * f->width * f->chunk_count);
}

void FSM_info(FSM *f){
    g_print("FSM information:\n"
            " Alphabet size = %d\n"
            " Size of node = %d\n"
            " Size of chunk = %d\n"
            " Chunks used = %lu\n"
            " Bytes used = %lu\n",
            f->width,
            (gint)sizeof(FSM_Node),
            (gint)sizeof(FSM_Node)*f->width,
            f->chunk_count,
            (gulong)FSM_memory_usage(f));
    return;
}

void FSM_destroy(FSM *f){
    RecycleBin_destroy(f->recycle);
    g_free(f);
    return;
}

static void FSM_destroy_with_data_recur(FSM *f, FSM_Node *n,
                FSM_Destroy_Func fdf, gpointer user_data){
    register gint i;
    register FSM_Node *t;
    for(i = 1; i < f->width;
        i++){
        if(n[i].data){
            fdf(n[i].data, user_data);
            n[i].data = NULL;
        }
        if(n[i].next){
            t = n[i].next;
            n[i].next = NULL;
            FSM_destroy_with_data_recur(f, t, fdf, user_data);
        }
    }
    return;
}

void FSM_destroy_with_data(FSM *f, FSM_Destroy_Func fdf,
                           gpointer user_data){
    FSM_destroy_with_data_recur(f, f->root, fdf, user_data);
    FSM_destroy(f);
    return;
}

gpointer FSM_add(FSM *f, gchar *seq, guint len, gpointer node_data){
    register FSM_Node **ptp, *n = f->root;
    register guchar *useq, *end;
    g_assert(!f->is_compiled);
    for(useq = (guchar*)seq, end = useq+len-1; useq < end; useq++){
        if(!f->insertion_filter[*useq])
            g_error("Illegal char (%d)[%c] in FSM word [%s]",
                    *useq, *useq, seq);
        if(!*(ptp = &n[f->insertion_filter[*useq]].next)){
            *ptp = RecycleBin_alloc_blank(f->recycle);
            f->chunk_count++;
        }
        n = *ptp;
    }
    if(n[f->insertion_filter[*useq]].data){
        if(node_data)
            n[f->insertion_filter[*useq]].data
                = f->merge_func(n[f->insertion_filter[*useq]].data,
                                node_data, f->user_data);
    } else {
        n[f->insertion_filter[*useq]].data = node_data;
    }
    return n[f->insertion_filter[*useq]].data;
}

/* FSM_compile : Converts the pre-FSM trie to the FSM.
   1: All position zero nodes must have no score and point to root.
   2: All unused nodes must be made to point to the node which
      would be reached by insertion of their longest suffix
      (this will always be above the current node, and its location
      is independent of any visits to root on this path).
   3: The scores from subsequences must also be inherited.
   4: This is achieved with a level-order trie traversal.
   5: A queue is used to remove traversal recursion and backtracking.
   6: No space is required for the queue, as it makes temporary
      use of the position zero nodes during compilation.
   7: This algorithm is linear with the number of states.
*/
void FSM_compile(FSM *f){
    register gint i;
    FSM_Node a;
    register FSM_Node *suffix, *prev;
    register FSM_Node *out = &a, *in = out;
    g_assert(!f->is_compiled);
    out->next = in; /* Initialise queue */
    f->root->next = f->root;
    for(i = 1; i < f->width; i++)
        if(f->root[i].next){
            in->data = f->root;
            in->next = f->root[i].next;
            in = f->root[i].next;
        } else
            f->root[i].next = f->root;
    while(out != in){
        prev = out;
        suffix = out->data;
        out = out->next;
        for(i = 1; i < f->width; i++){
            if(suffix[i].data){
                if(out[i].data)
                    out[i].data = f->combine_func(out[i].data,
                                      suffix[i].data, f->user_data);
                else
                    out[i].data = suffix[i].data;
            }
            if(out[i].next){
                in->data = suffix[i].next;
                in->next = out[i].next;
                in = out[i].next;
            } else
                out[i].next = suffix[i].next;
        }
        prev->next = f->root;
        prev->data = NULL;
    }
    out->next = f->root;
    out->data = NULL;
    f->is_compiled = TRUE;
    return;
}

void FSM_traverse(FSM *f, gchar *seq, FSM_Traverse_Func ftf,
                  gpointer user_data){
    register FSM_Node *n = f->root;
    register guchar *p = (guchar*)seq;
    register gint c;
    g_assert(f->is_compiled);
    do {
        if(n[c = f->traversal_filter[*p]].data)
            ftf(p-(guchar*)seq, n[c].data, user_data);
        n = n[c].next;
    } while(*++p);
    return;
}

static void FSM_add_filter(FSM *f, guchar *src, guchar *dst){
    register gint i;
    if(src){ /* Merge filter */
        for(i = 0; i < ALPHABETSIZE; i++)
            dst[i] = dst[src[i]];
    } else { /* Reset to original index */
        memcpy(dst, f->index, sizeof(gchar)*ALPHABETSIZE);
    }
    return;
}

void FSM_add_insertion_filter(FSM *f, guchar *filter){
    FSM_add_filter(f, filter, f->insertion_filter);
    return;
}

void FSM_add_traversal_filter(FSM *f, guchar *filter){
    FSM_add_filter(f, filter, f->traversal_filter);
    return;
}

/**/
exonerate-2.4.0/src/struct/matrix.test.c0000644000175000017500000000713411162714342015177 00000000000000
/****************************************************************\
* *
* Simple matrix creation routines *
* *
* Guy St.C. Slater.. mailto:guy@ebi.ac.uk *
* Copyright (C) 2000-2009. All Rights Reserved.
* * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "matrix.h" int main(void){ register gint **m2d, ***m3d; register gshort ***ms3d, ****ms4d; register gint i, j, k, l; g_message("testing 2d matrix routines\n"); m2d = (gint**)Matrix2d_create(5, 10, sizeof(gint)); for(i = 0; i < 5; i++){ for(j = 0; j < 10; j++){ g_assert(!m2d[i][j]); m2d[i][j] = i+j; } } for(i = 0; i < 5; i++){ for(j = 0; j < 10; j++){ g_print("[%2d]", m2d[i][j]); } g_print("\n"); } g_print("\n"); g_free(m2d); /**/ g_print("\n"); g_message("testing 3d matrix routines\n"); m3d = (gint***)Matrix3d_create(4, 3, 2, sizeof(gint)); for(i = 0; i < 4; i++){ for(j = 0; j < 3; j++){ for(k = 0; k < 2; k++){ g_assert(!m3d[i][j][k]); m3d[i][j][k] = i+j+k; } } } for(i = 0; i < 4; i++){ for(j = 0; j < 3; j++){ g_print(" {"); for(k = 0; k < 2; k++){ g_print("[%2d]", m3d[i][j][k]); } g_print("}"); } g_print("\n"); } g_print("\n"); g_free(m3d); /**/ g_message("testing 3d matrix routines on gshorts\n"); ms3d = (gshort***)Matrix3d_create(4, 3, 2, sizeof(gshort)); for(i = 0; i < 4; i++){ for(j = 0; j < 3; j++){ for(k = 0; k < 2; k++){ g_assert(!ms3d[i][j][k]); ms3d[i][j][k] = i+j+k; } } } for(i = 0; i < 4; i++){ for(j = 0; j < 3; j++){ g_print(" {"); for(k = 0; k < 2; k++){ g_print("[%2d]", ms3d[i][j][k]); } g_print("}"); } g_print("\n"); } g_print("\n"); g_free(ms3d); /**/ g_message("testing 4d matrix routines on gshorts\n"); ms4d = (gshort****)Matrix4d_create(5, 4, 3, 2, sizeof(gshort)); for(i = 0; i < 5; i++){ for(j = 0; j < 4; j++){ for(k = 0; k < 3; k++){ for(l = 0; l < 2; l++){ g_assert(!ms4d[i][j][k][l]); ms4d[i][j][k][l] = i+j+k+l; } } } } for(i = 0; i < 5; i++){ for(j = 0; j < 4; j++){ g_print(" {"); for(k = 0; k < 3; k++){ for(l = 0; l < 2; 
l++){ g_print("[%2d]", ms4d[i][j][k][l]); } g_print(","); } g_print("}\n"); } g_print("\n"); } g_print("\n"); g_free(ms4d); /**/ return 0; } exonerate-2.4.0/src/struct/matrix.c0000644000175000017500000001732411162714342014223 00000000000000/****************************************************************\ * * * Simple matrix creation routines * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "matrix.h" static void Matrix_layout(gpointer *index, gint length, gchar *data, gsize block){ register gint i; register gchar *p; g_assert(index); g_assert(data); g_assert(length > 0); g_assert(block > 0); for(i = 0, p = data; i < length; i++){ index[i] = p; p += block; } return; } /**/ gsize Matrix2d_size(gint a, gint b, gsize cell){ register gulong index_size = a * sizeof(gpointer), row_size = b * cell, data_size = a * row_size, total_size = index_size + data_size; register gdouble check_index_size = a * sizeof(gpointer), check_row_size = b * cell, check_data_size = a * check_row_size, check_total_size = check_index_size + check_data_size; g_assert(a > 0); g_assert(b > 0); g_assert(cell > 0); if((check_total_size - total_size) > 1) /* overflow */ return 0; return total_size; } void Matrix2d_init(gchar **m, gint a, gint b, gsize cell){ Matrix_layout((gpointer*)m, a, (gchar*)&m[a], b*cell); return; } gpointer *Matrix2d_create(gint a, gint b, gsize cell){ register gchar **m; register gulong index_size = a * sizeof(gpointer), row_size = b * cell, data_size = a * row_size; g_assert(a > 0); g_assert(b > 0); g_assert(cell > 0); g_assert(Matrix2d_size(a, b, cell)); m = g_malloc0(index_size + data_size); 
Matrix_layout((gpointer*)m, a, (gchar*)&m[a], row_size); return (gpointer*)m; } /**/ gsize Matrix3d_size(gint a, gint b, gint c, gsize cell){ register gulong primary_index_size = a * sizeof(gpointer), secondary_index_size = b * sizeof(gpointer), row_size = c * cell, block_size = secondary_index_size + (b * row_size), total_size; register gdouble check_primary_index_size = a * sizeof(gpointer), check_secondary_index_size = b * sizeof(gpointer), check_row_size = c * cell, check_block_size = check_secondary_index_size + (b * check_row_size), check_total_size; g_assert(a > 0); g_assert(b > 0); g_assert(c > 0); g_assert(cell > 0); block_size += (block_size % sizeof(gpointer)); /* Pad */ check_block_size += (block_size % sizeof(gpointer)); /* Pad */ total_size = primary_index_size + (a * block_size); check_total_size = check_primary_index_size + (a * check_block_size); if((check_total_size - total_size) > 1) /* overflow */ return 0; return total_size; } gpointer **Matrix3d_create(gint a, gint b, gint c, gsize cell){ register gint i; register gchar ***m; register gulong primary_index_size = a * sizeof(gpointer), secondary_index_size = b * sizeof(gpointer), row_size = c * cell, block_size = secondary_index_size + (b * row_size), total_size; g_assert(a > 0); g_assert(b > 0); g_assert(c > 0); g_assert(cell > 0); g_assert(Matrix3d_size(a, b, c, cell)); block_size += (block_size % sizeof(gpointer)); /* Pad */ total_size = primary_index_size + (a * block_size); m = g_malloc0(total_size); Matrix_layout((gpointer*)m, a, (gchar*)&m[a], block_size); for(i = 0; i < a; i++) Matrix_layout((gpointer*)m[i], b, (gchar*)m[i]+secondary_index_size, row_size); return (gpointer**)m; } /**/ gsize Matrix4d_size(gint a, gint b, gint c, gint d, gsize cell){ register gulong primary_index_size = a * sizeof(gpointer), secondary_index_size = b * sizeof(gpointer), tertiary_index_size = c * sizeof(gpointer), row_size = d * cell, block_size = tertiary_index_size + (c * row_size), sheet_size, total_size; 
register gdouble check_primary_index_size = a * sizeof(gpointer), check_secondary_index_size = b * sizeof(gpointer), check_tertiary_index_size = c * sizeof(gpointer), check_row_size = d * cell, check_block_size = check_tertiary_index_size + (c * check_row_size), check_sheet_size, check_total_size; g_assert(a > 0); g_assert(b > 0); g_assert(c > 0); g_assert(d > 0); g_assert(cell > 0); block_size += (block_size % sizeof(gpointer)); /* Pad */ check_block_size += (block_size % sizeof(gpointer)); /* Pad */ sheet_size = secondary_index_size + (b * block_size), check_sheet_size = check_secondary_index_size + (b * check_block_size), sheet_size += (sheet_size % sizeof(gpointer)); /* Pad */ check_sheet_size += (sheet_size % sizeof(gpointer)); /* Pad */ total_size = primary_index_size + (a * sheet_size); check_total_size = check_primary_index_size + (a * check_sheet_size); if((check_total_size - total_size) > 1) /* overflow */ return 0; return total_size; } gpointer ***Matrix4d_create(gint a, gint b, gint c, gint d, gsize cell){ register gint i, j; register gchar ***m; register gpointer *ptr; register gulong primary_index_size = a * sizeof(gpointer), secondary_index_size = b * sizeof(gpointer), tertiary_index_size = c * sizeof(gpointer), row_size = d * cell, block_size = tertiary_index_size + (c * row_size), sheet_size, total_size; g_assert(a > 0); g_assert(b > 0); g_assert(c > 0); g_assert(d > 0); g_assert(cell > 0); g_assert(Matrix4d_size(a, b, c, d, cell)); block_size += (block_size % sizeof(gpointer)); /* Pad */ sheet_size = secondary_index_size + (b * block_size), sheet_size += (sheet_size % sizeof(gpointer)); /* Pad */ total_size = primary_index_size + (a * sheet_size); m = g_malloc0(total_size); Matrix_layout((gpointer*)m, a, (gchar*)&m[a], sheet_size); for(i = 0; i < a; i++){ ptr = (gpointer*)m[i]; Matrix_layout(ptr, b, (gchar*)ptr+secondary_index_size, block_size); for(j = 0; j < b; j++){ Matrix_layout((gpointer*)ptr[j], c, (gchar*)ptr[j]+tertiary_index_size, 
                          row_size);
        }
    }
    return (gpointer***)m;
}

/* FIXME: tidy and reduce redundancy between create and size functions */
exonerate-2.4.0/src/struct/pqueue.test.c0000644000175000017500000000752311162714342015201 00000000000000
/****************************************************************\
* *
* Priority queue library using pairing heaps. *
* *
* Guy St.C. Slater.. mailto:guy@ebi.ac.uk *
* Copyright (C) 2000-2009. All Rights Reserved. *
* *
* This source code is distributed under the terms of the *
* GNU General Public License, version 3. See the file COPYING *
* or http://www.gnu.org/licenses/gpl.txt for details *
* *
* If you use this code, please keep this notice intact. *
* *
\****************************************************************/

#include "pqueue.h"
#include <string.h> /* For strcmp() */

typedef struct {
    gchar *word;
    gint real_priority;
    gint priority1;
    gint priority2;
    gint priority3;
} Data;

#define DATA_TOTAL 12

static gboolean test_comp_low_int(gpointer low, gpointer high,
                                  gpointer user_data){
    register Data *low_d = (Data*)low, *high_d = (Data*)high;
    return low_d->real_priority < high_d->real_priority;
}

static gboolean test_traverse_func(gpointer data, gpointer user_data){
    register Data *d = (Data*)data;
    g_print("Traverse [%s]\n", d->word);
    return FALSE;
}

int main(void){
    register guint i;
    register PQueueSet *pq_set = PQueueSet_create();
    register PQueue *pq = PQueue_create(pq_set, test_comp_low_int, NULL);
    register Data *dn;
    PQueue_Node *acc[DATA_TOTAL];
    Data data[DATA_TOTAL] = {
        { "REMOVE-1", -1, 300, 200, 50},
        { "jumps", -1, 39, 38, 32},
        { "dog", -1, 72, 61, 60},
        { "brown", -1, 99, 40, 8},
        { "over", -1, 68, 35, 33},
        { "REMOVE-2", -1, 25, 20, 15},
        { "fox", -1, 20, 19, 17},
        { "quick", -1, 99, 19, 3},
        { "lazy", -1, 99, 67, 51},
        { "the", -1, 3, 2, 1},
        { "a", -1, 82, 41, 34},
        { "REMOVE-3", -1, 40, 30, 20}
    };
    gint r[3] = {0, 5, 11};
    gchar *answer[DATA_TOTAL-3] = {
        "the", "quick", "brown", "fox", "jumps",
        "over", "a", "lazy", "dog"};
    g_print("Adding words with initial
priorities\n"); for(i = 0; i < DATA_TOTAL; i++){ g_print("Adding [%s]\n", data[i].word); data[i].real_priority = data[i].priority1; acc[i] = PQueue_push(pq, (gpointer)&data[i]); } g_print("Set intermediate priorities\n"); for(i = 0; i < DATA_TOTAL; i++){ g_print("Raising [%s] from %d to %d\n", data[i].word, data[i].priority1, data[i].priority2); data[i].real_priority = data[i].priority2; PQueue_raise(pq, acc[i]); } g_print("Set final priorities\n"); for(i = 0; i < DATA_TOTAL; i++){ g_print("Raising [%s] from %d to %d\n", data[i].word, data[i].priority2, data[i].priority3); data[i].real_priority = data[i].priority3; PQueue_raise(pq, acc[i]); } g_print("Testing traverse\n"); PQueue_traverse(pq, test_traverse_func, NULL); for(i = 0; i < 3; i++){ g_print("Testing remove of \"%s\"\n", data[r[i]].word); PQueue_remove(pq, acc[r[i]]); g_print("Testing traverse\n"); PQueue_traverse(pq, test_traverse_func, NULL); } g_print("Popping data\n"); for(i = 0; i < (DATA_TOTAL-3); i++){ dn = PQueue_pop(pq); g_print("Word: [%s] Answer: [%s]\n", dn->word, answer[i]); g_assert(!strcmp(dn->word, answer[i])); } PQueue_destroy(pq, NULL, NULL); PQueueSet_destroy(pq_set); return 0; } exonerate-2.4.0/src/struct/pqueue.c0000644000175000017500000001736011220200532014205 00000000000000/****************************************************************\ * * * Priority queue library using pairing heaps. * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
*
* *
\****************************************************************/

#include "pqueue.h"

PQueueSet *PQueueSet_create(void){
    register PQueueSet *pq_set = g_new(PQueueSet, 1);
    pq_set->pq_recycle = RecycleBin_create("PQueue",
                                           sizeof(PQueue), 64);
    pq_set->node_recycle = RecycleBin_create("PQueue_Node",
                                             sizeof(PQueue_Node), 64);
    return pq_set;
}

void PQueueSet_destroy(PQueueSet *pq_set){
    g_assert(pq_set);
    RecycleBin_destroy(pq_set->pq_recycle);
    RecycleBin_destroy(pq_set->node_recycle);
    g_free(pq_set);
    return;
}

PQueue *PQueue_create(PQueueSet *pq_set,
                      PQueue_Compare_Func comp_func,
                      gpointer user_data){
    register PQueue *pq;
    g_assert(pq_set);
    g_assert(comp_func);
    pq = RecycleBin_alloc(pq_set->pq_recycle);
    pq->root = NULL;
    pq->total = 0;
    pq->comp_func = comp_func;
    pq->set = pq_set;
    pq->user_data = user_data;
    return pq;
}

static void PQueue_Node_destroy(PQueueSet *pq_set, PQueue_Node *pqn){
    if(pqn){
        PQueue_Node_destroy(pq_set, pqn->left);
        PQueue_Node_destroy(pq_set, pqn->next);
        RecycleBin_recycle(pq_set->node_recycle, pqn);
    }
    return;
}

static void PQueue_Node_destroy_with_func(PQueueSet *pq_set,
                PQueue_Node *pqn, PQueue_Free_Func free_func,
                gpointer user_data){
    if(pqn){
        PQueue_Node_destroy_with_func(pq_set, pqn->left,
                                      free_func, user_data);
        PQueue_Node_destroy_with_func(pq_set, pqn->next,
                                      free_func, user_data);
        free_func(pqn->data, user_data);
        RecycleBin_recycle(pq_set->node_recycle, pqn);
    }
    return;
}

void PQueue_destroy(PQueue *pq, PQueue_Free_Func free_func,
                    gpointer user_data){
    if(free_func)
        PQueue_Node_destroy_with_func(pq->set, pq->root,
                                      free_func, user_data);
    else
        PQueue_Node_destroy(pq->set, pq->root);
    RecycleBin_recycle(pq->set->pq_recycle, pq);
    return;
}

#ifndef Swap
#define Swap(x,y,temp) ((temp)=(x),(x)=(y),(y)=(temp))
#endif /* Swap */

static PQueue_Node *PQueue_Node_order(PQueue *pq,
                                      PQueue_Node *a, PQueue_Node *b){
    register PQueue_Node *tmp;
    g_assert(a);
    g_assert(b);
    if(pq->comp_func(a->data, b->data, pq->user_data)){
        /* Add b before a */
        a->next = b->next;
if(a->next) a->next->prev = a; Swap(a, b, tmp); } else { /* Add a before b */ b->prev = a->prev; } a->prev = b; a->next = b->left; if(a->next) a->next->prev = a; b->left = a; return b; } PQueue_Node *PQueue_push(PQueue *pq, gpointer data){ register PQueue_Node *pqn = RecycleBin_alloc(pq->set->node_recycle); /* Create empty heap and meld with existing heap */ pqn->data = data; pqn->left = pqn->next = pqn->prev = NULL; if(pq->root) pq->root = PQueue_Node_order(pq, pq->root, pqn); else pq->root = pqn; /* Is the first node */ pq->total++; return pqn; } static void PQueue_Node_extract(PQueue_Node *pqn){ if(pqn->next) pqn->next->prev = pqn->prev; if(pqn->prev->left == pqn) pqn->prev->left = pqn->next; else pqn->prev->next = pqn->next; pqn->next = NULL; return; } void PQueue_raise(PQueue *pq, PQueue_Node *pqn){ if(pq->root == pqn) /* If the root node */ return; PQueue_Node_extract(pqn); pq->root = PQueue_Node_order(pq, pq->root, pqn); return; } void PQueue_change(PQueue *pq, PQueue_Node *pqn){ register PQueue_Node *npqn; npqn = PQueue_push(pq, PQueue_remove(pq, pqn)); g_assert(npqn == pqn); return; } static PQueue_Node *PQueue_Node_combine(PQueue *pq, PQueue_Node *pqn){ register gint i, count; register GPtrArray *combine; register PQueue_Node *npqn; if(!pqn->next){ /* Single member */ return pqn; } combine = g_ptr_array_new(); for(count = 0; pqn; count++){ g_ptr_array_add(combine, pqn); pqn->prev->next = NULL; /* Break links */ pqn = pqn->next; } count--; for(i = 0; i < count; i+=2) combine->pdata[i] = PQueue_Node_order(pq, combine->pdata[i], combine->pdata[i+1]); if(!(count&1)) /* Pick up spare if odd count */ combine->pdata[i-2] = PQueue_Node_order(pq, combine->pdata[i-2], combine->pdata[i]); for(i-=2; i>=2; i-=2) combine->pdata[i-2] = PQueue_Node_order(pq, combine->pdata[i-2], combine->pdata[i]); npqn = combine->pdata[0]; g_ptr_array_free(combine, TRUE); return npqn; } gpointer PQueue_pop(PQueue *pq){ register PQueue_Node *pqn = pq->root; register gpointer data; 
if(!pq->root) return NULL; /* Queue is empty */ data = pq->root->data; g_assert((pq->total <= 1) || pq->root->left); pq->root = pq->root->left ? PQueue_Node_combine(pq, pq->root->left) : NULL; g_assert(pqn); RecycleBin_recycle(pq->set->node_recycle, pqn); if(!--pq->total) /* Set root as NULL if PQueue is now empty */ pq->root = NULL; return data; } PQueue *PQueue_join(PQueue *a, PQueue *b){ g_assert(a->set == b->set); g_assert(a->comp_func == b->comp_func); g_assert(a->user_data == b->user_data); a->root = PQueue_Node_combine(a, b->root); a->total += b->total; RecycleBin_recycle(a->set->pq_recycle, b); return a; } gpointer PQueue_remove(PQueue *pq, PQueue_Node *pqn){ register gpointer data = pqn->data; register PQueue_Node *nb; g_assert(pqn); g_assert(pq->root); if(pqn == pq->root) return PQueue_pop(pq); g_assert(pq->total > 1); PQueue_Node_extract(pqn); if(--pq->total){ if(pqn->left){ nb = PQueue_Node_combine(pq, pqn->left); pq->root = PQueue_Node_order(pq, pq->root, nb); } } else { pq->root = NULL; } RecycleBin_recycle(pq->set->node_recycle, pqn); return data; } static void PQueue_traverse_recur(PQueue *pq, PQueue_Node *pqn, PQueue_Traverse_Func tf, gpointer user_data, gboolean *stop){ if((!pqn || (*stop))) return; (*stop) = tf(pqn->data, user_data); PQueue_traverse_recur(pq, pqn->left, tf, user_data, stop); PQueue_traverse_recur(pq, pqn->next, tf, user_data, stop); return; } void PQueue_traverse(PQueue *pq, PQueue_Traverse_Func tf, gpointer user_data){ gboolean stop = FALSE; PQueue_traverse_recur(pq, pq->root, tf, user_data, &stop); return; } exonerate-2.4.0/src/struct/recyclebin.c0000644000175000017500000001266111220200470015020 00000000000000/****************************************************************\ * * * Efficient Memory Allocation Routines * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. 
See the file COPYING *
* or http://www.gnu.org/licenses/gpl.txt for details *
* *
* If you use this code, please keep this notice intact. *
* *
\****************************************************************/

#include <string.h> /* For memset() */
#include "recyclebin.h"

#ifndef USE_PTHREADS
static GTree *global_recycle_bin_tree = NULL;
#endif /* USE_PTHREADS */

#ifndef USE_PTHREADS
static gint RecycleBin_compare(gconstpointer a, gconstpointer b){
    /* Order by address; a plain (a - b) could be truncated to gint */
    return (a > b) - (a < b);
}
#endif /* USE_PTHREADS */

RecycleBin *RecycleBin_create(gchar *name, gsize node_size,
                              gint nodes_per_chunk){
    register RecycleBin *recycle_bin = g_new(RecycleBin, 1);
    g_assert(name);
    g_assert(node_size >= sizeof(gpointer));
    recycle_bin->ref_count = 1;
    recycle_bin->name = g_strdup(name);
    recycle_bin->chunk_list = g_ptr_array_new();
    recycle_bin->chunk_pos = nodes_per_chunk;
    recycle_bin->nodes_per_chunk = nodes_per_chunk;
    recycle_bin->node_size = node_size;
    recycle_bin->count = 0;
    recycle_bin->recycle = NULL;
#ifndef USE_PTHREADS
    if(!global_recycle_bin_tree)
        global_recycle_bin_tree = g_tree_new(RecycleBin_compare);
    g_tree_insert(global_recycle_bin_tree, recycle_bin, recycle_bin);
#endif /* USE_PTHREADS */
    return recycle_bin;
}

void RecycleBin_destroy(RecycleBin *recycle_bin){
    register gint i;
    if(--recycle_bin->ref_count)
        return;
#ifndef USE_PTHREADS
    g_assert(global_recycle_bin_tree);
    g_assert(g_tree_lookup(global_recycle_bin_tree, recycle_bin));
    g_tree_remove(global_recycle_bin_tree, recycle_bin);
    if(!g_tree_nnodes(global_recycle_bin_tree)){
        g_tree_destroy(global_recycle_bin_tree);
        global_recycle_bin_tree = NULL;
    }
#endif /* USE_PTHREADS */
    for(i = 0; i < recycle_bin->chunk_list->len; i++)
        g_free(recycle_bin->chunk_list->pdata[i]);
    g_ptr_array_free(recycle_bin->chunk_list, TRUE);
    g_free(recycle_bin->name);
    g_free(recycle_bin);
    return;
}

RecycleBin *RecycleBin_share(RecycleBin *recycle_bin){
    g_assert(recycle_bin);
    recycle_bin->ref_count++;
    return recycle_bin;
}

gsize RecycleBin_memory_usage(RecycleBin *recycle_bin){
    g_assert(recycle_bin);
return recycle_bin->node_size * recycle_bin->nodes_per_chunk * recycle_bin->chunk_list->len; } gpointer RecycleBin_alloc(RecycleBin *recycle_bin){ register RecycleBin_Node *node; register gchar *chunk; g_assert(recycle_bin); if(recycle_bin->recycle){ node = (gpointer)recycle_bin->recycle; recycle_bin->recycle = node->next; } else { if(recycle_bin->chunk_pos == recycle_bin->nodes_per_chunk){ chunk = g_malloc(recycle_bin->nodes_per_chunk * recycle_bin->node_size); g_ptr_array_add(recycle_bin->chunk_list, chunk); recycle_bin->chunk_pos = 1; node = (RecycleBin_Node*)chunk; } else { chunk = recycle_bin->chunk_list->pdata [recycle_bin->chunk_list->len-1]; node = (RecycleBin_Node*) (chunk + (recycle_bin->node_size * recycle_bin->chunk_pos++)); } } recycle_bin->count++; return (gpointer)node; } gpointer RecycleBin_alloc_blank(RecycleBin *recycle_bin){ register gpointer data = RecycleBin_alloc(recycle_bin); memset(data, 0, recycle_bin->node_size); return data; } void RecycleBin_recycle(RecycleBin *recycle_bin, gpointer data){ register RecycleBin_Node *node = data; g_assert(node != recycle_bin->recycle); node->next = recycle_bin->recycle; recycle_bin->recycle = node; recycle_bin->count--; return; } #ifndef USE_PTHREADS static gint RecycleBin_profile_traverse(gpointer key, gpointer value, gpointer data){ register RecycleBin *recycle_bin = value; g_assert(recycle_bin); g_message(" RecycleBin [%s] %d Mb (%d items)", recycle_bin->name, (gint)(RecycleBin_memory_usage(recycle_bin)>>20), recycle_bin->count); return FALSE; } #endif /* USE_PTHREADS */ void RecycleBin_profile(void){ g_message("BEGIN RecycleBin profile"); #ifdef USE_PTHREADS g_message("multi-threaded RecycleBin_profile() not implemented"); #else /* USE_PTHREADS */ if(global_recycle_bin_tree) g_tree_traverse(global_recycle_bin_tree, RecycleBin_profile_traverse, G_IN_ORDER, NULL); else g_message("no active RecycleBins"); #endif /* USE_PTHREADS */ g_message("END RecycleBin profile"); return; } /* FIXME: could use Trash 
   Stacks when migrating to glib-2 */
exonerate-2.4.0/src/struct/slist.test.c0000644000175000017500000000427111162714343015031 00000000000000
/****************************************************************\
* *
* Efficient single-linked list routines. *
* *
* Guy St.C. Slater.. mailto:guy@ebi.ac.uk *
* Copyright (C) 2000-2009. All Rights Reserved. *
* *
* This source code is distributed under the terms of the *
* GNU General Public License, version 3. See the file COPYING *
* or http://www.gnu.org/licenses/gpl.txt for details *
* *
* If you use this code, please keep this notice intact. *
* *
\****************************************************************/

#include <string.h> /* For strcmp() */
#include "slist.h"

typedef struct {
    gchar *crib[9];
    gint count;
} SList_TestInfo;

static gboolean slist_test_traverse(gpointer data, gpointer user_data){
    register gchar *word = data;
    SList_TestInfo *sti = user_data;
    g_message("Word [%5s] crib[%5s] count[%d]",
              word, sti->crib[sti->count], sti->count);
    g_assert(!strcmp(word, sti->crib[sti->count]));
    sti->count++;
    return FALSE;
}

int main(void){
    register SListSet *slist_set = SListSet_create();
    register SList *a = SList_create(slist_set),
                   *b = SList_create(slist_set),
                   *c;
    SList_TestInfo sti = {
        {"The", "quick", "brown", "fox", "jumps",
         "over", "a", "lazy", "dog"}, 0};
    g_message("check [%d][%d]", SList_isempty(a), SList_isempty(b));
    SList_queue(a, "jumps");
    SList_stack(a, "fox");
    SList_queue(a, "over");
    SList_stack(a, "brown");
    SList_queue(a, "a");
    SList_stack(b, "quick");
    SList_queue(a, "lazy");
    SList_stack(b, "The");
    SList_queue(a, "dog");
    c = SList_join(b, a);
    SList_destroy(a);
    SList_traverse(c, slist_test_traverse, &sti);
    SList_destroy(c);
    SListSet_destroy(slist_set);
    return 0;
}
exonerate-2.4.0/src/struct/slist.c0000644000175000017500000001524111162714342014051 00000000000000
/****************************************************************\
* *
* Efficient single-linked list routines. *
* *
* Guy St.C. Slater..
mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "slist.h" /**/ SListSet *SListSet_create(void){ register SListSet *slist_set = g_new(SListSet, 1); slist_set->list_recycle = RecycleBin_create("SList", sizeof(SList), 256); slist_set->node_recycle = RecycleBin_create("SListNode", sizeof(SListNode), 1024); return slist_set; } void SListSet_destroy(SListSet *slist_set){ g_assert(slist_set); RecycleBin_destroy(slist_set->list_recycle); RecycleBin_destroy(slist_set->node_recycle); g_free(slist_set); return; } gsize SListSet_memory_usage(SListSet *slist_set){ return sizeof(SListSet) + RecycleBin_memory_usage(slist_set->list_recycle) + RecycleBin_memory_usage(slist_set->node_recycle); } /**/ static SListNode *SListNode_alloc(SListSet *slist_set){ register SListNode *sln = RecycleBin_alloc(slist_set->node_recycle); sln->next = NULL; sln->data = NULL; return sln; } static void SListNode_free(SListSet *slist_set, SListNode *sln){ RecycleBin_recycle(slist_set->node_recycle, sln); return; } /**/ SList *SList_create(SListSet *slist_set){ register SList *slist; g_assert(slist_set); slist = RecycleBin_alloc(slist_set->list_recycle); slist->head = slist->tail = SListNode_alloc(slist_set); slist->set = slist_set; return slist; } void SList_empty(SList *slist){ g_assert(slist); while(!SList_isempty(slist)) SList_pop(slist); return; } void SList_destroy(SList *slist){ g_assert(slist); SList_empty(slist); RecycleBin_recycle(slist->set->list_recycle, slist); return; } void SList_empty_with_data(SList *slist, SList_DestroyFunc destroy_func, gpointer user_data){ register gpointer *data; g_assert(slist); g_assert(destroy_func); 
while(!SList_isempty(slist)){ data = SList_pop(slist); destroy_func(data, user_data); } return; } void SList_destroy_with_data(SList *slist, SList_DestroyFunc destroy_func, gpointer user_data){ g_assert(slist); g_assert(destroy_func); SList_empty_with_data(slist, destroy_func, user_data); SList_destroy(slist); return; } void SList_queue(SList *slist, gpointer data){ g_assert(slist); slist->tail->next = SListNode_alloc(slist->set); slist->tail = slist->tail->next; slist->tail->data = data; return; } void SList_stack(SList *slist, gpointer data){ register SListNode *sln; g_assert(slist); sln = SListNode_alloc(slist->set); slist->head->data = data; sln->next = slist->head; slist->head = sln; return; } gpointer SList_pop(SList *slist){ register gpointer data; register SListNode *sln; g_assert(slist); data = slist->head->next->data; sln = slist->head; slist->head = sln->next; SListNode_free(slist->set, sln); return data; } SList *SList_join(SList *left, SList *right){ register SListNode *head, *tail; g_assert(left); g_assert(right); g_assert(left->set == right->set); if(SList_isempty(right)) return left; if(SList_isempty(left)){ head = left->head; tail = left->tail; left->head = right->head; left->tail = right->tail; right->head = head; right->tail = tail; return left; } g_assert(!SList_isempty(right)); left->tail->next = right->head->next; left->tail = right->tail; right->tail = right->head; right->head->next = NULL; return left; } SList *SList_merge(SList *left, SList *right, SList_CompareFunc compare_func, gpointer user_data){ register SList *merged = SList_create(left->set); register SList swap; g_assert(left->set == right->set); while(!(SList_isempty(left) || SList_isempty(right))){ if(compare_func(left->head->next->data, right->head->next->data, user_data)){ SList_queue(merged, SList_pop(left)); } else { SList_queue(merged, SList_pop(right)); } } SList_join(merged, left); SList_join(merged, right); SList_destroy(left); /* Swap left and merged */ swap.head = 
left->head; swap.tail = left->tail; left->head = merged->head; left->tail = merged->tail; merged->head = swap.head; merged->tail = swap.tail; return merged; } /* FIXME: optimisation : more efficient without the pop/queue */ void SList_remove(SList *slist, SListNode *prev_node){ register SListNode *sln; g_assert(slist); if(!prev_node){ SList_pop(slist); return; } if(prev_node->next){ sln = prev_node->next; prev_node->next = prev_node->next->next; SListNode_free(slist->set, sln); } return; } void SList_insert(SList *slist, SListNode *prev_node, SList *new_slist){ register SListNode *sln; g_assert(slist); g_assert(new_slist); g_assert(slist->set == new_slist->set); if(!prev_node) prev_node = slist->head; sln = prev_node->next; if(!sln) slist->tail = new_slist->tail; prev_node->next = new_slist->head->next; new_slist->tail->next = sln; SListNode_free(new_slist->set, new_slist->head); RecycleBin_recycle(new_slist->set->list_recycle, new_slist); return; } void SList_traverse(SList *slist, SList_TraverseFunc traverse_func, gpointer user_data){ register SListNode *sln; g_assert(slist); g_assert(traverse_func); for(sln = slist->head->next; sln; sln = sln->next) if(traverse_func(sln->data, user_data)) break; return; } exonerate-2.4.0/src/struct/rangetree.test.c0000644000175000017500000000415111162714342015643 00000000000000/****************************************************************\ * * * Trees for 2D range searching. * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "rangetree.h" /* 9 1 8 1 7 +-------+ 6 | 1 | 5 1 2 | 1 4 | | 3 +-------1 2 1 1 0 1 2 3 4 5 6 7 8 9 */ static gboolean test_rtrf(gint x, gint y, gpointer info, gpointer user_data){ register gint *total = user_data; (*total)++; g_message("Found (%d,%d) (count:%d)", x, y, *total); return FALSE; } int main(void){ register RangeTree *rt = RangeTree_create(); gint total = 0; RangeTree_add(rt, 3, 5, NULL); /* within */ RangeTree_add(rt, 5, 8, NULL); RangeTree_add(rt, 5, 5, NULL); /* within */ RangeTree_add(rt, 2, 1, NULL); RangeTree_add(rt, 9, 5, NULL); RangeTree_add(rt, 7, 3, NULL); /* within */ RangeTree_add(rt, 3, 9, NULL); RangeTree_add(rt, 6, 6, NULL); /* within */ /**/ RangeTree_find(rt, 3, 5, 3, 5, test_rtrf, &total); g_message("total is [%d] (expect 4)", total); g_assert(total == 4); /**/ total = 0; RangeTree_find(rt, 5, 1, 5, 1, test_rtrf, &total); g_message("total is [%d] (expect 1)", total); g_assert(total == 1); /**/ RangeTree_destroy(rt, NULL, NULL); return 0; } exonerate-2.4.0/src/struct/rangetree.c0000644000175000017500000001670011162714342014670 00000000000000/****************************************************************\ * * * Trees for 2D range searching. * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include <stdlib.h> /* For lrand48() */ #include "rangetree.h" static gint RangeTree_recent_data_compare(gconstpointer a, gconstpointer b){ register RangeTree_Node *rtn_a = (RangeTree_Node*)a, *rtn_b = (RangeTree_Node*)b; if(rtn_a->x == rtn_b->x) return rtn_b->y - rtn_a->y; return rtn_b->x - rtn_a->x; } RangeTree *RangeTree_create(void){ register RangeTree *rt = g_new(RangeTree, 1); rt->root = NULL; rt->recent_data = g_tree_new(RangeTree_recent_data_compare); rt->node_recycle = RecycleBin_create("RangeTree_Node", sizeof(RangeTree_Node), 256); return rt; } void RangeTree_destroy(RangeTree *rt, RangeTree_FreeFunc rtff, gpointer user_data){ g_tree_destroy(rt->recent_data); RecycleBin_destroy(rt->node_recycle); g_free(rt); return; } void RangeTree_add(RangeTree *rt, gint x, gint y, gpointer info){ register RangeTree_Node *rtn = RecycleBin_alloc(rt->node_recycle); g_assert(!RangeTree_check_pos(rt, x, y)); rtn->x = x; rtn->y = y; rtn->info = info; rtn->left = NULL; rtn->right = NULL; g_assert(!g_tree_lookup(rt->recent_data, rtn)); g_tree_insert(rt->recent_data, rtn, rtn); return; } typedef struct { gint tl_x; gint tl_y; gint br_x; gint br_y; } RangeTree_Rectangle; static gboolean RangeTree_inside_rectangle(RangeTree_Rectangle *r, RangeTree_Node *n){ if((n->x < r->tl_x) || (n->y < r->tl_y) || (n->x >= r->br_x) || (n->y >= r->br_y)) return FALSE; return TRUE; } static void RangeTree_find_recur(RangeTree_Node *n, gboolean dir, RangeTree_Rectangle *r, RangeTree_ReportFunc rtrf, gpointer user_data, gboolean *found){ if(!n) return; if(dir ? (r->tl_x < n->x) : (r->tl_y < n->y)) RangeTree_find_recur(n->left, dir^1, r, rtrf, user_data, found); if(*found) return; if(RangeTree_inside_rectangle(r, n)){ if(rtrf(n->x, n->y, n->info, user_data)){ (*found) = TRUE; return; } } if(dir ? 
(n->x <= r->br_x) : (n->y <= r->br_y)) RangeTree_find_recur(n->right, dir^1, r, rtrf, user_data, found); return; } static void RangeTree_insert(RangeTree *rt, RangeTree_Node *rtn){ register RangeTree_Node *n, *parent; register gboolean dir, dim = FALSE; if(!rt->root){ rt->root = rtn; return; } for(n = parent = rt->root; n; dim ^= 1){ dir = dim ? (rtn->x < n->x) : (rtn->y < n->y); parent = n; n = dir ? n->left : n->right; } if(dir) parent->left = rtn; else parent->right = rtn; return; } static void RangeTree_insert_list(RangeTree *rt, GPtrArray *list){ register gint i, j; register RangeTree_Node *rtn, *swap_rtn; srand48(list->len); /* Seed the generator used by lrand48() */ for(i = 0; i < list->len; i++){ /* Shuffle list */ rtn = list->pdata[i]; j = ((gint)lrand48()) % list->len; swap_rtn = list->pdata[j]; list->pdata[j] = rtn; list->pdata[i] = swap_rtn; } for(i = 0; i < list->len; i++){ /* Insert list */ rtn = list->pdata[i]; RangeTree_insert(rt, rtn); } return; } /* The list contains RangeTree_Node objects to be inserted. * As the range tree is not balanced, this allows * a set of points to be inserted in random order * to avoid the worst-case performance of the RangeTree. 
*/ static gint RangeTree_insert_recent_collect(gpointer key, gpointer value, gpointer data){ register GPtrArray *list = data; g_ptr_array_add(list, value); return FALSE; } static void RangeTree_insert_recent(RangeTree *rt){ register GPtrArray *node_list = g_ptr_array_new(); g_tree_traverse(rt->recent_data, RangeTree_insert_recent_collect, G_IN_ORDER, node_list); if(node_list->len){ RangeTree_insert_list(rt, node_list); g_tree_destroy(rt->recent_data); rt->recent_data = g_tree_new(RangeTree_recent_data_compare); } g_ptr_array_free(node_list, TRUE); return; } static gboolean RangeTree_find_internal(RangeTree *rt, gint x_start, gint x_length, gint y_start, gint y_length, RangeTree_ReportFunc rtrf, gpointer user_data){ gboolean found = FALSE; RangeTree_Rectangle r; g_assert(x_length >= 0); g_assert(y_length >= 0); r.tl_x = x_start; r.tl_y = y_start; r.br_x = x_start + x_length; r.br_y = y_start + y_length; RangeTree_find_recur(rt->root, FALSE, &r, rtrf, user_data, &found); return found; } gboolean RangeTree_find(RangeTree *rt, gint x_start, gint x_length, gint y_start, gint y_length, RangeTree_ReportFunc rtrf, gpointer user_data){ RangeTree_insert_recent(rt); return RangeTree_find_internal(rt, x_start, x_length, y_start, y_length, rtrf, user_data); } static gboolean RangeTree_find_point(gint x, gint y, gpointer info, gpointer user_data){ return TRUE; } gboolean RangeTree_check_pos(RangeTree *rt, gint x, gint y){ RangeTree_Node rtn; rtn.x = x; rtn.y = y; if(g_tree_lookup(rt->recent_data, &rtn)) return TRUE; return RangeTree_find_internal(rt, x, 1, y, 1, RangeTree_find_point, NULL); } gboolean RangeTree_is_empty(RangeTree *rt){ RangeTree_insert_recent(rt); return rt->root?FALSE:TRUE; } static gboolean RangeTree_traverse_recur(RangeTree_Node *rtn, RangeTree_ReportFunc rtrf, gpointer user_data){ if(!rtn) return FALSE; if(RangeTree_traverse_recur(rtn->left, rtrf, user_data) || rtrf(rtn->x, rtn->y, rtn->info, user_data) || RangeTree_traverse_recur(rtn->right, rtrf, 
user_data)) return TRUE; return FALSE; } gboolean RangeTree_traverse(RangeTree *rt, RangeTree_ReportFunc rtrf, gpointer user_data){ return RangeTree_traverse_recur(rt->root, rtrf, user_data); } exonerate-2.4.0/src/struct/vfsm.test.c0000644000175000017500000000400011162714343014634 00000000000000/****************************************************************\ * * * VFSM Library : complete trie navigation * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "vfsm.h" int main(void){ register gint i, j; register VFSM *vfsm = VFSM_create("ACGT", 12); register guchar *seq = (guchar*) "CGATCGATCTGATCGTAGNTAGCTCGATCGATGNAGCTAGC"; register VFSM_Int state = 0; register gchar *testword; gchar word[13]; VFSM_info(vfsm); for(i = j = 0; seq[i]; i++){ if(!vfsm->index[seq[i]]){ state = 0; continue; } state = VFSM_change_state_MPOW2(vfsm, state, seq[i]); if(VFSM_state2word(vfsm, state, word)) g_message("pos=[%2d] word_number=[%d] word=[%s]", i, ++j, word); } testword = "ACGTAACCGGTT"; g_message("Using test word [%s]", testword); state = VFSM_word2state(vfsm, testword); g_message("Gives state [%d]", (gint)state); state = VFSM_jump(vfsm, state, 2, 'A'); g_message("Jump to state [%d]", (gint)state); g_message("Convert status [%s]", VFSM_state2word(vfsm, state, word)?"true":"false"); g_message("Jump 2:G->A [%s]", word); VFSM_destroy(vfsm); return 0; } exonerate-2.4.0/src/struct/vfsm.c0000644000175000017500000001611511176570150013671 00000000000000/****************************************************************\ * * * VFSM Library : complete trie navigation * * * * Guy St.C. Slater.. 
mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ /* Based on Virtual Finite State Machine routines * from the ESTate package (by the same author). * http://www.hgmp.mrc.ac.uk/~gslater/ESTate/ */ #include <string.h> /* For strlen() */ #include <strings.h> /* For ffs() */ #include <math.h> /* For pow() */ #include "vfsm.h" #ifndef G_GNUC_EXTENSION #define G_GNUC_EXTENSION #endif /* G_GNUC_EXTENSION */ VFSM *VFSM_create(gchar *alphabet, guint depth){ register VFSM *vfsm = g_new0(VFSM, 1); register gint i; register VFSM_Int tmp; g_assert(depth); g_assert(alphabet); vfsm->alphabet_size = strlen(alphabet); if(pow(vfsm->alphabet_size, depth) >= VFSM_MAX){ g_free(vfsm); return NULL; } vfsm->alphabet = g_strdup(alphabet); vfsm->depth = depth; g_assert(vfsm->depth >= 3); /* FIXME: fix for depth < 3 */ tmp = vfsm->prs = vfsm->alphabet_size; for(i = 3; i < depth; i++){ tmp *= vfsm->alphabet_size; vfsm->prs += tmp; } vfsm->prs++; vfsm->lrs = vfsm->prs + (tmp * vfsm->alphabet_size); vfsm->prw = vfsm->lrs - vfsm->prs; vfsm->lrw = vfsm->prw * vfsm->alphabet_size; vfsm->total = vfsm->lrw + vfsm->lrs; if(vfsm->alphabet_size & (vfsm->alphabet_size-1)){ vfsm->is_poweroftwo = FALSE; vfsm->log_alphabet_size = 0; } else { vfsm->is_poweroftwo = TRUE; vfsm->log_alphabet_size = ffs(vfsm->alphabet_size)-1; } for(i = 0; i < vfsm->alphabet_size; i++) vfsm->index[(guchar)vfsm->alphabet[i]] = i+1; vfsm->branch_size = g_new(VFSM_Int, depth); tmp = 1; for(i = 0; i < depth; i++){ vfsm->branch_size[depth-i-1] = tmp; tmp *= vfsm->alphabet_size; } return vfsm; } void VFSM_destroy(VFSM *vfsm){ g_free(vfsm->branch_size); g_free(vfsm->alphabet); g_free(vfsm); return; } void VFSM_info(VFSM *vfsm){ 
register gint i; G_GNUC_EXTENSION /* Allow %llu without pedantic warning */ g_message("VFSM info:\n" " alphabet = [%s]\n" " alphabet_size = %d\n" " depth = %d\n" " is_poweroftwo = [%s]\n" " log_alphabet_size = %d\n" " prs = %" VFSM_PRINT_FORMAT "\n" " prw = %" VFSM_PRINT_FORMAT "\n" " lrs = %" VFSM_PRINT_FORMAT "\n" " lrw = %" VFSM_PRINT_FORMAT "\n" " total = %" VFSM_PRINT_FORMAT "\n", vfsm->alphabet, vfsm->alphabet_size, vfsm->depth, vfsm->is_poweroftwo?"TRUE":"FALSE", vfsm->log_alphabet_size, vfsm->prs, vfsm->prw, vfsm->lrs, vfsm->lrw, vfsm->total ); for(i = 0; i < vfsm->depth; i++){ G_GNUC_EXTENSION /* Allow %llu without pedantic warning */ g_print(" branch_size [%d] [%" VFSM_PRINT_FORMAT "]\n", i, vfsm->branch_size[i]); } g_print("--\n"); return; } static gboolean VFSM_word_is_valid_leaf(VFSM *vfsm, gchar *word){ register gint i; for(i = 0; word[i]; i++) if(!vfsm->index[(guchar)word[i]]) return FALSE; if(i != vfsm->depth) return FALSE; return TRUE; } VFSM_Int VFSM_word2state(VFSM *vfsm, gchar *word){ register gint i; register VFSM_Int pos; g_assert(VFSM_word_is_valid_leaf(vfsm, word)); if(vfsm->is_poweroftwo) for(i = 0, pos = 0; i < vfsm->depth; i++){ pos <<= vfsm->log_alphabet_size; pos |= vfsm->index[(guchar)word[i]]-1; } else for(i = 0, pos = 0; i < vfsm->depth; i++){ pos *= vfsm->alphabet_size; pos += vfsm->index[(guchar)word[i]]-1; } return VFSM_leaf2state(vfsm, pos); } gboolean VFSM_state2word(VFSM *vfsm, VFSM_Int state, gchar *word){ register gint i, j; g_assert(word); if(!VFSM_state_is_leaf(vfsm, state)) return FALSE; state = VFSM_state2leaf(vfsm, state); if(vfsm->is_poweroftwo) for(i = vfsm->depth-1, j = vfsm->alphabet_size-1; i >= 0; i--){ word[i] = vfsm->alphabet[state & j]; state >>= vfsm->log_alphabet_size; } else for(i = vfsm->depth-1; i >= 0; i--){ word[i] = vfsm->alphabet[state % vfsm->alphabet_size]; state /= vfsm->alphabet_size; } word[vfsm->depth] = '\0'; return TRUE; } VFSM_Int VFSM_change_state(VFSM *vfsm, VFSM_Int state, guchar ch){ 
if(!vfsm->index[ch]) return 0; /* Not a valid character: reset VFSM */ if(VFSM_state_is_leaf(vfsm, state)) /* -> longest suffix */ state = vfsm->prs + ((state-vfsm->lrs) % vfsm->prw); return (state * vfsm->alphabet_size) + vfsm->index[ch]; /* Descend */ } VFSM_Int VFSM_change_state_POW2(VFSM *vfsm, VFSM_Int state, guchar ch){ g_assert(vfsm->is_poweroftwo); if(!vfsm->index[ch]) return 0; /* Not a valid character: reset VFSM */ if(VFSM_state_is_leaf(vfsm, state)) /* -> longest suffix */ state = vfsm->prs + ((state-vfsm->lrs) & (vfsm->prw-1)); /* As alphabet_size is a power of 2, so will be vfsm->prw */ return (state << vfsm->log_alphabet_size) /* Descend */ + vfsm->index[ch]; } gchar VFSM_symbol_by_pos(VFSM *vfsm, VFSM_Int state, guint depth){ register guint apos; g_assert(depth < vfsm->depth); apos = VFSM_state2leaf(vfsm, state) / vfsm->branch_size[depth] % vfsm->alphabet_size; g_assert(apos < vfsm->alphabet_size); return vfsm->alphabet[apos]; } gchar VFSM_symbol_by_pos_POW2(VFSM *vfsm, VFSM_Int state, guint depth){ register guint apos; g_assert(depth < vfsm->depth); /* Each symbol occupies log_alphabet_size bits of the leaf number */ apos = (VFSM_state2leaf(vfsm, state) >> ((vfsm->depth-depth-1) * vfsm->log_alphabet_size)) & (vfsm->alphabet_size-1); g_assert(apos < vfsm->alphabet_size); return vfsm->alphabet[apos]; } VFSM_Int VFSM_jump(VFSM *vfsm, VFSM_Int state, guint depth, guchar next){ register guchar curr; g_assert(vfsm); curr = VFSM_symbol_by_pos(vfsm, state, depth); return state + (vfsm->branch_size[depth] * (vfsm->index[next]-vfsm->index[curr])); } exonerate-2.4.0/src/struct/recyclebin.test.c0000644000175000017500000000257611162714342016017 00000000000000/****************************************************************\ * * * Efficient Memory Allocation Routines * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. 
See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "recyclebin.h" int main(void){ register RecycleBin *rb = RecycleBin_create("test", sizeof(gpointer), 3); register gint i; register GPtrArray *list = g_ptr_array_new(); for(i = 0; i < 12; i++) g_ptr_array_add(list, RecycleBin_alloc(rb)); for(i = 0; i < 12; i++) RecycleBin_recycle(rb, list->pdata[i]); RecycleBin_destroy(rb); g_ptr_array_free(list, TRUE); return 0; } exonerate-2.4.0/src/struct/sparsecache.test.c0000644000175000017500000000570611162714343016160 00000000000000/****************************************************************\ * * * Sparse Cache Object * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "sparsecache.h" static gpointer test_get_func(gint pos, gpointer page_data, gpointer user_data){ register gint *data = page_data; return GINT_TO_POINTER(data[pos]); } static SparseCache_Page *test_fill_func(gint start, gpointer user_data){ register SparseCache_Page *page = g_new(SparseCache_Page, 1); register gint i, *data = g_new(gint, SparseCache_PAGE_SIZE); g_message("Filling from [%d] with [%d]", start, SparseCache_PAGE_SIZE); for(i = 0; i < SparseCache_PAGE_SIZE; i++){ data[i] = start + i; g_message(" ... 
filling [%d]", data[i]); } page->data = (gpointer)data; page->data_size = sizeof(gint)*SparseCache_PAGE_SIZE; page->get_func = test_get_func; g_message("done filling"); return page; } int main(void){ register SparseCache *sc = SparseCache_create(1000, test_fill_func, NULL, NULL, NULL); register gint i; for(i = 0; i < 10; i++){ g_message("checking [%d]", i); g_assert(GPOINTER_TO_INT(SparseCache_get(sc, i)) == i); } for(i = 20; i >= 0; i--){ g_message("checking [%d]", i); g_assert(GPOINTER_TO_INT(SparseCache_get(sc, i)) == i); } for(i = 30; i >= 10; i--){ g_message("checking [%d]", i); g_assert(GPOINTER_TO_INT(SparseCache_get(sc, i)) == i); } for(i = 10; i < 50; i++){ g_message("checking [%d]", i); g_assert(GPOINTER_TO_INT(SparseCache_get(sc, i)) == i); } for(i = 60; i >= 0; i--){ g_message("checking [%d]", i); g_assert(GPOINTER_TO_INT(SparseCache_get(sc, i)) == i); } for(i = 0; i < 60; i++){ g_message("checking [%d]", i); g_assert(GPOINTER_TO_INT(SparseCache_get(sc, i)) == i); } for(i = 0; i < 100; i++){ g_message("checking [%d]", i); g_assert(GPOINTER_TO_INT(SparseCache_get(sc, i)) == i); } SparseCache_destroy(sc); g_message("SparseCache OK"); return 0; } exonerate-2.4.0/src/struct/sparsecache.c0000644000175000017500000001053611173372551015202 00000000000000/****************************************************************\ * * * Sparse Cache Object * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "sparsecache.h" /**/ static void SparseCache_Page_destroy(SparseCache_Page *page){ if(page->data) g_free(page->data); g_free(page); return; } SparseCache *SparseCache_create(gint length, SparseCache_FillFunc fill_func, SparseCache_EmptyFunc empty_func, SparseCache_FreeFunc free_func, gpointer user_data){ register SparseCache *sc = g_new(SparseCache, 1); g_assert(fill_func); sc->ref_count = 1; sc->page_total = (length >> SparseCache_PAGE_SIZE_BIT_WIDTH) + 1; sc->page_used = 0; sc->page_list = g_new0(SparseCache_Page*, sc->page_total); sc->page_memory_usage = 0; sc->length = length; sc->fill_func = fill_func; sc->empty_func = empty_func; sc->free_func = free_func; sc->user_data = user_data; return sc; } void SparseCache_destroy(SparseCache *sc){ register SparseCache_Page *page; register gint i; if(--sc->ref_count) return; for(i = 0; i < sc->page_total; i++){ page = sc->page_list[i]; if(!page) continue; sc->page_memory_usage -= (sizeof(SparseCache_Page)+page->data_size); if(sc->empty_func) sc->empty_func(page, sc->user_data); else SparseCache_Page_destroy(page); } g_free(sc->page_list); if(sc->free_func && sc->user_data) sc->free_func(sc->user_data); g_free(sc); return; } SparseCache *SparseCache_share(SparseCache *sc){ sc->ref_count++; return sc; } static SparseCache_Page *SparseCache_fill(SparseCache *sc, gint page_id){ register SparseCache_Page *page; g_assert(!sc->page_list[page_id]); page = sc->fill_func((page_id << SparseCache_PAGE_SIZE_BIT_WIDTH), sc->user_data); sc->page_list[page_id] = page; sc->page_used++; sc->page_memory_usage += (sizeof(SparseCache_Page)+page->data_size); return page; } gpointer SparseCache_get(SparseCache *sc, gint pos){ register gint page_id = SparseCache_pos2page(pos); register SparseCache_Page *page = sc->page_list[page_id]; g_assert(pos >= 0); g_assert(pos < sc->length); if(!page) page = SparseCache_fill(sc, page_id); return page->get_func((pos & 
(SparseCache_PAGE_SIZE-1)), page->data, sc->user_data); } gsize SparseCache_memory_usage(SparseCache *sc){ return sizeof(SparseCache) + (sizeof(SparseCache_Page*)*sc->page_total) + sc->page_memory_usage; } void SparseCache_copy(SparseCache *sc, gint start, gint length, gchar *dst){ register gint pos = start, dst_pos = 0, page_id, copy_len, read_remain = length, page_remain, page_start; register SparseCache_Page *page; do { page_id = SparseCache_pos2page(pos); page = sc->page_list[page_id]; if(!page) page = SparseCache_fill(sc, page_id); page_start = page_id*SparseCache_PAGE_SIZE; page_remain = page_start+SparseCache_PAGE_SIZE-pos; copy_len = MIN(page_remain, read_remain); g_assert(page->copy_func); dst_pos += page->copy_func(pos-page_start, copy_len, dst+dst_pos, page->data, sc->user_data); read_remain -= copy_len; pos += copy_len; } while(read_remain > 0); return; } /**/ exonerate-2.4.0/src/struct/dejavu.test.c0000644000175000017500000000352311162714341015146 00000000000000/****************************************************************\ * * * Deja-vu library for fast linear space repeat finding * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include <string.h> /* For strlen() */ #include "dejavu.h" static void test_display_repeat(gint first_pos, gint curr_pos, gint length, gchar *seq, gint len, gpointer user_data){ register gint i; register gint *total = user_data; g_print(">repeat_%d (%d, %d, %d)\n[", (*total)++, first_pos, curr_pos, length); for(i = 0; i < length; i++) g_print("%c", seq[first_pos+i]); g_print("]\n"); return; } int main(void){ register gchar *seq = "TAGACTGTAAGTCCTCTGTGTAAGCTCGTTA"; register gint min_wordlen = 1, max_wordlen = 7; register DejaVu *dv = DejaVu_create(seq, strlen(seq)); gint total = 0; DejaVu_traverse(dv, min_wordlen, max_wordlen, test_display_repeat, &total, NULL, 1); g_message("total = %d", total); g_assert(total == 82); DejaVu_destroy(dv); return 0; } exonerate-2.4.0/src/struct/dejavu.c0000644000175000017500000001456511162714341014200 00000000000000/****************************************************************\ * * * Deja-vu library for fast linear space repeat finding * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ /* Exact repeat finding in large sequences. * * This is a speeded-up variant of Roger Sayle's linear space * repeat finding algorithm - this algorithm only requires * O(n+k) time, where n = seq length, k = number of repeats. 
*/ #include <string.h> /* For memset() */ #include "dejavu.h" DejaVu *DejaVu_create(gchar *seq, gint len){ register DejaVu *dv = g_new(DejaVu, 1); g_assert(seq); g_assert(len >= 0); dv->seq = seq; dv->len = len; dv->next = g_new0(gint, len); dv->symbol_list_len = 0; return dv; } void DejaVu_destroy(DejaVu *dv){ g_free(dv->next); g_free(dv); return; } void DejaVu_info(DejaVu *dv){ register gint i; g_print(" Seq:"); for(i = 0; i < dv->len; i++) g_print(" %c", dv->seq[i]); g_print("\n"); g_print(" Pos:"); for(i = 0; i < dv->len; i++) g_print("%3d", i); g_print("\n"); g_print("Next:"); for(i = 0; i < dv->len; i++) g_print("%3d", dv->next[i]); g_print("\n"); return; } static gint DejaVu_init(DejaVu *dv, guchar *filter){ register gint i, ch, start = -1, end = -1; gint first[ALPHABET_SIZE]; gint last[ALPHABET_SIZE]; for(i = 0; i < ALPHABET_SIZE; i++) first[i] = last[i] = -1; for(i = 0; i < dv->len; i++){ /* Set next char */ ch = dv->seq[i]; if(filter[ch] == '-') continue; if(first[ch] == -1) first[ch] = i; if(last[ch] != -1) dv->next[last[ch]] = i; last[ch] = i; } for(i = 0; i < ALPHABET_SIZE; i++){ if(first[i] != -1){ /* exists */ dv->symbol_list[dv->symbol_list_len++] = i; if(first[i] != last[i]){ /* is repeated */ if(end != -1) dv->next[end] = -first[i]; if(start == -1) start = first[i]; end = last[i]; } } } if(end == -1) /* In case there are no repeats */ return -1; dv->next[end] = -start; return start; } /* dv->next holds the position of next repeat instance * except: * when (next < 0) * next holds -position of next repeat species * when (next == -start) * this is the last repeat of this length */ static gint DejaVu_promote_species(DejaVu *dv, gint curr, gint length, gint min_wordlen, DejaVu_TraverseFunc dvtf, guchar *filter, gpointer user_data, gint *first, gint *last){ register gint ch, pos = curr; do { if(length >= min_wordlen) dvtf(curr, pos, length, dv->seq, dv->len, user_data); /* report */ if((pos+length) < dv->len){ ch = dv->seq[pos+length]; if(filter[ch] != '-'){ 
if(first[ch] == -1) /* record first */ first[ch] = pos; if(last[ch] != -1) /* link from prev */ dv->next[last[ch]] = pos; last[ch] = pos; /* record this */ } } pos = dv->next[pos]; } while(pos > 0); return -pos; } /* FIXME: optimisation: could put dvrf() in separate loop */ static gint DejaVu_promote(DejaVu *dv, gint start, gint length, gint min_wordlen, DejaVu_TraverseFunc dvtf, guchar *filter, gpointer user_data){ register gint i, ch, pos = start, end = -1, new_start = -1; gint first[ALPHABET_SIZE]; gint last[ALPHABET_SIZE]; for(i = 0; i < dv->symbol_list_len; i++){ ch = dv->symbol_list[i]; first[ch] = last[ch] = -1; } do { pos = DejaVu_promote_species(dv, pos, length, min_wordlen, dvtf, filter, user_data, first, last); for(i = 0; i < dv->symbol_list_len; i++){ ch = dv->symbol_list[i]; if(first[ch] != -1){ /* exists */ if(first[ch] != last[ch]){ /* is repeated */ if(end != -1) dv->next[end] = -first[ch]; if(new_start == -1) new_start = first[ch]; end = last[ch]; } first[ch] = -1; } last[ch] = -1; } } while(pos != start); if(end != -1) dv->next[end] = -new_start; return new_start; } void DejaVu_traverse(DejaVu *dv, gint min_wordlen, gint max_wordlen, DejaVu_TraverseFunc dvtf, gpointer user_data, guchar *filter, gint verbosity){ register gint i, start, length = 1; register guchar *plain_filter = NULL; g_assert(min_wordlen >= 0); g_assert(min_wordlen <= max_wordlen); g_assert(dv); g_assert(dvtf); if(!filter){ plain_filter = g_new(guchar, ALPHABET_SIZE); for(i = 0; i < ALPHABET_SIZE; i++) plain_filter[i] = i; filter = plain_filter; } start = DejaVu_init(dv, filter); if(verbosity > 0) g_print("Message: Processing ["); while((start != -1) /* while there are more repeats */ && (length <= max_wordlen)){ if(verbosity > 0) g_print("."); start = DejaVu_promote(dv, start, length++, min_wordlen, dvtf, filter, user_data); } if(verbosity > 0) g_print("]\n"); if(plain_filter) g_free(plain_filter); return; } /**/ 
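The first pass of DejaVu_init() above links every position to the next position holding the same symbol. The sketch below shows only that linking step in isolation, without glib. The names link_next_occurrence() and chain_length() are hypothetical and are not part of dejavu.h. As a simplifying assumption, the end of a chain is marked with -1 here, whereas DejaVu_init() goes on to store negative links between repeat species.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper (not part of dejavu.h):
 * link each position in seq to the next position holding the same symbol.
 * Assumption: the end of a chain is marked with -1, and no links
 * between different repeat species are stored.
 */
void link_next_occurrence(const char *seq, int len, int *next){
    int last[UCHAR_MAX + 1]; /* last position seen for each symbol */
    int i;
    assert(seq && next && len >= 0);
    for(i = 0; i <= UCHAR_MAX; i++)
        last[i] = -1;
    for(i = 0; i < len; i++){
        unsigned char ch = (unsigned char)seq[i];
        next[i] = -1;               /* no later occurrence known yet */
        if(last[ch] != -1)
            next[last[ch]] = i;     /* link from the previous occurrence */
        last[ch] = i;
    }
    return;
}

/* Count the positions reachable from start by following the chain,
 * including start itself.
 */
int chain_length(const int *next, int start){
    int pos, count = 0;
    for(pos = start; pos != -1; pos = next[pos])
        count++;
    return count;
}
```

For a caller-supplied int array of the same length as the sequence, chain_length(next, p) then gives the number of occurrences of the symbol at p from p onwards. A single pass over the sequence is enough to build all the chains, which is where the linear dependence on sequence length comes from.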
exonerate-2.4.0/src/struct/bitarray.test.c0000644000175000017500000000223311162714341015502 00000000000000/****************************************************************\ * * * A simple bitarray data structure * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "bitarray.h" int main(void){ register BitArray *ba = BitArray_create(); register gint i; for(i = 0; i < 16; i++) BitArray_append(ba, i, 4); BitArray_info(ba); BitArray_destroy(ba); return 0; } exonerate-2.4.0/src/struct/bitarray.c0000644000175000017500000001310511162714341014524 00000000000000/****************************************************************\ * * * A simple bitarray data structure * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include <string.h> /* For memset */ #include "bitarray.h" /**/ #ifndef GetBit #define GetBit(i,n) (((i)>>(n))&1) #endif /* GetBit */ #ifndef BitOn #define BitOn(i,n) ((i)|1<<(n)) #endif /* BitOn */ #ifndef BitOff #define BitOff(i,n) ((i)&~(1<<(n))) #endif /* BitOff */ #ifndef GetBits #define GetBits(x,y,z) (((x)>>(y))&~(~0<<(z))) #endif /* GetBits */ #ifndef SetBits #define SetBits(w,x,y,z) (((x)&~(~(~0<<(z))<<(y)))|((w)<<(y))) #endif /* SetBits */ /**/ BitArray *BitArray_create(void){ register BitArray *ba = g_new(BitArray, 1); ba->alloc = 16; ba->length = 0; ba->data = g_new0(guchar, ba->alloc); return ba; } void BitArray_destroy(BitArray *ba){ g_free(ba->data); g_free(ba); return; } void BitArray_info(BitArray *ba){ register gint64 i; g_print("BitArray length [%d] alloc [%d] data [", (gint)ba->length, (gint)ba->alloc); for(i = 0; i < ba->length; i++) g_print("%d", BitArray_get_bit(ba, i)); g_print("]\n"); return; } void BitArray_empty(BitArray *ba){ ba->length = 0; return; } void BitArray_write(BitArray *ba, FILE *fp){ fwrite(ba->data, sizeof(guchar), BitArray_get_size(ba->length), fp); fflush(fp); /* Keep valgrind quiet ?? 
*/ return; } BitArray *BitArray_read(FILE *fp, gsize size){ register BitArray *ba = g_new(BitArray, 1); ba->alloc = size; ba->length = size * CHAR_BIT; ba->data = g_new(guchar, ba->alloc); fread(ba->data, sizeof(guchar), size, fp); return ba; } void BitArray_append_bit(BitArray *ba, gboolean bit){ register gint64 word = ba->length / CHAR_BIT, pos = ba->length & (CHAR_BIT-1); if(ba->length == (ba->alloc*CHAR_BIT)){ ba->alloc <<= 1; ba->data = g_realloc(ba->data, ba->alloc); memset(ba->data+(ba->alloc>>1), 0, (ba->alloc>>1)); } ba->data[word] = bit?BitOn(ba->data[word], pos) :BitOff(ba->data[word], pos); ba->length++; return; } void BitArray_append(BitArray *ba, guint64 data, guchar width){ register gint64 word = ba->length/ CHAR_BIT, input; register guchar todo = width, bit = ba->length & (CHAR_BIT-1), taken = CHAR_BIT-bit, done = 0; ba->length += width; if(ba->length >= (ba->alloc*CHAR_BIT)){ ba->alloc <<= 1; ba->data = g_realloc(ba->data, ba->alloc); memset(ba->data+(ba->alloc>>1), 0, (ba->alloc>>1)); } do { if(taken >= todo){ /* final partial */ input = GetBits(data, done, todo); ba->data[word] = SetBits(input, ba->data[word], bit, todo); break; } input = GetBits(data, done, taken); ba->data[word] = SetBits(input, ba->data[word], bit, taken); todo -= taken; done += taken; bit = 0; taken = CHAR_BIT; word++; } while(TRUE); return; } /* Previous slow version */ #if 0 void BitArray_append(BitArray *ba, guint64 data, guchar width){ register guchar i; g_assert(width <= (sizeof(guint64)*CHAR_BIT)); for(i = 0; i < width; i++) BitArray_append_bit(ba, GetBit(data, i)); return; } #endif /* 0 */ gboolean BitArray_get_bit(BitArray *ba, guint64 pos){ register gint64 word = pos / CHAR_BIT, bit = pos & (CHAR_BIT-1); return GetBit(ba->data[word], bit); } guint64 BitArray_get(BitArray *ba, guint64 start, guchar width){ register gint64 word = start / CHAR_BIT, data = 0; register guchar todo = width, bit = start & (CHAR_BIT-1), taken = CHAR_BIT-bit, done = 0; do { if(taken >= todo){ 
/* final partial */ data |= ((guint64)GetBits(ba->data[word], bit, todo) << done); break; } data |= ((guint64)GetBits(ba->data[word], bit, taken) << done); todo -= taken; done += taken; bit = 0; taken = CHAR_BIT; word++; } while(TRUE); return data; } /* Previous slow version */ #if 0 guint64 BitArray_get(BitArray *ba, guint64 start, guchar width){ register gint i; register guint64 data = 0; for(i = 0; i < width; i++){ data |= ((guint64)BitArray_get_bit(ba, start+i) << i); } return data; } #endif /* 0 */ void BitArray_write_int(guint64 num, FILE *fp){ guint64 nbo_num = GUINT64_TO_BE(num); /* BigEndian == NBO */ fwrite(&nbo_num, sizeof(guint64), 1, fp); return; } guint64 BitArray_read_int(FILE *fp){ guint64 num; fread(&num, sizeof(guint64), 1, fp); return GUINT64_FROM_BE(num); } /**/ exonerate-2.4.0/src/struct/splaytree.test.c0000644000175000017500000000606311162714343015704 00000000000000/****************************************************************\ * * * Splay Tree Library * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "splaytree.h" static gint compare_func(gpointer data_a, gpointer data_b, gpointer user_data){ /* g_message("comparing [%d][%d]", GPOINTER_TO_INT(data_a), GPOINTER_TO_INT(data_b)); */ /* Compare the stored integers rather than subtracting raw pointers */ return GPOINTER_TO_INT(data_a) - GPOINTER_TO_INT(data_b); } static gboolean traverse_func(gpointer data, gpointer user_data){ g_message("traverse [%d]", GPOINTER_TO_INT(data)); return FALSE; } int main(void){ register SplayTree_Set *sts = SplayTree_Set_create(compare_func, NULL, NULL); register SplayTree *st = NULL; int i; int num[100] = { 87, 18, 95, 86, 32, 96, 25, 21, 36, 65, 42, 78, 72, 29, 85, 66, 22, 13, 61, 53, 88, 14, 28, 84, 47, 67, 40, 26, 24, 1, 70, 60, 15, 97, 9, 6, 63, 45, 37, 64, 79, 3, 41, 46, 16, 38, 35, 10, 59, 71, 17, 93, 8, 81, 51, 31, 82, 98, 80, 12, 27, 62, 30, 5, 90, 52, 2, 68, 89, 34, 94, 20, 11, 4, 92, 58, 99, 77, 7, 23, 43, 33, 0, 73, 57, 39, 91, 56, 83, 75, 74, 49, 50, 44, 55, 19, 54, 76, 69, 48 }; for(i = 0; i < 100; i++){ g_message("insert [%d]", num[i]); st = SplayTree_insert(st, sts, GINT_TO_POINTER(num[i])); } for(i = 0; i < 100; i++){ g_message("lookup [%d]", num[i]); st = SplayTree_lookup(st, sts, GINT_TO_POINTER(num[i])); g_assert(GPOINTER_TO_INT(st->data) == num[i]); } #if 0 for(i = 0; i < 100; i++){ g_message("remove [%d]", num[i]); st = SplayTree_remove(st, sts, GINT_TO_POINTER(num[i]), NULL); } #endif /* 0 */ SplayTree_traverse(st, sts, traverse_func); st = SplayTree_splay(st, sts, GINT_TO_POINTER(0)); for(i = 0; i < 100; i++){ g_message("removing [%d]", GPOINTER_TO_INT(st->data)); st = SplayTree_remove(st, sts, st->data, NULL); } /**/ g_assert(!st); SplayTree_destroy(st, sts); SplayTree_Set_destroy(sts); g_message("done"); return 0; } /**/ exonerate-2.4.0/src/struct/splaytree.c0000644000175000017500000001313511162714343014724 00000000000000/****************************************************************\ * * * Splay Tree Library * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. 
All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "splaytree.h" SplayTree_Set *SplayTree_Set_create(SplayTree_CompareFunc compare_func, SplayTree_FreeFunc free_func, gpointer user_data){ register SplayTree_Set *sts = g_new(SplayTree_Set, 1); sts->recycle = RecycleBin_create("SplayTree", sizeof(SplayTree), 1024); g_assert(compare_func); sts->compare_func = compare_func; sts->free_func = free_func; sts->user_data = user_data; return sts; } void SplayTree_Set_destroy(SplayTree_Set *sts){ RecycleBin_destroy(sts->recycle); g_free(sts); return; } gsize SplayTree_Set_memory_usage(SplayTree_Set *sts){ return RecycleBin_memory_usage(sts->recycle); } SplayTree *SplayTree_splay(SplayTree *st, SplayTree_Set *sts, gpointer data){ register SplayTree *l, *r, *t; register gint comp; SplayTree n; if(!st) return st; n.left = n.right = NULL; l = r = &n; while((comp = sts->compare_func(data, st->data, sts->user_data))){ if(comp < 0){ if(!st->left) break; if(sts->compare_func(data, st->left->data, sts->user_data) < 0){ /* Rotate right */ t = st->left; st->left = t->right; t->right = st; st = t; if(!st->left) break; } r->left = st; /* Link right */ r = st; st = st->left; } else { if(!st->right) break; if(sts->compare_func(data, st->right->data, sts->user_data) > 0){ /* Rotate left */ t = st->right; st->right = t->left; t->left = st; st = t; if(!st->right) break; } l->right = st; /* Link left */ l = st; st = st->right; } } l->right = st->left; /* Assemble */ r->left = st->right; st->left = n.right; st->right = n.left; return st; } SplayTree *SplayTree_insert(SplayTree *st, SplayTree_Set *sts, gpointer data){ register SplayTree *n; register gint comp; if(!st){ n = RecycleBin_alloc(sts->recycle); 
n->data = data; n->left = n->right = NULL; return n; } st = SplayTree_splay(st, sts, data); comp = sts->compare_func(data, st->data, sts->user_data); if(!comp) /* Already in tree */ return st; n = RecycleBin_alloc(sts->recycle); n->data = data; if(comp < 0){ n->left = st->left; n->right = st; st->left = NULL; } else { n->right = st->right; n->left = st; st->right = NULL; } return n; } /* If the data is already present, the new data is not used. */ SplayTree *SplayTree_remove(SplayTree *st, SplayTree_Set *sts, gpointer data, gpointer *result){ register SplayTree *n; register gint comp; if(!st) return NULL; st = SplayTree_splay(st, sts, data); comp = sts->compare_func(data, st->data, sts->user_data); if(!comp){ /* found */ if(st->right){ n = SplayTree_splay(st->right, sts, data); g_assert(!n->left); n->left = st->left; } else { n = st->left; } if(result) (*result) = st->data; RecycleBin_recycle(sts->recycle, st); return n; } return st; /* not found */ } /* If the data is found, (*result) is set to the removed data */ SplayTree *SplayTree_lookup(SplayTree *st, SplayTree_Set *sts, gpointer data){ return st?SplayTree_splay(st, sts, data):NULL; } void SplayTree_destroy(SplayTree *st, SplayTree_Set *sts){ if(!st) return; SplayTree_destroy(st->left, sts); SplayTree_destroy(st->right, sts); if(sts->free_func) sts->free_func(st->data, sts->user_data); RecycleBin_recycle(sts->recycle, st); return; } static gboolean SplayTree_traverse_recur(SplayTree *st, SplayTree_Traverse_Func traverse_func, gpointer user_data){ if(!st) return FALSE; if(SplayTree_traverse_recur(st->left, traverse_func, user_data) || traverse_func(st->data, user_data) || SplayTree_traverse_recur(st->right, traverse_func, user_data)) return TRUE; return FALSE; } void SplayTree_traverse(SplayTree *st, SplayTree_Set *sts, SplayTree_Traverse_Func traverse_func){ SplayTree_traverse_recur(st, traverse_func, sts->user_data); return; } /**/ 
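/* Illustrative sketch only, not part of the exonerate sources:
 * the top-down splay used by SplayTree_splay() above, reduced to plain C
 * with integer keys so that it can be compiled without glib or RecycleBin.
 * The names Node, splay() and insert() are hypothetical and exist only in
 * this sketch; the library version stores a gpointer, compares through
 * sts->compare_func and allocates nodes from a RecycleBin instead.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type for this sketch (integer key, no payload) */
typedef struct Node {
    struct Node *left;
    struct Node *right;
    int key;
} Node;

/* Top-down splay: moves the node matching key (or the last node
 * visited while searching for it) to the root and returns the new root.
 */
static Node *splay(Node *t, int key){
    Node header, *l, *r, *y;
    if(!t)
        return NULL;
    header.left = header.right = NULL;
    l = r = &header;
    while(key != t->key){
        if(key < t->key){
            if(!t->left)
                break;
            if(key < t->left->key){ /* Rotate right */
                y = t->left;
                t->left = y->right;
                y->right = t;
                t = y;
                if(!t->left)
                    break;
            }
            r->left = t; /* Link right */
            r = t;
            t = t->left;
        } else {
            if(!t->right)
                break;
            if(key > t->right->key){ /* Rotate left */
                y = t->right;
                t->right = y->left;
                y->left = t;
                t = y;
                if(!t->right)
                    break;
            }
            l->right = t; /* Link left */
            l = t;
            t = t->right;
        }
    }
    l->right = t->left; /* Assemble */
    r->left = t->right;
    t->left = header.right;
    t->right = header.left;
    return t;
}

/* Insert a caller-owned node; if the key is already present,
 * the new node is not used and the existing root is returned.
 */
static Node *insert(Node *t, Node *n){
    assert(n);
    n->left = n->right = NULL;
    if(!t)
        return n;
    t = splay(t, n->key);
    if(n->key == t->key) /* Already in tree */
        return t;
    if(n->key < t->key){
        n->left = t->left;
        n->right = t;
        t->left = NULL;
    } else {
        n->right = t->right;
        n->left = t;
        t->right = NULL;
    }
    return n;
}
```

/* Usage note for the sketch: after root = splay(root, key), the root holds
 * key if it is present; otherwise it holds a neighbouring key, which is why
 * SplayTree_insert() and SplayTree_remove() above re-run the comparison
 * after splaying.
 */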
exonerate-2.4.0/src/struct/noitree.test.c0000644000175000017500000000434511162714342015341 00000000000000/****************************************************************\ * * * NOI_Tree : non-overlapping interval tree * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "noitree.h" static void test_traverse_func(gint start, gint length, gpointer user_data){ g_message("traverse with [%d,%d]", start, length); return; } int main(void){ register NOI_Tree_Set *nts = NOI_Tree_Set_create(NULL); register NOI_Tree *nt = NOI_Tree_create(nts); /**/ #if 0 NOI_Tree_insert(nt, nts, 1, 3); NOI_Tree_insert(nt, nts, 7, 3); NOI_Tree_insert(nt, nts, 13, 3); NOI_Tree_insert(nt, nts, 19, 3); NOI_Tree_insert(nt, nts, 25, 3); g_message("traverse 1:"); NOI_Tree_traverse(nt, nts, test_traverse_func); /**/ NOI_Tree_delta_init(nt, nts); NOI_Tree_insert(nt, nts, 6, 19); g_message("traverse 2:"); NOI_Tree_traverse(nt, nts, test_traverse_func); g_message("traverse delta:"); NOI_Tree_delta_traverse(nt, nts, test_traverse_func); /**/ NOI_Tree_destroy(nt, nts); #endif /* 0 */ NOI_Tree_delta_init(nt, nts); NOI_Tree_insert(nt, nts, 100, 10); NOI_Tree_insert(nt, nts, 300, 10); NOI_Tree_insert(nt, nts, 200, 10); g_message("traverse:"); NOI_Tree_traverse(nt, nts, test_traverse_func); g_message("delta traverse:"); NOI_Tree_delta_traverse(nt, nts, test_traverse_func); /**/ NOI_Tree_Set_destroy(nts); g_message("done"); return 0; } /**/ exonerate-2.4.0/src/struct/noitree.c0000644000175000017500000001567311162714342014371 00000000000000/****************************************************************\ * * * NOI_Tree : non-overlapping interval 
tree * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "noitree.h" /**/ typedef struct { gint start; gint length; } NOI_Tree_Interval; #define NOI_Tree_Interval_end(interval) ((interval)->start+(interval)->length) static gint NOI_Tree_compare_func(gpointer data_a, gpointer data_b, gpointer user_data){ register NOI_Tree_Interval *int_a = data_a, *int_b = data_b; if(NOI_Tree_Interval_end(int_a) < int_b->start) return -1; /* a before b */ if(NOI_Tree_Interval_end(int_b) < int_a->start) return 1; /* b before a */ return 0; /* overlapping */ } static void NOI_Tree_free_func(gpointer data, gpointer user_data){ register NOI_Tree_Interval *interval = data; g_free(interval); return; } typedef struct { gpointer user_data; NOI_Tree_Traverse_Func ntf; } NOI_Tree_Traverse_Data; NOI_Tree_Set *NOI_Tree_Set_create(gpointer user_data){ register NOI_Tree_Set *nts = g_new(NOI_Tree_Set, 1); register NOI_Tree_Traverse_Data *nttd = g_new(NOI_Tree_Traverse_Data, 1); nttd->user_data = user_data; nttd->ntf = NULL; nts->sts = SplayTree_Set_create(NOI_Tree_compare_func, NOI_Tree_free_func, nttd); return nts; } void NOI_Tree_Set_destroy(NOI_Tree_Set *nts){ register NOI_Tree_Traverse_Data *nttd = nts->sts->user_data; g_free(nttd); SplayTree_Set_destroy(nts->sts); g_free(nts); return; } /**/ NOI_Tree *NOI_Tree_create(NOI_Tree_Set *nts){ register NOI_Tree *nt = g_new(NOI_Tree, 1); nt->st = NULL; nt->delta = NULL; return nt; } void NOI_Tree_destroy(NOI_Tree *nt, NOI_Tree_Set *nts){ if(nt->delta) NOI_Tree_destroy(nt->delta, nts); SplayTree_destroy(nt->st, nts->sts); g_free(nt); return; } static NOI_Tree_Interval 
*NOI_Tree_Interval_merge(NOI_Tree_Interval *int_a, NOI_Tree_Interval *int_b){ register gint start = MIN(int_a->start, int_b->start), end = MAX(NOI_Tree_Interval_end(int_a), NOI_Tree_Interval_end(int_b)); /* g_message("MERGING [%d,%d] [%d,%d] ==> [%d,%d]", int_a->start, int_a->length, int_b->start, int_b->length, start, end-start); */ int_a->start = start; int_a->length = end-start; g_free(int_b); return int_a; } void NOI_Tree_insert(NOI_Tree *nt, NOI_Tree_Set *nts, gint start, gint length){ NOI_Tree_Interval *result = NULL, *prev = NULL; register NOI_Tree_Interval *interval = g_new(NOI_Tree_Interval, 1); register gint cursor = start; /* g_message("[%s], (%d,%d) [%p] [%p]", __FUNCTION__, start, length, nt, interval); */ interval->start = start; interval->length = 0; /* Splay with zero length interval to bring 1st overlapping node to root * This ensures removal of overlapping nodes will be in order */ if(nt->st){ nt->st = SplayTree_splay(nt->st, nts->sts, interval); if(nts->sts->compare_func(interval, nt->st->data, nts->sts->user_data) > 0){ if(nt->st->right){ /* remove 1st if before interval */ nt->st = SplayTree_remove(nt->st, nts->sts, nt->st->data, (gpointer)&prev); cursor = NOI_Tree_Interval_end(prev); } } interval->length = length; while(nt->st && (!nts->sts->compare_func(interval, nt->st->data, nts->sts->user_data))){ nt->st = SplayTree_remove(nt->st, nts->sts, nt->st->data, (gpointer)&result); g_assert(result); if(nt->delta && (cursor < result->start)) NOI_Tree_insert(nt->delta, nts, cursor, result->start-cursor); cursor = NOI_Tree_Interval_end(result); /**/ interval = NOI_Tree_Interval_merge(interval, result); result = NULL; } if(nt->delta && (cursor < NOI_Tree_Interval_end(interval))){ NOI_Tree_insert(nt->delta, nts, cursor, NOI_Tree_Interval_end(interval) - cursor); } } else { interval->length = length; if(nt->delta) NOI_Tree_insert(nt->delta, nts, start, length); } nt->st = SplayTree_insert(nt->st, nts->sts, interval); if(prev) nt->st = 
SplayTree_insert(nt->st, nts->sts, prev); return; } /* Delta Algorithm: * start with cursor at start of insert_interval * for each overlapping existing interval * if(cursor < curr_interval->start) * add new_interval from cursor to curr_interval->start to delta_tree * move cursor to end of curr_interval * if(cursor < insert_interval_end) * add new_interval from cursor to insert_interval_end to delta_tree */ static gboolean NOI_Tree_SplayTree_traverse_func(gpointer data, gpointer user_data){ register NOI_Tree_Interval *interval = data; register NOI_Tree_Traverse_Data *nttd = user_data; g_assert(nttd->ntf); nttd->ntf(interval->start, interval->length, nttd->user_data); return FALSE; } void NOI_Tree_traverse(NOI_Tree *nt, NOI_Tree_Set *nts, NOI_Tree_Traverse_Func ntf){ register NOI_Tree_Traverse_Data *nttd = nts->sts->user_data; g_assert(!nttd->ntf); nttd->ntf = ntf; SplayTree_traverse(nt->st, nts->sts, NOI_Tree_SplayTree_traverse_func); nttd->ntf = NULL; return; } void NOI_Tree_delta_init(NOI_Tree *nt, NOI_Tree_Set *nts){ if(nt->delta) NOI_Tree_destroy(nt->delta, nts); nt->delta = NOI_Tree_create(nts); return; } void NOI_Tree_delta_traverse(NOI_Tree *nt, NOI_Tree_Set *nts, NOI_Tree_Traverse_Func ntf){ if(nt->delta) NOI_Tree_traverse(nt->delta, nts, ntf); return; } /**/ exonerate-2.4.0/src/struct/fsm.h0000644000175000017500000000626411162714341013511 00000000000000/****************************************************************\ * * * Library for FSM-based word matching. * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #ifndef INCLUDED_FSM_H #define INCLUDED_FSM_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include <glib.h> #include <limits.h> /* For CHAR_BIT */ #include "recyclebin.h" #ifndef ALPHABETSIZE #define ALPHABETSIZE (1<<(CHAR_BIT)) #endif /* ALPHABETSIZE */ typedef struct FSM_Node { struct FSM_Node *next; gpointer data; } FSM_Node; typedef gpointer (*FSM_Join_Func)(gpointer a, gpointer b, gpointer user_data); typedef struct { guchar index[ALPHABETSIZE]; /* Map chars to the array */ guchar insertion_filter[ALPHABETSIZE]; guchar traversal_filter[ALPHABETSIZE]; guchar width; /* Alphabet size for this FSM */ FSM_Node *root; /* Base of the finite state machine */ FSM_Join_Func merge_func; /* Function for node data merging */ FSM_Join_Func combine_func; /* Function for node data combination */ RecycleBin *recycle; /* RecycleBin for node chunks */ gulong chunk_count; /* Number of chunks used */ gpointer user_data; gboolean is_compiled; } FSM; /* merge_func is called when two identical words are submitted. * combine_func is called when one word is a subseq of another. */ FSM *FSM_create(gchar *alphabet, FSM_Join_Func merge_func, FSM_Join_Func combine_func, gpointer user_data); gsize FSM_memory_usage(FSM *f); void FSM_info(FSM *f); void FSM_destroy(FSM *f); typedef void (*FSM_Destroy_Func)(gpointer data, gpointer user_data); void FSM_destroy_with_data(FSM *f, FSM_Destroy_Func fdf, gpointer user_data); gpointer FSM_add(FSM *f, gchar *seq, guint len, gpointer node_data); void FSM_compile(FSM *f); typedef void (*FSM_Traverse_Func)(guint seq_pos, gpointer node_data, gpointer user_data); void FSM_traverse(FSM *f, gchar *seq, FSM_Traverse_Func ftf, gpointer user_data); void FSM_add_insertion_filter(FSM *f, guchar *filter); void FSM_add_traversal_filter(FSM *f, guchar *filter); /* Filters must be ALPHABETSIZE. * If filter is NULL, mapping is reset. 
*/ #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_FSM_H */ exonerate-2.4.0/src/struct/matrix.h0000644000175000017500000000337311162714342014227 00000000000000/****************************************************************\ * * * Simple matrix creation routines * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_MATRIX_H #define INCLUDED_MATRIX_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include <glib.h> gsize Matrix2d_size(gint a, gint b, gsize cell); gsize Matrix3d_size(gint a, gint b, gint c, gsize cell); gsize Matrix4d_size(gint a, gint b, gint c, gint d, gsize cell); /* Sizes may be unexpected due to word alignment padding */ gpointer *Matrix2d_create(gint a, gint b, gsize cell); void Matrix2d_init(gchar **m, gint a, gint b, gsize cell); gpointer **Matrix3d_create(gint a, gint b, gint c, gsize cell); gpointer ***Matrix4d_create(gint a, gint b, gint c, gint d, gsize cell); /* All the matrices are freed with a single call to g_free() */ /* FIXME: implement general N-dimensional versions */ #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_MATRIX_H */ exonerate-2.4.0/src/struct/pqueue.h0000644000175000017500000000561711174110613014224 00000000000000/****************************************************************\ * * * Priority queue library using pairing heaps. * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. 
See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_PQUEUE_H #define INCLUDED_PQUEUE_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include <glib.h> #include "recyclebin.h" typedef struct PQueue_Node { gpointer data; struct PQueue_Node *left; struct PQueue_Node *next; struct PQueue_Node *prev; } PQueue_Node; typedef void (*PQueue_Free_Func)(gpointer data, gpointer user_data); typedef gboolean (*PQueue_Compare_Func)(gpointer low, gpointer high, gpointer user_data); typedef struct { RecycleBin *pq_recycle; RecycleBin *node_recycle; } PQueueSet; typedef struct { PQueue_Node *root; /* The root node */ gint total; /* Number of members */ PQueue_Compare_Func comp_func; /* Comparison function */ PQueueSet *set; gpointer user_data; } PQueue; PQueueSet *PQueueSet_create(void); void PQueueSet_destroy(PQueueSet *pq_set); PQueue *PQueue_create(PQueueSet *pq_set, PQueue_Compare_Func comp_func, gpointer user_data); void PQueue_destroy(PQueue *pq, PQueue_Free_Func free_func, gpointer user_data); PQueue_Node *PQueue_push(PQueue *pq, gpointer data); void PQueue_raise(PQueue *pq, PQueue_Node *pqn); void PQueue_change(PQueue *pq, PQueue_Node *pqn); gpointer PQueue_pop(PQueue *pq); PQueue *PQueue_join(PQueue *a, PQueue *b); gpointer PQueue_remove(PQueue *pq, PQueue_Node *pqn); typedef gboolean (*PQueue_Traverse_Func)(gpointer data, gpointer user_data); /* Returns TRUE to stop the traversal */ void PQueue_traverse(PQueue *pq, PQueue_Traverse_Func tf, gpointer user_data); #define PQueue_total(pq) ((pq)->total) #define PQueue_top(pq) \ ((pq)?(((pq)->root)?((pq)->root->data):(NULL)):(NULL)) #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_PQUEUE_H */ exonerate-2.4.0/src/struct/slist.h0000644000175000017500000001053111162714342014053 
00000000000000/****************************************************************\ * * * Efficient single-linked list routines. * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ /* Note: SList is used instead of the glib GSList object, * as it provides: * o All constant time operations (including adding and joining) * ( except SList_traverse() and SList_destroy_with_data() ) * o Ease of destroying a set of lists at once * o Ability to add to either end of list. * o SList does not change on append (has a container) */ #ifndef INCLUDED_SLIST_H #define INCLUDED_SLIST_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include <glib.h> #include "recyclebin.h" typedef struct SListNode { struct SListNode *next; gpointer data; } SListNode; typedef struct { RecycleBin *list_recycle; RecycleBin *node_recycle; } SListSet; typedef struct { SListNode *head; SListNode *tail; SListSet *set; } SList; /* FIXME: optimisation : remove slist->set to save space */ #define SList_isempty(slist) ((slist)->head == (slist)->tail) #define SList_first(slist) \ (SList_isempty(slist)?NULL:(slist)->head->next->data) #define SList_last(slist) \ (SList_isempty(slist)?NULL:(slist)->tail->data) typedef void (*SList_DestroyFunc)(gpointer data, gpointer user_data); typedef gboolean (*SList_TraverseFunc)(gpointer data, gpointer user_data); typedef gboolean (*SList_CompareFunc)(gpointer a, gpointer b, gpointer user_data); /* SList_TraverseFunc should return TRUE to stop the traversal */ SListSet *SListSet_create(void); void SListSet_destroy(SListSet *slist_set); gsize SListSet_memory_usage(SListSet *slist_set); /* SListSet_destroy() will also destroy all 
SLists in this set */ SList *SList_create(SListSet *slist_set); void SList_empty(SList *slist); void SList_destroy(SList *slist); void SList_destroy_with_data(SList *slist, SList_DestroyFunc destroy_func, gpointer user_data); void SList_empty_with_data(SList *slist, SList_DestroyFunc destroy_func, gpointer user_data); void SList_queue(SList *slist, gpointer data); void SList_stack(SList *slist, gpointer data); gpointer SList_pop(SList *slist); SList *SList_join(SList *left, SList *right); /* SList_stack() adds to start of list * SList_queue() adds to end of list * SList_pop() removes from the start of the list * SList_join() appends right onto end of left. * right is emptied but not destroyed. */ SList *SList_merge(SList *left, SList *right, SList_CompareFunc compare_func, gpointer user_data); /* SList_merge() assumes left and right are already sorted, * and merges them, in sorted order, into left. * right is emptied but not destroyed. */ void SList_remove(SList *slist, SListNode *prev_node); void SList_insert(SList *slist, SListNode *prev_node, SList *new_slist); /* SList_remove() removes prev_node->next from slist * (or the first node if prev_node is NULL) * SList_insert() inserts new_slist into slist after prev_node * (or at the beginning if prev_node is NULL) * new_slist is destroyed in the process. */ void SList_traverse(SList *slist, SList_TraverseFunc traverse_func, gpointer user_data); #define SList_for_each(slist, sln) \ for((sln) = (slist)->head->next; (sln); (sln) = (sln)->next) #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_SLIST_H */ exonerate-2.4.0/src/struct/rangetree.h0000644000175000017500000000503611162714342014675 00000000000000/****************************************************************\ * * * Trees for 2D range searching. * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. 
See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_RANGETREE_H #define INCLUDED_RANGETREE_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include <glib.h> #include "recyclebin.h" typedef struct RangeTree_Node { struct RangeTree_Node *left; struct RangeTree_Node *right; gint x; gint y; gpointer info; } RangeTree_Node; typedef struct { RangeTree_Node *root; GTree *recent_data; RecycleBin *node_recycle; } RangeTree; RangeTree *RangeTree_create(void); typedef void (*RangeTree_FreeFunc)(gpointer info, gpointer user_data); void RangeTree_destroy(RangeTree *rt, RangeTree_FreeFunc rtff, gpointer user_data); void RangeTree_add(RangeTree *rt, gint x, gint y, gpointer info); typedef gboolean (*RangeTree_ReportFunc)(gint x, gint y, gpointer info, gpointer user_data); gboolean RangeTree_find(RangeTree *rt, gint x_start, gint x_length, gint y_start, gint y_length, RangeTree_ReportFunc rtrf, gpointer user_data); /* Will call rtrf for all points within supplied range. * If [xy]_start is zero, will only work on [xy] * If rtrf() returns TRUE, will exit early, and return TRUE. */ gboolean RangeTree_check_pos(RangeTree *rt, gint x, gint y); /* Check a single position without refilling RangeTree */ gboolean RangeTree_is_empty(RangeTree *rt); gboolean RangeTree_traverse(RangeTree *rt, RangeTree_ReportFunc rtrf, gpointer user_data); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_RANGETREE_H */ exonerate-2.4.0/src/struct/vfsm.h0000644000175000017500000001003411200037565013664 00000000000000/****************************************************************\ * * * VFSM Library : complete trie navigation * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. 
* * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_VFSM_H #define INCLUDED_VFSM_H #include <glib.h> #include <limits.h> /* For CHAR_BIT */ #ifdef G_HAVE_GINT64 typedef guint64 VFSM_Int; #define VFSM_MAX G_GINT64_CONSTANT(18446744073709551615U) #ifdef CUSTOM_GUINT64_FORMAT #define VFSM_PRINT_FORMAT CUSTOM_GUINT64_FORMAT #else /* CUSTOM_GUINT64_FORMAT */ #define VFSM_PRINT_FORMAT "llu" #endif /* CUSTOM_GUINT64_FORMAT */ #else /* G_HAVE_GINT64 */ typedef guint32 VFSM_Int; #define VFSM_MAX 4294967295U #define VFSM_PRINT_FORMAT "u" #endif /* G_HAVE_GINT64 */ #ifndef ALPHABETSIZE #define ALPHABETSIZE (1<<(CHAR_BIT)) #endif /* ALPHABETSIZE */ typedef struct { gchar *alphabet; guint alphabet_size; gchar index[ALPHABETSIZE]; /* alphabet index */ guint depth; /* trie depth or wordsize */ gboolean is_poweroftwo; guint log_alphabet_size; /* Used for &(base-1) tricks */ VFSM_Int prs; /* Penultimate row start */ VFSM_Int prw; /* Penultimate row width */ VFSM_Int lrs; /* Last row start */ VFSM_Int lrw; /* Last row width (leafcount) */ VFSM_Int total; /* Total number of states */ VFSM_Int *branch_size; /* For each depth position */ } VFSM; VFSM *VFSM_create(gchar *alphabet, guint depth); /* Returns NULL if cannot build VFSM using basic types */ void VFSM_destroy(VFSM *vfsm); void VFSM_info(VFSM *vfsm); VFSM_Int VFSM_word2state(VFSM *vfsm, gchar *word); /* Returns 0 if word is not valid */ gboolean VFSM_state2word(VFSM *vfsm, VFSM_Int state, gchar *word); /* Returns FALSE if pos is not valid */ /* Requires vfsm->depth+1 char space for word */ VFSM_Int VFSM_change_state(VFSM *vfsm, VFSM_Int state, guchar ch); VFSM_Int VFSM_change_state_POW2(VFSM *vfsm, VFSM_Int state, guchar ch); #define VFSM_state_is_leaf(vfsm, state) 
((state) >= (vfsm)->lrs) #define VFSM_state2leaf(vfsm, state) ((state)-(vfsm)->lrs) #define VFSM_leaf2state(vfsm, state) ((state)+(vfsm)->lrs) #define VFSM_change_state_M(vfsm, state, ch) \ ((state) < (vfsm)->lrs) \ ? ((state)*(vfsm)->alphabet_size)+(vfsm)->index[(ch)] \ : (((vfsm)->prs+(((state)-(vfsm)->lrs) % (vfsm)->prw)) \ * (vfsm)->alphabet_size)+(vfsm)->index[(ch)]; #define VFSM_change_state_MPOW2(vfsm, state, ch) \ ((state) < (vfsm)->lrs) \ ? ((state)<<(vfsm)->log_alphabet_size)+(vfsm)->index[(ch)] \ : (((vfsm)->prs+(((state)-(vfsm)->lrs)&((vfsm)->prw-1))) \ << (vfsm)->log_alphabet_size)+(vfsm)->index[(ch)]; gchar VFSM_symbol_by_pos(VFSM *vfsm, VFSM_Int state, guint depth); gchar VFSM_symbol_by_pos_POW2(VFSM *vfsm, VFSM_Int state, guint depth); VFSM_Int VFSM_jump(VFSM *vfsm, VFSM_Int state, guint depth, guchar next); #define VFSM_jump_M(vfsm, state, depth, next, curr) \ ((state)+((vfsm)->branch_size[(depth)] \ *((vfsm)->index[(curr)]-(vfsm)->index[(next)]))) /* FIXME: some of these macros will have to change */ # endif /* INCLUDED_VFSM_H */ exonerate-2.4.0/src/struct/recyclebin.h0000644000175000017500000000411411162714342015034 00000000000000/****************************************************************\ * * * Efficient Memory Allocation Routines * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #ifndef INCLUDED_RECYCLEBIN_H #define INCLUDED_RECYCLEBIN_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include <glib.h> typedef struct RecycleBin_Node { struct RecycleBin_Node *next; } RecycleBin_Node; typedef struct { gint ref_count; gchar *name; GPtrArray *chunk_list; gint chunk_pos; gint nodes_per_chunk; gsize node_size; gint count; RecycleBin_Node *recycle; } RecycleBin; RecycleBin *RecycleBin_create(gchar *name, gsize node_size, gint nodes_per_chunk); void RecycleBin_destroy(RecycleBin *recycle_bin); RecycleBin *RecycleBin_share(RecycleBin *recycle_bin); gsize RecycleBin_memory_usage(RecycleBin *recycle_bin); gpointer RecycleBin_alloc(RecycleBin *recycle_bin); gpointer RecycleBin_alloc_blank(RecycleBin *recycle_bin); void RecycleBin_recycle(RecycleBin *recycle_bin, gpointer data); void RecycleBin_profile(void); #define RecycleBin_total(rb) ((rb)->count) /* FIXME: optimisation: use macros */ #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_RECYCLEBIN_H */ exonerate-2.4.0/src/struct/sparsecache.h0000644000175000017500000000626111173374122015203 00000000000000/****************************************************************\ * * * Sparse Cache Object * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_SPARSECACHE_H #define INCLUDED_SPARSECACHE_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include <glib.h> /* Currently using simple demand paging * * FIXME: optimisation: add option to flush LRU pages ? 
*/ typedef gpointer (*SparseCache_GetFunc)(gint pos, gpointer page_data, gpointer user_data); typedef gint (*SparseCache_CopyFunc)(gint start, gint length, gchar *dst, gpointer page_data, gpointer user_data); typedef struct { gpointer data; SparseCache_GetFunc get_func; SparseCache_CopyFunc copy_func; gsize data_size; } SparseCache_Page; typedef SparseCache_Page *(*SparseCache_FillFunc)(gint start, gpointer user_data); typedef void (*SparseCache_EmptyFunc)(SparseCache_Page *page, gpointer user_data); typedef void (*SparseCache_FreeFunc)(gpointer user_data); #define SparseCache_PAGE_SIZE_BIT_WIDTH 12 #define SparseCache_PAGE_SIZE (1 << SparseCache_PAGE_SIZE_BIT_WIDTH) #define SparseCache_pos2page(pos) ((pos) >> SparseCache_PAGE_SIZE_BIT_WIDTH) typedef struct { gint ref_count; SparseCache_Page **page_list; gint page_total; gint page_used; gint length; gsize page_memory_usage; SparseCache_FillFunc fill_func; SparseCache_EmptyFunc empty_func; SparseCache_FreeFunc free_func; gpointer user_data; } SparseCache; SparseCache *SparseCache_create(gint length, SparseCache_FillFunc fill_func, SparseCache_EmptyFunc empty_func, SparseCache_FreeFunc free_func, gpointer user_data); void SparseCache_destroy(SparseCache *sc); SparseCache *SparseCache_share(SparseCache *sc); gpointer SparseCache_get(SparseCache *sc, gint pos); gsize SparseCache_memory_usage(SparseCache *sc); void SparseCache_copy(SparseCache *sc, gint start, gint length, gchar *dst); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_SPARSECACHE_H */ exonerate-2.4.0/src/struct/dejavu.h0000644000175000017500000000407011162714341014173 00000000000000/****************************************************************\ * * * Deja-vu library for fast linear space repeat finding * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. 
See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/

#ifndef INCLUDED_DEJAVU_H
#define INCLUDED_DEJAVU_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <glib.h>
#include <limits.h>

#ifndef ALPHABET_SIZE
#define ALPHABET_SIZE (1 << CHAR_BIT)
#endif /* ALPHABET_SIZE */

typedef struct {
    gchar *seq;
    gint len;
    gint *next;
    gint symbol_list[ALPHABET_SIZE];
    gint symbol_list_len;
} DejaVu;

typedef void (*DejaVu_TraverseFunc)(gint first_pos, gint curr_pos,
                                    gint length, gchar *seq, gint len,
                                    gpointer user_data);

DejaVu *DejaVu_create(gchar *seq, gint len);
/* seq is used, without a copy being made */
void DejaVu_info(DejaVu *dv);
void DejaVu_destroy(DejaVu *dv);
void DejaVu_traverse(DejaVu *dv, gint min_wordlen, gint max_wordlen,
                     DejaVu_TraverseFunc dvtf, gpointer user_data,
                     guchar *filter, gint verbosity);
/* No filter applied when filter is NULL.
 * Filter is Alphabet size.
 * symbols are excluded when (filter[seq[i]] == '-')
 */

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_DEJAVU_H */
exonerate-2.4.0/src/struct/bitarray.h0000644000175000017500000000437311200040537014530 00000000000000
/****************************************************************\ * * * A simple bitarray data structure * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact.
* * * \****************************************************************/

#ifndef INCLUDED_BITARRAY_H
#define INCLUDED_BITARRAY_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <glib.h>
#include <stdio.h>
#include <limits.h> /* For CHAR_BIT */

typedef struct {
    guchar *data;
    guint64 length; /* Length in bits */
    guint64 alloc;
} BitArray;

#define BitArray_get_size(length)                   \
      ( ((length) / (sizeof(guchar) * CHAR_BIT))    \
      +(((length) % (sizeof(guchar) * CHAR_BIT))?1:0))

/* Parenthesised so the macro expands safely inside larger expressions */
#define BitArray_memory_usage(ba) \
      ( sizeof(BitArray)          \
      + (sizeof(gchar)*(ba)->alloc))

BitArray *BitArray_create(void);
void BitArray_destroy(BitArray *ba);
void BitArray_info(BitArray *ba);
void BitArray_empty(BitArray *ba);
void BitArray_write(BitArray *ba, FILE *fp);
BitArray *BitArray_read(FILE *fp, gsize size);

void BitArray_append_bit(BitArray *ba, gboolean bit);
void BitArray_append(BitArray *ba, guint64 data, guchar width);
gboolean BitArray_get_bit(BitArray *ba, guint64 pos);
guint64 BitArray_get(BitArray *ba, guint64 start, guchar width);

void BitArray_write_int(guint64 num, FILE *fp);
guint64 BitArray_read_int(FILE *fp);

void BitArray_int_print(guint64 num);
gchar *BitArray_int_sprintf(guint64 num);

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_BITARRAY_H */
exonerate-2.4.0/src/struct/splaytree.h0000644000175000017500000000570511162714343014735 00000000000000
/****************************************************************\ * * * Splay Tree Library * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact.
* * * \****************************************************************/

/* Splay Trees
 *
 * For a description, see:
 * "Self-adjusting Binary Search Trees"
 * D.D.Sleator and R.E.Tarjan
 * JACM 32:3 pp 652-686, July 1985
 */

#ifndef INCLUDED_SPLAYTREE_H
#define INCLUDED_SPLAYTREE_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <glib.h>
#include "recyclebin.h"

typedef struct SplayTree {
    struct SplayTree *left;
    struct SplayTree *right;
    gpointer data;
} SplayTree;

typedef gint (*SplayTree_CompareFunc)(gpointer data_a, gpointer data_b,
                                      gpointer user_data);
/* Returns 0 if equal or overlapping, <0 if before, >0 if after */

typedef void (*SplayTree_FreeFunc)(gpointer data, gpointer user_data);

typedef struct {
    RecycleBin *recycle;
    SplayTree_CompareFunc compare_func;
    SplayTree_FreeFunc free_func;
    gpointer user_data;
} SplayTree_Set;

typedef gboolean (*SplayTree_Traverse_Func)(gpointer data,
                                            gpointer user_data);

SplayTree_Set *SplayTree_Set_create(SplayTree_CompareFunc compare_func,
                                    SplayTree_FreeFunc free_func,
                                    gpointer user_data);
void SplayTree_Set_destroy(SplayTree_Set *sts);
gsize SplayTree_Set_memory_usage(SplayTree_Set *sts);

SplayTree *SplayTree_splay(SplayTree *st, SplayTree_Set *sts,
                           gpointer data);
SplayTree *SplayTree_insert(SplayTree *st, SplayTree_Set *sts,
                            gpointer data);
SplayTree *SplayTree_remove(SplayTree *st, SplayTree_Set *sts,
                            gpointer data, gpointer *result);
SplayTree *SplayTree_lookup(SplayTree *st, SplayTree_Set *sts,
                            gpointer data);
void SplayTree_destroy(SplayTree *st, SplayTree_Set *sts);
void SplayTree_traverse(SplayTree *st, SplayTree_Set *sts,
                        SplayTree_Traverse_Func traverse_func);

/**/

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_SPLAYTREE_H */
exonerate-2.4.0/src/struct/noitree.h0000644000175000017500000000406011162714342014362 00000000000000
/****************************************************************\ * * * NOI_Tree : non-overlapping interval tree * * * * Guy St.C. Slater..
mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/

#ifndef INCLUDED_NOITREE_H
#define INCLUDED_NOITREE_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include "splaytree.h"

typedef struct {
    SplayTree_Set *sts;
} NOI_Tree_Set;

typedef struct NOI_Tree {
    SplayTree *st;
    struct NOI_Tree *delta;
} NOI_Tree;

/**/

NOI_Tree_Set *NOI_Tree_Set_create(gpointer user_data);
void NOI_Tree_Set_destroy(NOI_Tree_Set *nts);

/**/

typedef void (*NOI_Tree_Traverse_Func)(gint start, gint length,
                                       gpointer user_data);

NOI_Tree *NOI_Tree_create(NOI_Tree_Set *nts);
void NOI_Tree_destroy(NOI_Tree *nt, NOI_Tree_Set *nts);
void NOI_Tree_insert(NOI_Tree *nt, NOI_Tree_Set *nts,
                     gint start, gint length);
void NOI_Tree_traverse(NOI_Tree *nt, NOI_Tree_Set *nts,
                       NOI_Tree_Traverse_Func ntf);

void NOI_Tree_delta_init(NOI_Tree *nt, NOI_Tree_Set *nts);
void NOI_Tree_delta_traverse(NOI_Tree *nt, NOI_Tree_Set *nts,
                             NOI_Tree_Traverse_Func ntf);

/**/

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_NOITREE_H */
exonerate-2.4.0/src/general/0000777000175000017500000000000011222703703012715 500000000000000
exonerate-2.4.0/src/general/Makefile.am0000644000175000017500000000165011222207477014676 00000000000000

TESTS = argument.test lineparse.test compoundfile.test socket.test \
        jobqueue.test threadref.test
noinst_PROGRAMS = $(TESTS)
noinst_HEADERS = argument.h lineparse.h compoundfile.h socket.h \
                 jobqueue.h threadref.h

INCLUDES = -DSOURCE_ROOT_DIR="\"@source_root_dir@\"" \
           -DCUSTOM_GUINT64_FORMAT="\"@custom_guint64_format@\"" \
           -I$(top_srcdir)/src/struct

argument_test_SOURCES = argument.test.c argument.c
lineparse_test_SOURCES = lineparse.test.c
lineparse.c compoundfile_test_SOURCES = compoundfile.test.c compoundfile.c socket_test_SOURCES = socket.test.c socket.c jobqueue_test_SOURCES = jobqueue.test.c jobqueue.c jobqueue_test_LDADD = $(top_srcdir)/src/struct/pqueue.o \ $(top_srcdir)/src/struct/recyclebin.o threadref_test_SOURCES = threadref.test.c threadref.c threadref_test_LDADD = -lpthread # Files to clear away MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/src/general/Makefile.in0000644000175000017500000002464611222703646014720 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../.. 
ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ TESTS = argument.test lineparse.test compoundfile.test socket.test jobqueue.test threadref.test noinst_PROGRAMS = $(TESTS) noinst_HEADERS = argument.h lineparse.h compoundfile.h socket.h jobqueue.h threadref.h INCLUDES = -DSOURCE_ROOT_DIR="\"@source_root_dir@\"" -DCUSTOM_GUINT64_FORMAT="\"@custom_guint64_format@\"" -I$(top_srcdir)/src/struct argument_test_SOURCES = argument.test.c argument.c lineparse_test_SOURCES = lineparse.test.c lineparse.c compoundfile_test_SOURCES = compoundfile.test.c compoundfile.c socket_test_SOURCES = socket.test.c socket.c jobqueue_test_SOURCES = jobqueue.test.c jobqueue.c jobqueue_test_LDADD = $(top_srcdir)/src/struct/pqueue.o $(top_srcdir)/src/struct/recyclebin.o threadref_test_SOURCES = threadref.test.c threadref.c threadref_test_LDADD = -lpthread # Files to clear away MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = PROGRAMS = $(noinst_PROGRAMS) DEFS = @DEFS@ -I. 
-I$(srcdir) CPPFLAGS = @CPPFLAGS@ LDFLAGS = @LDFLAGS@ LIBS = @LIBS@ argument_test_OBJECTS = argument.test.o argument.o argument_test_LDADD = $(LDADD) argument_test_DEPENDENCIES = argument_test_LDFLAGS = lineparse_test_OBJECTS = lineparse.test.o lineparse.o lineparse_test_LDADD = $(LDADD) lineparse_test_DEPENDENCIES = lineparse_test_LDFLAGS = compoundfile_test_OBJECTS = compoundfile.test.o compoundfile.o compoundfile_test_LDADD = $(LDADD) compoundfile_test_DEPENDENCIES = compoundfile_test_LDFLAGS = socket_test_OBJECTS = socket.test.o socket.o socket_test_LDADD = $(LDADD) socket_test_DEPENDENCIES = socket_test_LDFLAGS = jobqueue_test_OBJECTS = jobqueue.test.o jobqueue.o jobqueue_test_DEPENDENCIES = $(top_srcdir)/src/struct/pqueue.o \ $(top_srcdir)/src/struct/recyclebin.o jobqueue_test_LDFLAGS = threadref_test_OBJECTS = threadref.test.o threadref.o threadref_test_DEPENDENCIES = threadref_test_LDFLAGS = CFLAGS = @CFLAGS@ COMPILE = $(CC) $(DEFS) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) CCLD = $(CC) LINK = $(CCLD) $(AM_CFLAGS) $(CFLAGS) $(LDFLAGS) -o $@ HEADERS = $(noinst_HEADERS) DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best SOURCES = $(argument_test_SOURCES) $(lineparse_test_SOURCES) $(compoundfile_test_SOURCES) $(socket_test_SOURCES) $(jobqueue_test_SOURCES) $(threadref_test_SOURCES) OBJECTS = $(argument_test_OBJECTS) $(lineparse_test_OBJECTS) $(compoundfile_test_OBJECTS) $(socket_test_OBJECTS) $(jobqueue_test_OBJECTS) $(threadref_test_OBJECTS) all: all-redirect .SUFFIXES: .SUFFIXES: .S .c .o .s $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps src/general/Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status mostlyclean-noinstPROGRAMS: clean-noinstPROGRAMS: -test -z 
"$(noinst_PROGRAMS)" || rm -f $(noinst_PROGRAMS) distclean-noinstPROGRAMS: maintainer-clean-noinstPROGRAMS: .c.o: $(COMPILE) -c $< .s.o: $(COMPILE) -c $< .S.o: $(COMPILE) -c $< mostlyclean-compile: -rm -f *.o core *.core clean-compile: distclean-compile: -rm -f *.tab.c maintainer-clean-compile: argument.test: $(argument_test_OBJECTS) $(argument_test_DEPENDENCIES) @rm -f argument.test $(LINK) $(argument_test_LDFLAGS) $(argument_test_OBJECTS) $(argument_test_LDADD) $(LIBS) lineparse.test: $(lineparse_test_OBJECTS) $(lineparse_test_DEPENDENCIES) @rm -f lineparse.test $(LINK) $(lineparse_test_LDFLAGS) $(lineparse_test_OBJECTS) $(lineparse_test_LDADD) $(LIBS) compoundfile.test: $(compoundfile_test_OBJECTS) $(compoundfile_test_DEPENDENCIES) @rm -f compoundfile.test $(LINK) $(compoundfile_test_LDFLAGS) $(compoundfile_test_OBJECTS) $(compoundfile_test_LDADD) $(LIBS) socket.test: $(socket_test_OBJECTS) $(socket_test_DEPENDENCIES) @rm -f socket.test $(LINK) $(socket_test_LDFLAGS) $(socket_test_OBJECTS) $(socket_test_LDADD) $(LIBS) jobqueue.test: $(jobqueue_test_OBJECTS) $(jobqueue_test_DEPENDENCIES) @rm -f jobqueue.test $(LINK) $(jobqueue_test_LDFLAGS) $(jobqueue_test_OBJECTS) $(jobqueue_test_LDADD) $(LIBS) threadref.test: $(threadref_test_OBJECTS) $(threadref_test_DEPENDENCIES) @rm -f threadref.test $(LINK) $(threadref_test_LDFLAGS) $(threadref_test_OBJECTS) $(threadref_test_LDADD) $(LIBS) tags: TAGS ID: $(HEADERS) $(SOURCES) $(LISP) list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ here=`pwd` && cd $(srcdir) \ && mkid -f$$here/ID $$unique $(LISP) TAGS: $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) $(LISP) tags=; \ here=`pwd`; \ list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(ETAGS_ARGS)$$unique$(LISP)$$tags" \ || (cd $(srcdir) && etags -o $$here/TAGS $(ETAGS_ARGS) $$tags 
$$unique $(LISP)) mostlyclean-tags: clean-tags: distclean-tags: -rm -f TAGS ID maintainer-clean-tags: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = src/general distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done check-TESTS: $(TESTS) @failed=0; all=0; \ srcdir=$(srcdir); export srcdir; \ for tst in $(TESTS); do \ if test -f $$tst; then dir=.; \ else dir="$(srcdir)"; fi; \ if $(TESTS_ENVIRONMENT) $$dir/$$tst; then \ all=`expr $$all + 1`; \ echo "PASS: $$tst"; \ elif test $$? -ne 77; then \ all=`expr $$all + 1`; \ failed=`expr $$failed + 1`; \ echo "FAIL: $$tst"; \ fi; \ done; \ if test "$$failed" -eq 0; then \ banner="All $$all tests passed"; \ else \ banner="$$failed of $$all tests failed"; \ fi; \ dashes=`echo "$$banner" | sed s/./=/g`; \ echo "$$dashes"; \ echo "$$banner"; \ echo "$$dashes"; \ test "$$failed" -eq 0 info-am: info: info-am dvi-am: dvi: dvi-am check-am: all-am $(MAKE) $(AM_MAKEFLAGS) check-TESTS check: check-am installcheck-am: installcheck: installcheck-am install-exec-am: install-exec: install-exec-am install-data-am: install-data: install-data-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-am uninstall-am: uninstall: uninstall-am all-am: Makefile $(PROGRAMS) $(HEADERS) all-redirect: all-am install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-noinstPROGRAMS mostlyclean-compile \ mostlyclean-tags mostlyclean-generic mostlyclean: mostlyclean-am clean-am: 
clean-noinstPROGRAMS clean-compile clean-tags clean-generic \ mostlyclean-am clean: clean-am distclean-am: distclean-noinstPROGRAMS distclean-compile distclean-tags \ distclean-generic clean-am distclean: distclean-am maintainer-clean-am: maintainer-clean-noinstPROGRAMS \ maintainer-clean-compile maintainer-clean-tags \ maintainer-clean-generic distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." maintainer-clean: maintainer-clean-am .PHONY: mostlyclean-noinstPROGRAMS distclean-noinstPROGRAMS \ clean-noinstPROGRAMS maintainer-clean-noinstPROGRAMS \ mostlyclean-compile distclean-compile clean-compile \ maintainer-clean-compile tags mostlyclean-tags distclean-tags \ clean-tags maintainer-clean-tags distdir check-TESTS info-am info \ dvi-am dvi check check-am installcheck-am installcheck install-exec-am \ install-exec install-data-am install-data install-am install \ uninstall-am uninstall all-redirect all-am all installdirs \ mostlyclean-generic distclean-generic clean-generic \ maintainer-clean-generic clean mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: exonerate-2.4.0/src/general/argument.test.c0000644000175000017500000000537111162714300015601 00000000000000/****************************************************************\ * * * Library for command line argument processing * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/

#include "argument.h"

static gchar *file_argument_handler(gchar *arg_string, gpointer data){
    g_message("file handler received [%s]", arg_string);
    return NULL;
}

int Argument_main(Argument *arg){
    register ArgumentSet *as_a = ArgumentSet_create("Arg set one");
    register ArgumentSet *as_b = ArgumentSet_create("Arg set two");
    register gint i;
    GPtrArray *argument_list;
    gboolean bool_A, bool_B;
    ArgumentSet_add_option(as_a, 'f', "file", "path name",
        "Input file for blah or blah and blah", "/dev/zero",
        file_argument_handler, NULL);
    ArgumentSet_add_option(as_a, 'o', "output", "output path",
        "output or whatever", "/dev/null",
        file_argument_handler, NULL);
    ArgumentSet_add_option(as_a, 'A', "bool_A", NULL,
        "whatever", "FALSE",
        Argument_parse_boolean, &bool_A);
    ArgumentSet_add_option(as_a, 'B', "bool_B", NULL,
        "whatever", "FALSE",
        Argument_parse_boolean, &bool_B);
    ArgumentSet_add_option(as_a, 'l', "list", "path list",
        "list of paths or something", "one",
        NULL, &argument_list);
    Argument_absorb_ArgumentSet(arg, as_a);
    Argument_absorb_ArgumentSet(arg, as_b);
    Argument_process(arg, "blah",
        "A program for stuff\n"
        "Guy St.C. Slater. guy@ebi.ac.uk December 2000.\n", NULL);
    for(i = 0; i < argument_list->len; i++)
        g_message("List [%d] = [%s]", i,
                  (gchar*)argument_list->pdata[i]);
    return 0;
}
exonerate-2.4.0/src/general/argument.c0000644000175000017500000005536611222672275014647 00000000000000
/****************************************************************\ * * * Library for command line argument processing * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact.
* * * \****************************************************************/

#include "argument.h"

#include <stdio.h>  /* For fprintf() */
#include <stdlib.h> /* For exit() */
#include <string.h> /* For strlen() */
#include <ctype.h>  /* For isalnum() */
#include <unistd.h> /* For gethostname() */

static ArgumentOption *ArgumentOption_create(ArgumentSet *as,
                       gchar symbol, gchar *option, gchar *type,
                       gchar *desc, gchar *default_string,
                       ArgumentHandler handler, gpointer handler_data){
    register ArgumentOption *ao= g_new(ArgumentOption, 1);
    ao->as = as;
    ao->symbol = symbol;
    ao->option = g_strdup(option);
    ao->arg_value = NULL;
    ao->desc = g_strdup(desc);
    ao->type = g_strdup(type);
    ao->default_string = g_strdup(default_string);
    ao->handler = handler;
    ao->handler_data = handler_data;
    ao->env_var = NULL;
    if(handler)
        ao->arg_list = NULL;
    else
        ao->arg_list = g_ptr_array_new();
    return ao;
}

static void ArgumentOption_destroy(ArgumentOption *ao){
    g_free(ao->option);
    g_free(ao->desc);
    g_free(ao->type);
    g_free(ao->default_string);
    g_free(ao->env_var);
    if(ao->arg_list)
        g_ptr_array_free(ao->arg_list, TRUE);
    g_free(ao);
    return;
}

static void ArgumentSet_destroy(ArgumentSet *as){
    register gint i;
    register ArgumentOption *ao;
    for(i = 0; i < as->arg_option->len; i++){
        ao = as->arg_option->pdata[i];
        ArgumentOption_destroy(ao);
    }
    g_ptr_array_free(as->arg_option, TRUE);
    g_free(as->desc);
    g_free(as);
    return;
}

static void ArgumentOption_print_values(ArgumentOption *ao,
                                        gint use_long){
    register gint i;
    if(ao->arg_list){
        if(ao->arg_list->len){
            if((!ao->default_string)
            || ((ao->arg_list->len == 1)
                && strcmp(ao->default_string, ao->arg_list->pdata[0]))){
                g_print("{%s", (gchar*)ao->arg_list->pdata[0]);
                for(i = 1; i < ao->arg_list->len; i++)
                    g_print(":%s", (gchar*)ao->arg_list->pdata[i]);
                g_print("}");
            }
        } else {
            g_print(" <*** empty list ***>");
        }
    } else {
        if(ao->arg_value){
            if((!ao->default_string)
            || strcmp(ao->default_string, ao->arg_value)){
                if(use_long)
                    g_print("Using: ");
                g_print("<%s>", ao->arg_value);
            }
        } else {
            g_print(" <*** not set ***>");
        }
    }
    return;
}

static gboolean ArgumentOption_is_set(ArgumentOption *ao){
    if(ao->arg_list)
        return ao->arg_list->len?TRUE:FALSE;
    return ao->arg_value?TRUE:FALSE;
}

static void ArgumentOption_add_value(ArgumentOption *ao,
                                     GPtrArray *error_queue,
                                     gchar *value){
    if(ao->arg_list){
        g_ptr_array_add(ao->arg_list, value);
    } else {
        if(ao->arg_value)
            g_ptr_array_add(error_queue,
                g_strdup_printf("Already set --%s to \"%s\"",
                                ao->option, ao->arg_value));
        else
            ao->arg_value = value;
    }
    return;
}

static gboolean ArgumentOption_is_mandatory(ArgumentOption *ao){
    return ao->default_string?FALSE:TRUE;
}

/**/

static gboolean ArgumentParse_boolean(gchar *arg_string){
    register gint i;
    gchar *true_string[6] = {"Y", "T", "TRUE", "YES", "ON", "1"},
          *false_string[6] = {"N", "F", "FALSE", "NO", "OFF", "0"};
    for(i = 0; i < 6; i++){
        if(!g_strcasecmp(arg_string, true_string[i]))
            return TRUE;
        if(!g_strcasecmp(arg_string, false_string[i]))
            return FALSE;
    }
    g_error("Cannot parse boolean \"%s\"", arg_string);
    return FALSE;
}
/* FIXME: change to work with error listing */

static gchar *ArgumentHandler_short_help_func(gchar *arg_string,
                                              gpointer data){
    register Argument *arg = (Argument*)data;
    if(ArgumentParse_boolean(arg_string))
        arg->show_short_help = TRUE;
    return NULL;
}

static gchar *ArgumentHandler_long_help_func(gchar *arg_string,
                                             gpointer data){
    register Argument *arg = (Argument*)data;
    if(ArgumentParse_boolean(arg_string))
        arg->show_long_help = TRUE;
    return NULL;
}

static void Argument_show_version(Argument *arg){
    register gchar *branch = "$Name: $";
    g_print("%s from %s version %s\n", arg->name, PACKAGE, VERSION);
    g_print("Using glib version %d.%d.%d\n",
            GLIB_MAJOR_VERSION, GLIB_MINOR_VERSION, GLIB_MICRO_VERSION);
    g_print("Built on %s\n", __DATE__);
    if(strlen(branch) >= 10)
        g_print("Branch: %.*s\n", (gint)(strlen(branch)-9), branch+7);
    return;
}

static gchar *ArgumentHandler_version_func(gchar *arg_string,
                                           gpointer data){
    register Argument *arg = (Argument*)data;
    if(ArgumentParse_boolean(arg_string)){
        Argument_show_version(arg);
        exit(1);
    }
    return NULL;
}

static void Argument_add_standard_options(Argument *arg){
    register ArgumentSet *as = ArgumentSet_create("General Options");
    ArgumentSet_add_option(as, 'h', "shorthelp", NULL,
        "Display compact help text", "FALSE",
        ArgumentHandler_short_help_func, arg);
    ArgumentSet_add_option(as, '\0', "help", NULL,
        "Displays verbose help text", "FALSE",
        ArgumentHandler_long_help_func, arg);
    ArgumentSet_add_option(as, 'v', "version", NULL,
        "Show version number for this program", "FALSE",
        ArgumentHandler_version_func, arg);
    Argument_absorb_ArgumentSet(arg, as);
    return;
}

/**/

typedef struct {
    Argument_Cleanup_Func cleanup_func;
    gpointer user_data;
} Argument_Cleanup;

void Argument_add_cleanup(Argument *arg,
                          Argument_Cleanup_Func cleanup_func,
                          gpointer user_data){
    register Argument_Cleanup *cleanup = g_new(Argument_Cleanup, 1);
    cleanup->cleanup_func = cleanup_func;
    cleanup->user_data = user_data;
    g_ptr_array_add(arg->cleanup_list, cleanup);
    return;
}

static void Argument_cleanup(Argument *arg){
    register Argument_Cleanup *cleanup;
    register gint i;
    for(i = 0; i < arg->cleanup_list->len; i++){
        cleanup = arg->cleanup_list->pdata[i];
        cleanup->cleanup_func(cleanup->user_data);
        g_free(cleanup);
    }
    return;
}

/**/

static gint Argument_strcmp_compare(gconstpointer a, gconstpointer b){
    return strcmp((gchar*)a, (gchar*)b);
}

static Argument *Argument_create(gint argc, gchar **argv){
    register Argument *arg = g_new0(Argument, 1);
    arg->arg_set = g_ptr_array_new();
    arg->mandatory_set = g_ptr_array_new();
    arg->cleanup_list = g_ptr_array_new();
    arg->option_registry = g_tree_new(Argument_strcmp_compare);
    arg->argc = argc;
    arg->argv = argv;
    Argument_add_standard_options(arg);
    return arg;
}

static void Argument_destroy(Argument *arg){
    register ArgumentSet *as;
    register gint i;
    for(i = 0; i < arg->arg_set->len; i++){
        as = arg->arg_set->pdata[i];
        ArgumentSet_destroy(as);
    }
    Argument_cleanup(arg);
    g_ptr_array_free(arg->cleanup_list, TRUE);
    g_ptr_array_free(arg->mandatory_set, TRUE);
    g_ptr_array_free(arg->arg_set, TRUE);
    g_tree_destroy(arg->option_registry);
    g_free(arg->name);
    g_free(arg->desc);
    g_free(arg);
    return;
}

static gboolean Argument_assertion_warning(void){
    g_warning("Compiled with assertion checking - will run slowly");
    return TRUE;
}

static void Argument_error_handler(const gchar *log_domain,
                                   GLogLevelFlags log_level,
                                   const gchar *message,
                                   gpointer user_data){
    register Argument *arg = user_data;
    register gchar *stack_trace_str
                  = (gchar*)g_getenv("EXONERATE_DEBUG_STACK_TRACE"),
                   *debug_str = (gchar*)g_getenv("EXONERATE_DEBUG");
    fprintf(stderr, "** FATAL ERROR **: %s\n", message);
    if(stack_trace_str && ArgumentParse_boolean(stack_trace_str)){
        fprintf(stderr, "Generating stack trace ...\n");
        g_on_error_stack_trace(arg->name);
    }
    if(debug_str && ArgumentParse_boolean(debug_str)){
        fprintf(stderr, "Calling abort...\n");
        abort();
    }
    fprintf(stderr, "exiting ...\n");
    Argument_destroy(arg);
    exit(1);
    return;
}
/* This is probably not what one is supposed to do with glib,
 * but it is to stop g_error() calling abort() and core dumping.
 * Maybe switch to g_critical() after glib-2 migration.
 */

int main(int argc, char **argv){
    register Argument *arg;
    register gint retval;
#ifdef USE_PTHREADS
    if(!g_thread_supported())
        g_thread_init(NULL);
#endif /* USE_PTHREADS */
    arg = Argument_create(argc, argv);
    g_log_set_handler(NULL, G_LOG_LEVEL_ERROR|G_LOG_FLAG_FATAL,
                      Argument_error_handler, arg);
    g_assert(Argument_assertion_warning());
    retval = Argument_main(arg);
    Argument_destroy(arg);
    return retval;
}

static void Argument_usage(Argument *arg, gchar *synopsis){
    register ArgumentOption *ao;
    register gint i;
    Argument_show_version(arg);
    g_print("\n%s: %s\n", arg->name, arg->desc);
    if(synopsis){
        g_print("%s\n", synopsis);
    } else {
        g_print("Synopsis:\n--------\n%s", arg->name);
        for(i = 0; i < arg->mandatory_set->len; i++){
            ao = arg->mandatory_set->pdata[i];
            g_print(" <%s>", ao->type);
        }
        g_print("\n\n");
    }
    return;
}

static void Argument_short_help(Argument *arg){
    register ArgumentSet *as;
    register ArgumentOption *ao;
    register gint i, j;
    for(i = 0; i < arg->arg_set->len; i++){
        as = arg->arg_set->pdata[i];
        if(as->arg_option->len){
            g_print("%s:\n", as->desc);
            for(j = strlen(as->desc); j > 0; j--)
                g_print("-");
            g_print("\n");
            for(j = 0; j < as->arg_option->len; j++){
                ao = as->arg_option->pdata[j];
                if(ao->symbol)
                    g_print("-%c ", ao->symbol);
                else
                    g_print(" ");
                g_print("--%s", ao->option);
                if(ao->default_string)
                    g_print(" [%s]", ao->default_string);
                else
                    g_print(" [mandatory]");
                g_print(" ");
                ArgumentOption_print_values(ao, FALSE);
                g_print("\n");
            }
            g_print("\n");
        }
    }
    g_print("--\n");
    return;
}

static void Argument_long_help(Argument *arg){
    register ArgumentSet *as;
    register ArgumentOption *ao;
    register gint i, j;
    register gchar *env_value;
    for(i = 0; i < arg->arg_set->len; i++){
        as = arg->arg_set->pdata[i];
        if(as->arg_option->len){
            g_print("%s:\n", as->desc);
            for(j = strlen(as->desc); j > 0; j--)
                g_print("-");
            g_print("\n\n");
            for(j = 0; j < as->arg_option->len; j++){
                ao = as->arg_option->pdata[j];
                if(ao->symbol)
                    g_print("-%c ", ao->symbol);
                g_print("--%s", ao->option);
                if(ao->type)
                    g_print(" <%s>", ao->type);
                g_print("\n%s\n", ao->desc);
                g_print("Environment variable: $%s", ao->env_var);
                env_value = (gchar*)g_getenv(ao->env_var);
                if(env_value){
                    g_print(" (Set to \"%s\")\n", env_value);
                } else {
                    g_print(" (Not set)\n");
                }
                if(ao->default_string)
                    g_print("Default: \"%s\"\n", ao->default_string);
                else
                    g_print("*** This argument is mandatory ***\n");
                ArgumentOption_print_values(ao, TRUE);
                g_print("\n");
            }
        }
    }
    g_print("--\n");
    return;
}

static gint Argument_set_env_var_func(gpointer key, gpointer value,
                                      gpointer data){
    register ArgumentOption *ao = (ArgumentOption*)value;
    register gchar *name = (gchar*)data;
    register gint i;
    ao->env_var = g_strdup_printf("%s_%s_%s", PACKAGE, name, ao->option);
    for(i = strlen(ao->env_var)-1; i >= 0; i--)
        if(isalnum(ao->env_var[i]))
            ao->env_var[i] = toupper(ao->env_var[i]);
        else
            ao->env_var[i] = '_';
    return FALSE;
}

static gint Argument_traverse_registry_func(gpointer key,
                                            gpointer value,
                                            gpointer data){
    register ArgumentOption *ao = (ArgumentOption*)value;
    register GPtrArray *error_queue = (GPtrArray*)data;
    register gchar *err_msg, *env_var;
    if(!ArgumentOption_is_set(ao)){
        env_var = (gchar*)g_getenv(ao->env_var);
        if(env_var)
            ArgumentOption_add_value(ao, error_queue, env_var);
    }
    if(!ArgumentOption_is_set(ao)){
        if(ArgumentOption_is_mandatory(ao)){
            g_ptr_array_add(error_queue, g_strdup_printf(
                "No value set for mandatory argument --%s <%s>",
                ao->option, ao->type));
            return FALSE;
        } else {
            ArgumentOption_add_value(ao, error_queue,
                                     ao->default_string);
        }
    }
    if(ao->handler){
        err_msg = ao->handler(ao->arg_value, ao->handler_data);
        if(err_msg)
            g_ptr_array_add(error_queue, err_msg);
    } else { /* List */
        (*((GPtrArray**)ao->handler_data)) = ao->arg_list;
    }
    return FALSE;
}

static ArgumentOption *Argument_process_get_option(Argument *arg,
                           gchar *string, GPtrArray *error_queue){
    register ArgumentOption *ao = NULL;
    register gint i;
    if(string[0] == '-'){
        if(string[1] == '-'){
            ao = g_tree_lookup(arg->option_registry,
                               string+2);
            if(!ao)
                g_ptr_array_add(error_queue,
                    g_strdup_printf("Unrecognised option \"%s\"",
                                    string));
        } else {
            for(i = 1; string[i]; i++){
                ao = arg->symbol_registry[(guchar)string[i]];
                if(!ao)
                    g_error("Unknown flag [%c] in argument [%s]",
                            string[i], string);
                if(string[i+1]) /* If not last symbol */
                    ArgumentOption_add_value(ao, error_queue, "TRUE");
            }
        }
    }
    return ao;
}

void Argument_info(Argument *arg){
    register gchar *cl = g_strjoinv(" ", arg->argv);
    gchar hostname[1024];
    g_print("Command line: [%s]\n", cl);
    g_free(cl);
    /**/
    gethostname(hostname, 1024);
    g_print("Hostname: [%s]\n", hostname);
    return;
}

void Argument_process(Argument *arg, gchar *name, gchar *desc,
                      gchar *synopsis){
    register gint i;
    register GPtrArray *unflagged_arg = g_ptr_array_new();
    register ArgumentOption *ao;
    register GPtrArray *error_queue = g_ptr_array_new();
    arg->desc = desc?g_strdup(desc):g_strdup("");
    arg->name = g_strdup(name);
    g_tree_traverse(arg->option_registry, Argument_set_env_var_func,
                    G_IN_ORDER, arg->name);
    if(arg->mandatory_set->len && (arg->argc <= 1)){
        Argument_usage(arg, synopsis);
        exit(1);
    }
    for(i = 1; i < arg->argc; i++){
        ao = Argument_process_get_option(arg, arg->argv[i],
                                         error_queue);
        if(ao){ /* -?
*/ if(ao->type){ /* Not boolean */ if((i+1) == arg->argc){ g_ptr_array_add(error_queue, g_strdup_printf("No argument supplied with %s", arg->argv[i])); } else { if(ao->handler){ /* Not list */ ArgumentOption_add_value(ao, error_queue, arg->argv[++i]); } else { /* Is list */ do { ArgumentOption_add_value(ao, error_queue, arg->argv[i+1]); i++; } while(((i+1) < arg->argc) && (arg->argv[i+1][0] != '-')); } } } else { /* Is boolean */ if((i+1) == arg->argc){ ArgumentOption_add_value(ao, error_queue, "TRUE"); } else { if(arg->argv[i+1][0] == '-'){ ArgumentOption_add_value(ao, error_queue, "TRUE"); } else { ArgumentOption_add_value(ao, error_queue, arg->argv[i+1]); i++; } } } } else { /* unflagged argument */ g_ptr_array_add(unflagged_arg, arg->argv[i]); } } if(arg->mandatory_set->len < unflagged_arg->len){ g_ptr_array_add(error_queue, g_strdup_printf("Too many unflagged arguments")); } else { for(i = 0; i < unflagged_arg->len; i++){ ao = arg->mandatory_set->pdata[i]; ArgumentOption_add_value(ao, error_queue, unflagged_arg->pdata[i]); } } g_ptr_array_free(unflagged_arg, TRUE); g_tree_traverse(arg->option_registry, Argument_traverse_registry_func, G_IN_ORDER, error_queue); if((!(arg->show_short_help | arg->show_long_help) && error_queue->len)){ Argument_usage(arg, synopsis); g_print("--\n" "%d ERROR%s encountered in argument processing\n" "--\n", error_queue->len, (error_queue->len > 1)?"S were":" was"); for(i = 0; i < error_queue->len; i++){ g_print("[ %d ] : %s\n", i+1, (gchar*)error_queue->pdata[i]); g_free(error_queue->pdata[i]); } g_print("--\n\n" "Use -h or --help for more information on usage\n" "\n"); exit(1); } if(arg->show_short_help){ Argument_usage(arg, synopsis); Argument_short_help(arg); exit(1); } if(arg->show_long_help){ Argument_usage(arg, synopsis); Argument_long_help(arg); exit(1); } g_ptr_array_free(error_queue, TRUE); return; } /* FIXME: tidy */ ArgumentSet *ArgumentSet_create(gchar *desc){ register ArgumentSet *as = g_new(ArgumentSet, 1); 
g_assert(desc); as->desc = g_strdup(desc); as->arg_option = g_ptr_array_new(); return as; } void Argument_absorb_ArgumentSet(Argument *arg, ArgumentSet *as){ register ArgumentOption *ao; register gint i; g_ptr_array_add(arg->arg_set, as); for(i = 0; i < as->arg_option->len; i++){ ao = as->arg_option->pdata[i]; if(!ao->default_string) g_ptr_array_add(arg->mandatory_set, ao); if(ao->symbol){ /* Check option not already used */ g_assert(!arg->symbol_registry[(guchar)ao->symbol]); arg->symbol_registry[(guchar)ao->symbol] = ao; } /* The registry is keyed on the option name, so look up by name */ g_assert(!g_tree_lookup(arg->option_registry, ao->option)); g_tree_insert(arg->option_registry, ao->option, ao); } return; } void ArgumentSet_add_option(ArgumentSet *as, gchar symbol, gchar *option, gchar *type, gchar *desc, gchar *default_string, ArgumentHandler handler, gpointer handler_data){ register ArgumentOption *ao = ArgumentOption_create( as, symbol, option, type, desc, default_string, handler, handler_data); g_ptr_array_add(as->arg_option, ao); return; } gchar *Argument_parse_string(gchar *arg_string, gpointer data){ register gchar **dst_string = (gchar**)data; if(!strcasecmp(arg_string, "NULL")) (*dst_string) = NULL; else (*dst_string) = arg_string; return NULL; } gchar *Argument_parse_char(gchar *arg_string, gpointer data){ register gchar *dst_char = (gchar*)data; if((!arg_string[0]) || (arg_string[1])) return g_strdup_printf( "Expected single character argument not [%s]", arg_string); (*dst_char) = arg_string[0]; return NULL; } gchar *Argument_parse_int(gchar *arg_string, gpointer data){ register gint *dst_int = (gint*)data; (*dst_int) = atoi(arg_string); return NULL; } gchar *Argument_parse_float(gchar *arg_string, gpointer data){ register gfloat *dst_float = (gfloat*)data; (*dst_float) = atof(arg_string); return NULL; } gchar *Argument_parse_boolean(gchar *arg_string, gpointer data){ register gboolean *dst_boolean = (gboolean*)data; (*dst_boolean) = ArgumentParse_boolean(arg_string); return NULL; } /**/ 
exonerate-2.4.0/src/general/lineparse.test.c0000644000175000017500000000321511162714301015735 00000000000000/****************************************************************\ * * * Library for basic line-by-line parsing of text files. * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include #include "lineparse.h" int main(void){ register gchar *path = g_strconcat(SOURCE_ROOT_DIR, G_DIR_SEPARATOR_S, "src", G_DIR_SEPARATOR_S, "general", G_DIR_SEPARATOR_S, __FILE__, NULL); register FILE *fp = fopen(path, "r"); register LineParse *lp; register int i; if(!fp) g_error("Could not open file [%s]", path); lp = LineParse_create(fp); while(LineParse_word(lp) != EOF){ g_print("Line"); for(i = 0; i < lp->word->len; i++) g_print("-[%s]", (gchar*)lp->word->pdata[i]); g_print("\n"); } LineParse_destroy(lp); fclose(fp); g_free(path); return 0; } exonerate-2.4.0/src/general/lineparse.c0000644000175000017500000000555511216464026014776 00000000000000/****************************************************************\ * * * Library for basic line-by-line parsing of text files. * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "lineparse.h" #include /* For isspace() */ LineParse *LineParse_create(FILE *fp){ register LineParse *lp = g_new(LineParse, 1); lp->fp = fp; lp->line = g_string_sized_new(128); lp->word = g_ptr_array_new(); return lp; } void LineParse_destroy(LineParse *lp){ g_ptr_array_free(lp->word, TRUE); g_string_free(lp->line, TRUE); g_free(lp); return; } gint LineParse_line(LineParse *lp){ register int ch; g_string_truncate(lp->line, 0); while((ch = getc(lp->fp)) != EOF){ /* read a line */ if(ch == '\n') return lp->line->len; g_string_append_c(lp->line, ch); } return lp->line->len?lp->line->len:EOF; } gint LineParse_word(LineParse *lp){ register guchar *prev, *ptr; g_ptr_array_set_size(lp->word, 0); switch(LineParse_line(lp)){ case 0: return 0; case EOF: return EOF; default: break; } /* skip start */ for(ptr = (guchar*)lp->line->str; isspace(*ptr); ptr++); prev = ptr; while(*ptr){ if(isspace(*ptr)){ *ptr = '\0'; do ptr++; while(isspace(*ptr)); if(!*ptr) break; g_ptr_array_add(lp->word, prev); /* add a word */ prev = ptr; } ptr++; } if(prev != ptr) g_ptr_array_add(lp->word, prev); /* add final word */ return lp->word->len; } gint LineParse_seek(LineParse *lp, gchar *pattern){ register gchar *ptr = pattern; register gint ch, total = 0; while((ch = getc(lp->fp)) != EOF){ if(*ptr == ch) ptr++; else /* Allow mismatch on pattern[0] */ ptr = pattern + ((*pattern == ch)?1:0); total++; if(!*ptr) return total; } return EOF; } /* Moves lp->fp to end of next occurrence of pattern. Returns number of characters skipped or EOF if pattern is not found. */ exonerate-2.4.0/src/general/compoundfile.test.c0000644000175000017500000000626311162714300016444 00000000000000/****************************************************************\ * * * Library for reading large and/or split files * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. 
* * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include #include "compoundfile.h" int main(void){ register gchar *path_a = g_strconcat(SOURCE_ROOT_DIR, G_DIR_SEPARATOR_S, "src", G_DIR_SEPARATOR_S, "general", G_DIR_SEPARATOR_S, "compoundfile.h", NULL); register gchar *path_b = g_strconcat(SOURCE_ROOT_DIR, G_DIR_SEPARATOR_S, "src", G_DIR_SEPARATOR_S, "general", G_DIR_SEPARATOR_S, "compoundfile.c", NULL); register gchar *path_c = g_strconcat(SOURCE_ROOT_DIR, G_DIR_SEPARATOR_S, "src", G_DIR_SEPARATOR_S, "general", G_DIR_SEPARATOR_S, "compoundfile.test.c", NULL); register CompoundFile *cf; register gint ch, total = 0; register GPtrArray *path_list = g_ptr_array_new(); register CompoundFile_Pos total_length; register CompoundFile_Location *cfl_start, *cfl_stop; g_ptr_array_add(path_list, path_a); g_ptr_array_add(path_list, path_b); g_ptr_array_add(path_list, path_c); cf = CompoundFile_create(path_list, TRUE); g_message("Compound file:"); while((ch = CompoundFile_getc(cf)) != EOF){ g_print("%c", ch); total++; } g_message("--"); g_message("Compound file again:"); CompoundFile_rewind(cf); while((ch = CompoundFile_getc(cf)) != EOF) g_print("%c", ch); g_message("--"); /* Read central half of compound file */ total_length = CompoundFile_get_length(cf); cfl_start = CompoundFile_Location_from_pos(cf, total_length/4); cfl_stop = CompoundFile_Location_from_pos(cf, (total_length/4)*3); CompoundFile_set_limits(cf, cfl_start, cfl_stop); g_print("File truncated from ["); CompoundFile_Pos_print(stdout, total_length); g_print("] to ["); CompoundFile_Pos_print(stdout, CompoundFile_get_length(cf)); g_print("]\n"); CompoundFile_Location_destroy(cfl_start); CompoundFile_Location_destroy(cfl_stop); g_message("Middle half:"); 
while((ch = CompoundFile_getc(cf)) != EOF) g_print("%c", ch); g_message("\n--"); /**/ CompoundFile_destroy(cf); g_ptr_array_free(path_list, TRUE); g_free(path_a); g_free(path_b); g_free(path_c); return 0; } exonerate-2.4.0/src/general/compoundfile.c0000644000175000017500000003115411176332256015476 00000000000000/****************************************************************\ * * * Library for reading large and/or split files. * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "compoundfile.h" #include /* For qsort() */ #include #include /* For strerror() */ /**/ static CompoundFile_Element *CompoundFile_Element_create( gchar *path, CompoundFile_Pos length){ register CompoundFile_Element *cfe = g_new(CompoundFile_Element, 1); cfe->path = g_strdup(path); cfe->length = length; return cfe; } static void CompoundFile_Element_destroy(CompoundFile_Element *cfe){ g_free(cfe->path); g_free(cfe); return; } /**/ static FILE *CompoundFile_open_file(gchar *path){ register FILE *fp = fopen(path, "r"); if(!fp) g_error("Could not open [%s] : %s", path, strerror(errno)); return fp; } static void CompoundFile_clear_buffers(CompoundFile *cf, CompoundFile_Pos pos){ cf->in_buffer_full = 0; cf->in_buffer_pos = 0; cf->in_buffer_start = pos; return; } static void CompoundFile_set_file(CompoundFile *cf, gint element_id){ register CompoundFile_Element *cfe; if(cf->curr_element_id == element_id) return; if(cf->fp) fclose(cf->fp); cfe = cf->element_list->pdata[element_id]; cf->fp = CompoundFile_open_file(cfe->path); cf->curr_element_id = element_id; CompoundFile_clear_buffers(cf, 0); return; } static CompoundFile_Pos 
CompoundFile_size_from_path(gchar *path){ register FILE *fp; register CompoundFile_Pos size; fp = CompoundFile_open_file(path); size = lseek(fileno(fp), 0, SEEK_END); if(size == -1) g_error("Could not seek in file [%s] (%s)", path, strerror(errno)); fclose(fp); return size; } static int CompoundFile_sort_path_list(const void *a, const void *b){ register CompoundFile_Element **element_a = (CompoundFile_Element**)a, **element_b = (CompoundFile_Element**)b; /* Compare rather than subtract: a CompoundFile_Pos difference may not fit in a gint */ register gint ret_val = ((*element_a)->length > (*element_b)->length) - ((*element_a)->length < (*element_b)->length); if(!ret_val) ret_val = strcmp((*element_b)->path, (*element_a)->path); if(!ret_val) g_error("Duplicate file [%s] in input", (*element_a)->path); return ret_val; } /* Sorted in case shell order is not preserved, * to check against duplicate files, * and to search smallest files first for best performance * with dynamic thresholds. */ static void CompoundFile_sort(CompoundFile *cf){ qsort(cf->element_list->pdata, cf->element_list->len, sizeof(gpointer), CompoundFile_sort_path_list); return; } CompoundFile *CompoundFile_create(GPtrArray *path_list, gboolean sort_on_file_size){ register CompoundFile *cf = g_new(CompoundFile, 1); register CompoundFile_Element *cfe; register CompoundFile_Pos length; register gint i; register gchar *path; g_assert(path_list); g_assert(path_list->len); cf->ref_count = 1; cf->element_list = g_ptr_array_new(); cf->fp = NULL; cf->curr_element_id = -1; /* Expand any path list to include any directories */ for(i = 0; i < path_list->len; i++){ path = path_list->pdata[i]; /* Finding the size also checks that the file exists */ length = CompoundFile_size_from_path(path); cfe = CompoundFile_Element_create(path, length); g_ptr_array_add(cf->element_list, cfe); } cf->start_limit = cf->stop_limit = NULL; if(sort_on_file_size) CompoundFile_sort(cf); CompoundFile_rewind(cf); #ifdef USE_PTHREADS pthread_mutex_init(&cf->compoundfile_mutex, NULL); #endif /* USE_PTHREADS */ return cf; } void CompoundFile_destroy(CompoundFile *cf){ register 
gint i; register CompoundFile_Element *cfe; if(--cf->ref_count) return; for(i = 0; i < cf->element_list->len; i++){ cfe = cf->element_list->pdata[i]; CompoundFile_Element_destroy(cfe); } g_ptr_array_free(cf->element_list, TRUE); fclose(cf->fp); if(cf->start_limit) CompoundFile_Location_destroy(cf->start_limit); if(cf->stop_limit) CompoundFile_Location_destroy(cf->stop_limit); #ifdef USE_PTHREADS pthread_mutex_destroy(&cf->compoundfile_mutex); #endif /* USE_PTHREADS */ g_free(cf); return; } CompoundFile *CompoundFile_share(CompoundFile *cf){ cf->ref_count++; return cf; } static GPtrArray *CompoundFile_get_path_list(CompoundFile *cf){ register gint i; register gchar *path; register GPtrArray *path_list = g_ptr_array_new(); register CompoundFile_Element *element; for(i = 0; i < cf->element_list->len; i++){ element = cf->element_list->pdata[i]; path = g_strdup(element->path); g_ptr_array_add(path_list, path); } return path_list; } CompoundFile *CompoundFile_dup(CompoundFile *cf){ register CompoundFile *ncf; register CompoundFile_Location *nstart = NULL, *nstop = NULL; register gint i; register gchar *path; register GPtrArray *path_list = CompoundFile_get_path_list(cf); g_assert(path_list); ncf = CompoundFile_create(path_list, FALSE); if(cf->start_limit) nstart = CompoundFile_Location_create(ncf, cf->start_limit->pos, cf->start_limit->element_id); if(cf->stop_limit) nstop = CompoundFile_Location_create(ncf, cf->stop_limit->pos, cf->stop_limit->element_id); CompoundFile_set_limits(ncf, nstart, nstop); if(nstart) CompoundFile_Location_destroy(nstart); if(nstop) CompoundFile_Location_destroy(nstop); for(i = 0; i < path_list->len; i++){ path = path_list->pdata[i]; g_free(path); } g_ptr_array_free(path_list, TRUE); return ncf; } gchar *CompoundFile_current_path(CompoundFile *cf){ register CompoundFile_Element *cfe = cf->element_list->pdata[cf->curr_element_id]; return cfe->path; } gint CompoundFile_buffer_reload(CompoundFile *cf){ register gsize read_size = 
sizeof(gchar)*COMPOUND_FILE_BUFFER_SIZE; g_assert(cf); cf->in_buffer_start += cf->in_buffer_full; if(cf->stop_limit) if(cf->curr_element_id == cf->stop_limit->element_id) if((cf->in_buffer_start + read_size) > cf->stop_limit->pos) read_size = cf->stop_limit->pos - cf->in_buffer_start; cf->in_buffer_full = read(fileno(cf->fp), cf->in_buffer, read_size); cf->in_buffer_pos = 0; if(!cf->in_buffer_full){ if(cf->stop_limit){ if(cf->curr_element_id == cf->stop_limit->element_id) return EOF; } else { if((cf->curr_element_id + 1) == cf->element_list->len) return EOF; } CompoundFile_set_file(cf, cf->curr_element_id+1); return CompoundFile_getc(cf); } return cf->in_buffer[cf->in_buffer_pos++]; } gboolean CompoundFile_is_finished(CompoundFile *cf){ g_assert(cf); if(!CompoundFile_buffer_is_empty(cf)) return FALSE; if(cf->in_buffer_full == (sizeof(gchar)*COMPOUND_FILE_BUFFER_SIZE)) return FALSE; /* At end of file */ if((cf->curr_element_id+1) < cf->element_list->len) return FALSE; return TRUE; } void CompoundFile_rewind(CompoundFile *cf){ if(cf->start_limit){ CompoundFile_Location_seek(cf->start_limit); } else { CompoundFile_set_file(cf, 0); CompoundFile_seek(cf, 0); } return; } void CompoundFile_seek(CompoundFile *cf, CompoundFile_Pos pos){ register CompoundFile_Element *cfe; if(lseek(fileno(cf->fp), pos, SEEK_SET) != pos){ cfe = cf->element_list->pdata[cf->curr_element_id]; g_error("Could not seek in file [%s] (%s)", cfe->path, strerror(errno)); } CompoundFile_clear_buffers(cf, pos); return; } /* FIXME: optimisation: should not move within current buffer * when possible */ CompoundFile_Pos CompoundFile_get_length(CompoundFile *cf){ register gint i; register CompoundFile_Pos length = 0; register CompoundFile_Element *cfe; register gint start_element, stop_element; start_element = cf->start_limit?cf->start_limit->element_id :0; stop_element = cf->stop_limit?(cf->stop_limit->element_id + 1) :cf->element_list->len; for(i = start_element; i < stop_element; i++){ cfe = 
cf->element_list->pdata[i]; length += cfe->length; } if(cf->start_limit) length -= cf->start_limit->pos; if(cf->stop_limit){ cfe = cf->element_list->pdata[cf->stop_limit->element_id]; length -= (cfe->length - cf->stop_limit->pos); } return length; } CompoundFile_Pos CompoundFile_get_max_element_length(CompoundFile *cf){ register CompoundFile_Pos max = 0; register CompoundFile_Element *cfe; register gint i; for(i = 0; i < cf->element_list->len; i++){ cfe = cf->element_list->pdata[i]; if(max < cfe->length) max = cfe->length; } return max; } /**/ gint CompoundFile_read(CompoundFile *cf, gchar *buf, gint length){ g_error("ni"); /* FIXME: do unbuffered read, line reload() */ return -1; } /**/ CompoundFile_Location *CompoundFile_Location_create(CompoundFile *cf, CompoundFile_Pos pos, gint element_id){ register CompoundFile_Location *cfl = g_new(CompoundFile_Location, 1); g_assert(cf); cfl->ref_count = 1; cfl->cf = CompoundFile_share(cf); cfl->pos = pos, cfl->element_id = element_id; return cfl; } CompoundFile_Location *CompoundFile_Location_tell(CompoundFile *cf){ return CompoundFile_Location_create(cf, CompoundFile_ftell(cf), cf->curr_element_id); } void CompoundFile_Location_destroy(CompoundFile_Location *cfl){ g_assert(cfl); if(--cfl->ref_count) return; CompoundFile_destroy(cfl->cf); g_free(cfl); return; } CompoundFile_Location *CompoundFile_Location_share( CompoundFile_Location *cfl){ g_assert(cfl); cfl->ref_count++; return cfl; } void CompoundFile_Location_seek(CompoundFile_Location *cfl){ g_assert(cfl); CompoundFile_set_file(cfl->cf, cfl->element_id); CompoundFile_seek(cfl->cf, cfl->pos); return; } /**/ CompoundFile_Location *CompoundFile_Location_from_pos( CompoundFile *cf, CompoundFile_Pos pos){ register CompoundFile_Location *cfl = g_new(CompoundFile_Location, 1); register gint i; register CompoundFile_Element *cfe; cfl->ref_count = 1; cfl->cf = CompoundFile_share(cf); cfl->pos = pos; cfl->element_id = -1; /* Find which file this position should be in */ for(i = 
0; i < cf->element_list->len; i++){ cfe = cf->element_list->pdata[i]; if((cfl->pos - cfe->length) < 0){ cfl->element_id = i; break; } else { cfl->pos -= cfe->length; } } g_assert(cfl->element_id != -1); return cfl; } void CompoundFile_set_limits(CompoundFile *cf, CompoundFile_Location *start, CompoundFile_Location *stop){ /* Take the new references before dropping the old ones, * and clear a limit when no replacement is given, * so that no stale pointer is left behind. */ register CompoundFile_Location *new_start = start?CompoundFile_Location_share(start):NULL, *new_stop = stop?CompoundFile_Location_share(stop):NULL; if(cf->start_limit) CompoundFile_Location_destroy(cf->start_limit); if(cf->stop_limit) CompoundFile_Location_destroy(cf->stop_limit); cf->start_limit = new_start; cf->stop_limit = new_stop; CompoundFile_rewind(cf); return; } /**/ exonerate-2.4.0/src/general/socket.test.c0000644000175000017500000000600211162714301015240 00000000000000/****************************************************************\ * * * Simple client-server code library * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include #include #include #include #include "socket.h" static gchar *read_next_line(gchar *prompt){ gchar buffer[1024]; register gchar *msg = NULL; g_print("%s", prompt); if(!fgets(buffer, 1024, stdin)) g_error("Didn't read anything"); if(buffer[0] != '\n'){ msg = g_strdup(buffer); g_message("ret [%s]", msg); } return msg; } static void run_client(gchar *host, gint port){ register SocketClient *sc = SocketClient_create(host, port); register gchar *msg, *reply; while(TRUE){ msg = read_next_line("> "); if(!msg) continue; /* empty line: nothing to send */ g_message("client going to send [%s]", msg); if(!strncmp(msg, "quit client", 11)){ g_free(msg); break; } reply = SocketClient_send(sc, msg); if(reply){ g_print(" reply [%s]", reply); g_free(reply); } g_free(msg); } SocketClient_destroy(sc); return; } static gboolean test_server_func(gchar *msg, gchar **reply, gpointer connection_data, gpointer user_data){ g_message("server received [%s]", msg); (*reply) = g_strdup_printf("msg received[%s]", msg); /* Compare only the 5 characters of "close", as msg ends with a newline */ return strncmp(msg, "close", 5)?FALSE:TRUE; } static void run_server(gint port){ register SocketServer *ss = SocketServer_create(port, 2, test_server_func, NULL, NULL, NULL); while(SocketServer_listen(ss)); SocketServer_destroy(ss); return; } int main(int argc, char **argv){ register gboolean be_client; register gint port; register gchar *host; if(argc != 4){ g_warning("Usage: socket.test client|server host port"); return 0; /* exit quietly to please make check */ } be_client = (argv[1][0] == 'c'); host = argv[2]; port = atoi(argv[3]); g_message("client [%s] host [%s] port [%d]", be_client?"yes":"no", host, port); if(be_client) run_client(host, port); else run_server(port); return 0; } exonerate-2.4.0/src/general/socket.c0000644000175000017500000003234711222643064014301 00000000000000/****************************************************************\ * * * Simple client-server code library * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. 
All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include #include #include /* For bcopy() */ #include /* For close() */ #include #include #include #include #include #include #include #include #include /* For sigaction() */ #include /* For wait() */ #include "socket.h" #define Socket_BUFSIZE BUFSIZ static SocketConnection *SocketConnection_create(gchar *host, gint port){ register SocketConnection *connection = g_new(SocketConnection, 1); register gint result; int flag = 1; if((connection->sock = socket(AF_INET, SOCK_STREAM, 0)) < 0){ perror("opening stream socket"); exit(1); } connection->host = g_strdup(host); connection->port = port; result = setsockopt(connection->sock, IPPROTO_TCP, TCP_NODELAY, (char*)&flag, sizeof(int)); if(result < 0) perror("setting socket options"); return connection; } SocketClient *SocketClient_create(gchar *host, gint port){ register SocketClient *client = g_new(SocketClient, 1); struct sockaddr_in server; struct hostent *hp = gethostbyname(host); register gchar *reply; #ifdef USE_PTHREADS pthread_mutex_init(&client->connection_mutex, NULL); #endif /* USE_PTHREADS */ if(!hp){ perror("looking up hostname"); exit(1); } client->connection = SocketConnection_create(host, port); #if 0 /* Make non-blocking */ if(fcntl(client->connection->sock, F_SETFL, O_NDELAY) == -1){ perror("Tcp: Could not set O_NDELAY on socket with fcntl"); exit(1); } #endif /* 0 */ server.sin_family = AF_INET; server.sin_port = htons(port); bcopy(hp->h_addr, &server.sin_addr, hp->h_length); if(connect(client->connection->sock, (struct sockaddr*)&server, sizeof(server)) < 0){ perror("connecting client socket"); exit(1); } /* Send a packet to the server to test the connection. 
* If we get any reply, the connection is OK. * * FIXME: There should be a better way of doing this, * but I can't find a method that works portably. */ reply = SocketClient_send(client, "ping"); if(reply){ g_free(reply); } else { SocketClient_destroy(client); return NULL; } return client; } static void SocketConnection_destroy(SocketConnection *connection){ if(close(connection->sock) == -1){ perror("closing socket"); exit(1); } g_free(connection->host); g_free(connection); return; } static gchar *SocketConnection_read(gint sock){ register gint i, len = 0, line_complete = 0, line_expect = 1; register gchar *reply; register GString *string = g_string_sized_new(Socket_BUFSIZE); register gboolean line_count_given = FALSE; gchar buffer[Socket_BUFSIZE+1]; do { if((len = recv(sock, buffer, Socket_BUFSIZE, 0)) <= 0){ len = 0; break; } buffer[len] = '\0'; for(i = 0; i < len; i++) if(buffer[i] == '\n') line_complete++; if((!string->len) && (len > 10)) if(!strncmp(buffer, "linecount:", 10)){ line_expect = atoi(&buffer[11]); line_count_given = TRUE; if(line_expect < 2) g_error("linecount: must be > 1"); } if(line_complete > line_expect){ if(line_count_given){ g_error("Received [%d] socket message lines, but expected [%d]", line_complete, line_expect); } else { g_error("Multiline socket messages must use linecount:"); } } g_string_append(string, buffer); } while(line_complete < line_expect); if(string->len){ reply = string->str; g_string_free(string, FALSE); return reply; } g_string_free(string, TRUE); return NULL; } static void Socket_send_msg(gint sock, gchar *msg, gchar *err_msg){ register gint start = 0, len, msglen = strlen(msg); do { if((len = send(sock, msg+start, msglen-start, 0)) < 0){ perror(err_msg); exit(1); } start += len; } while(start < msglen); return; } static void Socket_send(gint sock, gchar *msg, gchar *err_msg){ register gint i, line_count = 0; register GString *full_msg = g_string_sized_new(strlen(msg)+32); for(i = 0; msg[i]; i++) if(msg[i] == '\n') 
line_count++; if(line_count > 1) g_string_sprintfa(full_msg, "linecount: %d\n", line_count+1); g_string_sprintfa(full_msg, "%s", msg); if(!line_count) g_string_append_c(full_msg, '\n'); Socket_send_msg(sock, full_msg->str, err_msg); g_string_free(full_msg, TRUE); return; } gchar *SocketClient_send(SocketClient *client, gchar *msg){ register gchar *reply; #ifdef USE_PTHREADS pthread_mutex_lock(&client->connection_mutex); #endif /* USE_PTHREADS */ Socket_send(client->connection->sock, msg, "writing client message"); reply = SocketConnection_read(client->connection->sock); g_assert(reply); #ifdef USE_PTHREADS pthread_mutex_unlock(&client->connection_mutex); #endif /* USE_PTHREADS */ return reply; } void SocketClient_destroy(SocketClient *client){ SocketConnection_destroy(client->connection); #ifdef USE_PTHREADS pthread_mutex_destroy(&client->connection_mutex); #endif /* USE_PTHREADS */ g_free(client); return; } static gint global_connection_count = 0; static void SocketServer_reap_dead_children(int signum){ signal(SIGCHLD, SIG_IGN); while(waitpid(-1, NULL, WNOHANG) > 0) g_message("Cleaned up dead child process [%d]", global_connection_count); signal(SIGCHLD, SocketServer_reap_dead_children); global_connection_count--; return; } static void SocketServer_shutdown(int signum){ g_message("Server shutting down"); exit(0); return; } static void SocketServer_broken_pipe(int signum){ g_message("Server detected broken pipe - closing thread"); pthread_exit(NULL); return; } SocketServer *SocketServer_create(gint port, gint max_connections, SocketProcessFunc server_process_func, SocketConnectionOpenFunc connection_open_func, SocketConnectionCloseFunc connection_close_func, gpointer user_data){ register SocketServer *server = g_new(SocketServer, 1); struct sockaddr_in sock_server; socklen_t len = sizeof(sock_server); server->connection = SocketConnection_create("localhost", port); g_assert(server_process_func); server->server_process_func = server_process_func; 
server->connection_open_func = connection_open_func; server->connection_close_func = connection_close_func; server->user_data = user_data; server->max_connections = max_connections; sock_server.sin_family = AF_INET; sock_server.sin_addr.s_addr = INADDR_ANY; sock_server.sin_port = htons(port); if(bind(server->connection->sock, (struct sockaddr*)&sock_server, len)){ perror("binding stream socket"); exit(1); } if(getsockname(server->connection->sock, (struct sockaddr*)&sock_server, &len)){ perror("getting socket name"); exit(1); } server->connection->port = ntohs(sock_server.sin_port); signal(SIGCHLD, SocketServer_reap_dead_children); signal(SIGTERM, SocketServer_shutdown); #ifdef USE_PTHREADS server->sspd = g_new0(SocketServer_pthread_Data, max_connections); pthread_mutex_init(&server->connection_mutex, NULL); #endif /* USE_PTHREADS */ return server; } static void SocketServer_process_connection(SocketServer *server, int msgsock){ register gpointer connection_data = server->connection_open_func ? 
server->connection_open_func(server->user_data) : NULL; register gchar *msg; register gboolean ok = TRUE; gchar *reply; do { #ifdef USE_PTHREADS pthread_mutex_lock(&server->connection_mutex); #endif /* USE_PTHREADS */ msg = SocketConnection_read(msgsock); #ifdef USE_PTHREADS pthread_mutex_unlock(&server->connection_mutex); #endif /* USE_PTHREADS */ reply = NULL; if(!msg) break; ok = server->server_process_func(msg, &reply, connection_data, server->user_data); g_free(msg); if(reply){ #ifdef USE_PTHREADS pthread_mutex_lock(&server->connection_mutex); #endif /* USE_PTHREADS */ Socket_send(msgsock, reply, "writing reply"); #ifdef USE_PTHREADS pthread_mutex_unlock(&server->connection_mutex); #endif /* USE_PTHREADS */ g_free(reply); } else { g_error("no reply from server"); } } while(ok); if(server->connection_close_func) server->connection_close_func(connection_data, server->user_data); return; } #ifdef USE_PTHREADS static void *SocketServer_pthread_func(void* data){ register SocketServer_pthread_Data *sspd = (SocketServer_pthread_Data*)data; signal(SIGPIPE, SocketServer_broken_pipe); SocketServer_process_connection(sspd->server, sspd->msgsock); pthread_mutex_lock(&sspd->server->connection_mutex); g_message("cleaning up connection [%d]", global_connection_count); global_connection_count--; close(sspd->msgsock); sspd->in_use = FALSE; pthread_mutex_unlock(&sspd->server->connection_mutex); pthread_exit(NULL); return NULL; } #endif /* USE_PTHREADS */ gboolean SocketServer_listen(SocketServer *server){ register int msgsock; struct sockaddr_in client_addr; socklen_t client_len = sizeof(struct sockaddr_in); #ifdef USE_PTHREADS register gint i; pthread_attr_t pt_attr; pthread_attr_init(&pt_attr); pthread_attr_setdetachstate(&pt_attr, PTHREAD_CREATE_DETACHED); #endif /* USE_PTHREADS */ /**/ listen(server->connection->sock, 5); msgsock = accept(server->connection->sock, (struct sockaddr*)&client_addr, &client_len); if(msgsock == -1){ perror("server accept"); close(msgsock); 
exit(1); } /* FIXME: send a busy warning message back to the client ? */ #ifdef USE_PTHREADS pthread_mutex_lock(&server->connection_mutex); if(global_connection_count >= server->max_connections){ pthread_mutex_unlock(&server->connection_mutex); g_message("Max connections reached"); } else { for(i = 0; i < server->max_connections; i++) if(!server->sspd[i].in_use){ server->sspd[i].in_use = TRUE; break; } server->sspd[i].server = server; server->sspd[i].msgsock = dup(msgsock); if(server->sspd[i].msgsock < 0) perror("duplicating socket for pthread"); global_connection_count++; g_message("opened connection [%d/%d] from [%s]", global_connection_count, server->max_connections, inet_ntoa(client_addr.sin_addr)); pthread_mutex_unlock(&server->connection_mutex); pthread_create(&server->sspd[i].thread, &pt_attr, SocketServer_pthread_func, (void*)&server->sspd[i]); } #else /* USE_PTHREADS */ if(global_connection_count >= server->max_connections){ g_message("Max connections reached"); } else { if(fork() == 0){ SocketServer_process_connection(server, msgsock); exit(0); } else { global_connection_count++; g_message("opened connection [%d/%d] from [%s]", global_connection_count, server->max_connections, inet_ntoa(client_addr.sin_addr)); } } #endif /* USE_PTHREADS */ close(msgsock); return TRUE; /* Keep server running */ } void SocketServer_destroy(SocketServer *server){ SocketConnection_destroy(server->connection); #ifdef USE_PTHREADS g_free(server->sspd); pthread_mutex_destroy(&server->connection_mutex); #endif /* USE_PTHREADS */ g_free(server); return; } exonerate-2.4.0/src/general/jobqueue.test.c0000644000175000017500000000330611203020606015565 00000000000000/****************************************************************\ * * * Library for Queueing Multi-Threaded Jobs * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. 
See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/

#include <unistd.h> /* For usleep() */
#include "jobqueue.h"

static void test_run_func(gpointer job_data){
    register gchar *name = job_data;
    g_message("running job [%s]", name);
    usleep(1000000); /* test job length */
    return;
    }

int main(void){
    register gint i;
    register JobQueue *jq = JobQueue_create(2);
    gchar *test_str[10] = {"one", "two", "three", "four", "five",
                           "six", "seven", "eight", "nine", "ten"};
    /* Initialise threads as not using Argument library */
#ifdef USE_PTHREADS
    if(!g_thread_supported())
        g_thread_init(NULL);
#endif /* USE_PTHREADS */
    for(i = 0; i < 10; i++)
        JobQueue_submit(jq, test_run_func, test_str[i], 0);
    JobQueue_complete(jq);
    JobQueue_destroy(jq);
    return 0;
    }

exonerate-2.4.0/src/general/jobqueue.c0000644000175000017500000001137611222643067014632 00000000000000/****************************************************************\ * * * Library for Queueing Multi-Threaded Jobs * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact.
* * * \****************************************************************/

#include "jobqueue.h"
#include <unistd.h> /* For usleep() */

#ifdef USE_PTHREADS

static JobQueue_Task *JobQueue_Task_create(JobQueue_Func job_func,
                                           gpointer job_data,
                                           gint priority){
    register JobQueue_Task *task = g_new(JobQueue_Task, 1);
    task->job_func = job_func;
    task->job_data = job_data;
    task->priority = priority;
    return task;
    }

static void JobQueue_Task_destroy(JobQueue_Task *task){
    g_free(task);
    return;
    }

static void *JobQueue_thread_func(void *data){
    register JobQueue *jq = data;
    register JobQueue_Task *task;
    do {
        pthread_mutex_lock(&jq->queue_lock);
        task = PQueue_pop(jq->pq);
        if(task){
            jq->running_count++;
            pthread_mutex_unlock(&jq->queue_lock);
            task->job_func(task->job_data);
            JobQueue_Task_destroy(task);
            pthread_mutex_lock(&jq->queue_lock);
            jq->running_count--;
            pthread_mutex_unlock(&jq->queue_lock);
        } else {
            pthread_mutex_unlock(&jq->queue_lock);
            usleep(1000); /* wait before checking job queue again */
            }
    } while(!jq->is_complete);
    return NULL;
    }

static gboolean JobQueue_Task_compare(gpointer low, gpointer high,
                                      gpointer user_data){
    register JobQueue_Task *low_task = (JobQueue_Task*)low,
                           *high_task = (JobQueue_Task*)high;
    return low_task->priority < high_task->priority;
    }

#endif /* USE_PTHREADS */

JobQueue *JobQueue_create(gint thread_total){
    register JobQueue *jq = g_new(JobQueue, 1);
#ifdef USE_PTHREADS
    register gint i;
    jq->is_complete = FALSE;
    jq->thread_total = thread_total;
    jq->running_count = 0;
    pthread_mutex_init(&jq->queue_lock, NULL);
    jq->thread = g_new0(pthread_t, thread_total);
    jq->pq_set = PQueueSet_create();
    jq->pq = PQueue_create(jq->pq_set, JobQueue_Task_compare, NULL);
    for(i = 0; i < thread_total; i++)
        pthread_create(&jq->thread[i], NULL, JobQueue_thread_func, jq);
#endif /* USE_PTHREADS */
    return jq;
    }

/* Only needed (and only compilable) when the task queue exists,
 * as JobQueue_Task_destroy() is defined under USE_PTHREADS.
 */
#ifdef USE_PTHREADS
static void JobQueue_Task_free_func(gpointer data, gpointer user_data){
    register JobQueue_Task *task = data;
    JobQueue_Task_destroy(task);
    return;
    }
#endif /* USE_PTHREADS */

void JobQueue_destroy(JobQueue
*jq){
#ifdef USE_PTHREADS
    g_assert(!PQueue_total(jq->pq));
    PQueue_destroy(jq->pq, JobQueue_Task_free_func, NULL);
    PQueueSet_destroy(jq->pq_set);
    pthread_mutex_destroy(&jq->queue_lock);
    g_free(jq->thread);
#endif /* USE_PTHREADS */
    g_free(jq);
    return;
    }

void JobQueue_submit(JobQueue *jq, JobQueue_Func job_func,
                     gpointer job_data, gint priority){
#ifdef USE_PTHREADS
    register JobQueue_Task *task;
    g_assert(jq);
    pthread_mutex_lock(&jq->queue_lock);
    task = JobQueue_Task_create(job_func, job_data, priority);
    PQueue_push(jq->pq, task);
    pthread_mutex_unlock(&jq->queue_lock);
#else /* USE_PTHREADS */
    /* JobQueue has no job_func member: call the supplied function */
    job_func(job_data); /* when no threads available, just run the job */
#endif /* USE_PTHREADS */
    return;
    }
/* waits until less than (thread_total * 2) jobs are queued */

void JobQueue_complete(JobQueue *jq){
#ifdef USE_PTHREADS
    register gint i, job_total;
    do { /* wait for job queue to empty */
        pthread_mutex_lock(&jq->queue_lock);
        job_total = PQueue_total(jq->pq) + jq->running_count;
        pthread_mutex_unlock(&jq->queue_lock);
        usleep(1000); /* wait before checking if queue is empty again */
    } while(job_total);
    jq->is_complete = TRUE;
    /* wait for threads to finish */
    for(i = 0; i < jq->thread_total; i++)
        pthread_join(jq->thread[i], NULL);
#endif /* USE_PTHREADS */
    return;
    }
/* waits until the queue is empty and no jobs are running,
 * to catch any jobs which are added to the queue by running jobs
 */

/**/

exonerate-2.4.0/src/general/threadref.test.c0000644000175000017500000000227211222150143015714 00000000000000/****************************************************************\ * * * Basic library for thread-safe reference counting. * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact.
* * * \****************************************************************/ #include #include "threadref.h" int main(void){ register ThreadRef *tr = ThreadRef_create(), *tr2; tr2 = ThreadRef_share(tr); ThreadRef_destroy(tr2); ThreadRef_lock(tr); ThreadRef_unlock(tr); ThreadRef_destroy(tr); return 0; } exonerate-2.4.0/src/general/threadref.c0000644000175000017500000000433211222434635014750 00000000000000/****************************************************************\ * * * Basic library for thread-safe reference counting. * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "threadref.h" ThreadRef *ThreadRef_create(void){ register ThreadRef *threadref = g_new(ThreadRef, 1); threadref->ref_count = 1; #ifdef USE_PTHREADS pthread_mutex_init(&threadref->ref_lock, NULL); #endif /* USE_PTHREADS */ return threadref; } ThreadRef *ThreadRef_destroy(ThreadRef *threadref){ ThreadRef_lock(threadref); if(--threadref->ref_count){ ThreadRef_unlock(threadref); return threadref; } ThreadRef_unlock(threadref); #ifdef USE_PTHREADS pthread_mutex_destroy(&threadref->ref_lock); #endif /* USE_PTHREADS */ g_free(threadref); return NULL; } ThreadRef *ThreadRef_share(ThreadRef *threadref){ ThreadRef_lock(threadref); threadref->ref_count++; ThreadRef_unlock(threadref); return threadref; } gint ThreadRef_get_count(ThreadRef *threadref){ register gint ref_count; ThreadRef_lock(threadref); ref_count = threadref->ref_count; ThreadRef_unlock(threadref); return ref_count; } void ThreadRef_lock(ThreadRef *threadref){ #ifdef USE_PTHREADS pthread_mutex_lock(&threadref->ref_lock); #endif /* USE_PTHREADS */ return; } void 
ThreadRef_unlock(ThreadRef *threadref){ #ifdef USE_PTHREADS pthread_mutex_unlock(&threadref->ref_lock); #endif /* USE_PTHREADS */ return; } exonerate-2.4.0/src/general/argument.h0000644000175000017500000000776311162714300014637 00000000000000/****************************************************************\ * * * Library for command line argument processing * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_ARGUMENT_H #define INCLUDED_ARGUMENT_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include #include /* For CHAR_BIT */ #include #ifndef ALPHABETSIZE #define ALPHABETSIZE (1< to '\0' if a single-letter flag is not required. * * If no handler is supplied, the argument is assumed to be a list. */ /* Some common handlers */ gchar *Argument_parse_string(gchar *arg_string, gpointer data); gchar *Argument_parse_char(gchar *arg_string, gpointer data); gchar *Argument_parse_int(gchar *arg_string, gpointer data); gchar *Argument_parse_float(gchar *arg_string, gpointer data); gchar *Argument_parse_boolean(gchar *arg_string, gpointer data); void Argument_add_cleanup(Argument *arg, Argument_Cleanup_Func cleanup_func, gpointer user_data); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_ARGUMENT_H */ exonerate-2.4.0/src/general/lineparse.h0000644000175000017500000000272311216414241014767 00000000000000/****************************************************************\ * * * Library for basic line-by-line parsing of text files. * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. 
* * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/

#ifndef INCLUDED_LINEPARSE_H
#define INCLUDED_LINEPARSE_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <stdio.h>
#include <glib.h>

typedef struct {
         FILE *fp;
      GString *line;
    GPtrArray *word;
} LineParse;

LineParse *LineParse_create(FILE *fp);
     void  LineParse_destroy(LineParse *lp);
     gint  LineParse_line(LineParse *lp);
     gint  LineParse_word(LineParse *lp);
     gint  LineParse_seek(LineParse *lp, gchar *pattern);

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_LINEPARSE_H */

exonerate-2.4.0/src/general/compoundfile.h0000644000175000017500000001273311200067374015477 00000000000000
/****************************************************************\
*                                                                *
*  Library for reading large and/or split files.                 *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#ifndef INCLUDED_COMPOUNDFILE_H
#define INCLUDED_COMPOUNDFILE_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <stdio.h>
#include <glib.h>
#include <sys/types.h> /* For lseek() and off_t */
#include <unistd.h>    /* For lseek() and off_t */
#include <inttypes.h>  /* For PRIuPTR */

#ifdef USE_PTHREADS
#include <pthread.h>
#endif /* USE_PTHREADS */

#define COMPOUND_FILE_BUFFER_SIZE BUFSIZ
/* Using BUFSIZ from stdio.h, although I'm not sure
 * why the stdio buffered IO with getc isn't just as
 * fast as this implementation.
*/ /* G_GNUC_EXTENSION is not defined on all platforms */ #ifdef G_GNUC_EXTENSION #define CompoundFile_Pragma G_GNUC_EXTENSION #else /* G_GNUC_EXTENSION */ #define CompoundFile_Pragma #endif /* G_GNUC_EXTENSION */ typedef off_t CompoundFile_Pos; #if __STDC_VERSION__ >= 199901L #define CompoundFile_Pos_print(fp, pos) \ CompoundFile_Pragma fprintf((fp), "%jd", (intmax_t)(pos)) typedef intmax_t CompoundFile_Pos_scan_type; #define CompoundFile_Pos_scan(src, dst) \ CompoundFile_Pragma sscanf((src), "%jd", (intmax_t*)(dst)) #else /* __STDC_VERSION__ */ #define CompoundFile_Pos_print(fp, pos) \ CompoundFile_Pragma fprintf((fp), "%lld", (long long int)(pos)) typedef long long int CompoundFile_Pos_scan_type; #define CompoundFile_Pos_scan(src, dst) \ CompoundFile_Pragma sscanf((src), "%lld", (dst)) #endif /* __STDC_VERSION__ */ typedef struct { gchar *path; CompoundFile_Pos length; } CompoundFile_Element; typedef struct { guint ref_count; GPtrArray *element_list; FILE *fp; gint curr_element_id; gchar in_buffer[COMPOUND_FILE_BUFFER_SIZE]; gint in_buffer_full; gint in_buffer_pos; CompoundFile_Pos in_buffer_start; struct CompoundFile_Location *start_limit; struct CompoundFile_Location *stop_limit; #ifdef USE_PTHREADS pthread_mutex_t compoundfile_mutex; #endif /* USE_PTHREADS */ } CompoundFile; CompoundFile *CompoundFile_create(GPtrArray *path_list, gboolean sort_on_file_size); void CompoundFile_destroy(CompoundFile *cf); CompoundFile *CompoundFile_share(CompoundFile *cf); CompoundFile *CompoundFile_dup(CompoundFile *cf); gchar *CompoundFile_current_path(CompoundFile *cf); #define CompoundFile_buffer_is_empty(cf) \ ((cf)->in_buffer_pos == (cf)->in_buffer_full) #define CompoundFile_getc(cf) \ (CompoundFile_buffer_is_empty(cf) \ ? 
CompoundFile_buffer_reload(cf) \ : (cf)->in_buffer[(cf)->in_buffer_pos++]) gint CompoundFile_buffer_reload(CompoundFile *cf); /* This should only be called by the CompoundFile_getc() macro */ #define CompoundFile_ftell(cf) \ ((cf)->in_buffer_start + (cf)->in_buffer_pos) /* Alternatively, could use lseek(fileno(cf->fp), 0, SEEK_CUR) */ gboolean CompoundFile_is_finished(CompoundFile *cf); void CompoundFile_rewind(CompoundFile *cf); void CompoundFile_seek(CompoundFile *cf, CompoundFile_Pos pos); CompoundFile_Pos CompoundFile_get_length(CompoundFile *cf); CompoundFile_Pos CompoundFile_get_max_element_length(CompoundFile *cf); gint CompoundFile_read(CompoundFile *cf, gchar *buf, gint length); /**/ typedef struct CompoundFile_Location { gint ref_count; CompoundFile *cf; CompoundFile_Pos pos; gint element_id; } CompoundFile_Location; CompoundFile_Location *CompoundFile_Location_create(CompoundFile *cf, CompoundFile_Pos pos, gint element_id); CompoundFile_Location *CompoundFile_Location_tell(CompoundFile *cf); void CompoundFile_Location_destroy( CompoundFile_Location *cfl); CompoundFile_Location *CompoundFile_Location_share( CompoundFile_Location *cfl); void CompoundFile_Location_seek( CompoundFile_Location *cfl); /**/ CompoundFile_Location *CompoundFile_Location_from_pos( CompoundFile *cf, CompoundFile_Pos pos); /* Converts a position (in compound file coordinates) to a location. */ void CompoundFile_set_limits(CompoundFile *cf, CompoundFile_Location *start, CompoundFile_Location *stop); /* Sets limits and rewinds truncated file */ /**/ #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_COMPOUNDFILE_H */ exonerate-2.4.0/src/general/socket.h0000644000175000017500000000626411174044751014311 00000000000000/****************************************************************\ * * * Simple client-server code library * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. 
* * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_SOCKET_H #define INCLUDED_SOCKET_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include #ifdef USE_PTHREADS #include #endif /* USE_PTHREADS */ typedef gboolean SocketProcessFunc(gchar *msg, gchar **reply, gpointer connection_data, gpointer user_data); /* Return FALSE to close the connection * If reply is set, it will be freed by the server. */ typedef gpointer SocketConnectionOpenFunc(gpointer user_data); typedef void SocketConnectionCloseFunc(gpointer connection_data, gpointer user_data); typedef struct { int sock; gchar *host; gint port; } SocketConnection; typedef struct { SocketConnection *connection; #ifdef USE_PTHREADS pthread_mutex_t connection_mutex; #endif /* USE_PTHREADS */ } SocketClient; #ifdef USE_PTHREADS typedef struct { gboolean in_use; struct SocketServer *server; int msgsock; pthread_t thread; } SocketServer_pthread_Data; #endif /* USE_PTHREADS */ typedef struct SocketServer { SocketConnection *connection; SocketProcessFunc *server_process_func; SocketConnectionOpenFunc *connection_open_func; SocketConnectionCloseFunc *connection_close_func; gpointer user_data; gint max_connections; #ifdef USE_PTHREADS SocketServer_pthread_Data *sspd; pthread_mutex_t connection_mutex; #endif /* USE_PTHREADS */ } SocketServer; SocketClient *SocketClient_create(gchar *host, gint port); gchar *SocketClient_send(SocketClient *client, gchar *msg); void SocketClient_destroy(SocketClient *client); SocketServer *SocketServer_create(gint port, gint max_connections, SocketProcessFunc server_process_func, SocketConnectionOpenFunc connection_open_func, SocketConnectionCloseFunc connection_close_func, gpointer user_data); gboolean 
SocketServer_listen(SocketServer *server);
void SocketServer_destroy(SocketServer *server);

/**/

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_SOCKET_H */

exonerate-2.4.0/src/general/jobqueue.h0000644000175000017500000000376011176347137014644 00000000000000
/****************************************************************\
*                                                                *
*  Library for Queueing Multi-Threaded Jobs                      *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#ifndef INCLUDED_JOBQUEUE_H
#define INCLUDED_JOBQUEUE_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <glib.h>

#ifdef USE_PTHREADS
#include <pthread.h>
#endif /* USE_PTHREADS */

#include "pqueue.h"

typedef void (*JobQueue_Func)(gpointer job_data);

typedef struct JobQueue_Task {
    JobQueue_Func job_func;
         gpointer job_data;
             gint priority;
} JobQueue_Task;

typedef struct {
#ifdef USE_PTHREADS
           gboolean  is_complete;
               gint  thread_total;
               gint  running_count;
    pthread_mutex_t  queue_lock;
          pthread_t *thread;
             PQueue *pq;
          PQueueSet *pq_set;
#endif /* USE_PTHREADS */
} JobQueue;

JobQueue *JobQueue_create(gint thread_total);
    void  JobQueue_destroy(JobQueue *jq);
    void  JobQueue_submit(JobQueue *jq, JobQueue_Func job_func,
                          gpointer job_data, gint priority);
    void  JobQueue_complete(JobQueue *jq);
/* Lower priority jobs are run first */

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_JOBQUEUE_H */

/**/

exonerate-2.4.0/src/general/threadref.h0000644000175000017500000000317611222647625014767 00000000000000/****************************************************************\ * * * Basic library for thread-safe reference counting. * * * * Guy St.C. Slater..
mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_THREADREF_H #define INCLUDED_THREADREF_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include #ifdef USE_PTHREADS #include #endif /* USE_PTHREADS */ typedef struct { gint ref_count; #ifdef USE_PTHREADS pthread_mutex_t ref_lock; #endif /* USE_PTHREADS */ } ThreadRef; ThreadRef *ThreadRef_create(void); ThreadRef *ThreadRef_destroy(ThreadRef *threadref); ThreadRef *ThreadRef_share(ThreadRef *threadref); gint ThreadRef_get_count(ThreadRef *threadref); void ThreadRef_lock(ThreadRef *threadref); void ThreadRef_unlock(ThreadRef *threadref); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_THREADREF_H */ exonerate-2.4.0/src/sequence/0000777000175000017500000000000011222703703013110 500000000000000exonerate-2.4.0/src/sequence/Makefile.am0000644000175000017500000000231610616363167015076 00000000000000 TESTS = sequence.test submat.test translate.test splice.test \ alphabet.test codonsubmat.test noinst_PROGRAMS = $(TESTS) INCLUDES = -I$(top_srcdir)/src/struct \ -I$(top_srcdir)/src/general noinst_HEADERS = sequence.h submat.h translate.h splice.h alphabet.h \ codonsubmat.h splice_test_SOURCES = splice.test.c splice.c SPLICE_OBJ = $(top_srcdir)/src/struct/matrix.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ -lm splice_test_LDADD = $(SPLICE_OBJ) sequence_test_SOURCES = sequence.test.c sequence.c alphabet.c translate.c \ splice.c sequence_test_LDADD = $(SPLICE_OBJ) submat_test_SOURCES = submat.test.c submat.c translate_test_SOURCES = translate.test.c translate.c 
translate_test_LDADD = $(top_srcdir)/src/general/argument.o alphabet_test_SOURCES = alphabet.test.c alphabet.c alphabet_test_LDADD = $(top_srcdir)/src/general/argument.o codonsubmat_test_SOURCES = codonsubmat.test.c codonsubmat.c codonsubmat_test_LDADD = $(top_srcdir)/src/general/argument.o -lm # Files to clear away MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/src/sequence/Makefile.in0000644000175000017500000002565211222703646015111 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../.. 
ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ TESTS = sequence.test submat.test translate.test splice.test alphabet.test codonsubmat.test noinst_PROGRAMS = $(TESTS) INCLUDES = -I$(top_srcdir)/src/struct -I$(top_srcdir)/src/general noinst_HEADERS = sequence.h submat.h translate.h splice.h alphabet.h codonsubmat.h splice_test_SOURCES = splice.test.c splice.c SPLICE_OBJ = $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/struct/sparsecache.o $(top_srcdir)/src/general/argument.o $(top_srcdir)/src/general/lineparse.o -lm splice_test_LDADD = $(SPLICE_OBJ) sequence_test_SOURCES = sequence.test.c sequence.c alphabet.c translate.c splice.c sequence_test_LDADD = $(SPLICE_OBJ) submat_test_SOURCES = submat.test.c submat.c translate_test_SOURCES = translate.test.c translate.c translate_test_LDADD = $(top_srcdir)/src/general/argument.o alphabet_test_SOURCES = alphabet.test.c alphabet.c alphabet_test_LDADD = $(top_srcdir)/src/general/argument.o codonsubmat_test_SOURCES = codonsubmat.test.c codonsubmat.c codonsubmat_test_LDADD = $(top_srcdir)/src/general/argument.o -lm # Files to clear 
away MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = PROGRAMS = $(noinst_PROGRAMS) DEFS = @DEFS@ -I. -I$(srcdir) CPPFLAGS = @CPPFLAGS@ LDFLAGS = @LDFLAGS@ LIBS = @LIBS@ sequence_test_OBJECTS = sequence.test.o sequence.o alphabet.o \ translate.o splice.o sequence_test_DEPENDENCIES = $(top_srcdir)/src/struct/matrix.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o sequence_test_LDFLAGS = submat_test_OBJECTS = submat.test.o submat.o submat_test_LDADD = $(LDADD) submat_test_DEPENDENCIES = submat_test_LDFLAGS = translate_test_OBJECTS = translate.test.o translate.o translate_test_DEPENDENCIES = $(top_srcdir)/src/general/argument.o translate_test_LDFLAGS = splice_test_OBJECTS = splice.test.o splice.o splice_test_DEPENDENCIES = $(top_srcdir)/src/struct/matrix.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o splice_test_LDFLAGS = alphabet_test_OBJECTS = alphabet.test.o alphabet.o alphabet_test_DEPENDENCIES = $(top_srcdir)/src/general/argument.o alphabet_test_LDFLAGS = codonsubmat_test_OBJECTS = codonsubmat.test.o codonsubmat.o codonsubmat_test_DEPENDENCIES = $(top_srcdir)/src/general/argument.o codonsubmat_test_LDFLAGS = CFLAGS = @CFLAGS@ COMPILE = $(CC) $(DEFS) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) CCLD = $(CC) LINK = $(CCLD) $(AM_CFLAGS) $(CFLAGS) $(LDFLAGS) -o $@ HEADERS = $(noinst_HEADERS) DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best SOURCES = $(sequence_test_SOURCES) $(submat_test_SOURCES) $(translate_test_SOURCES) $(splice_test_SOURCES) $(alphabet_test_SOURCES) $(codonsubmat_test_SOURCES) OBJECTS = $(sequence_test_OBJECTS) $(submat_test_OBJECTS) $(translate_test_OBJECTS) $(splice_test_OBJECTS) $(alphabet_test_OBJECTS) $(codonsubmat_test_OBJECTS) 
all: all-redirect .SUFFIXES: .SUFFIXES: .S .c .o .s $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps src/sequence/Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status mostlyclean-noinstPROGRAMS: clean-noinstPROGRAMS: -test -z "$(noinst_PROGRAMS)" || rm -f $(noinst_PROGRAMS) distclean-noinstPROGRAMS: maintainer-clean-noinstPROGRAMS: .c.o: $(COMPILE) -c $< .s.o: $(COMPILE) -c $< .S.o: $(COMPILE) -c $< mostlyclean-compile: -rm -f *.o core *.core clean-compile: distclean-compile: -rm -f *.tab.c maintainer-clean-compile: sequence.test: $(sequence_test_OBJECTS) $(sequence_test_DEPENDENCIES) @rm -f sequence.test $(LINK) $(sequence_test_LDFLAGS) $(sequence_test_OBJECTS) $(sequence_test_LDADD) $(LIBS) submat.test: $(submat_test_OBJECTS) $(submat_test_DEPENDENCIES) @rm -f submat.test $(LINK) $(submat_test_LDFLAGS) $(submat_test_OBJECTS) $(submat_test_LDADD) $(LIBS) translate.test: $(translate_test_OBJECTS) $(translate_test_DEPENDENCIES) @rm -f translate.test $(LINK) $(translate_test_LDFLAGS) $(translate_test_OBJECTS) $(translate_test_LDADD) $(LIBS) splice.test: $(splice_test_OBJECTS) $(splice_test_DEPENDENCIES) @rm -f splice.test $(LINK) $(splice_test_LDFLAGS) $(splice_test_OBJECTS) $(splice_test_LDADD) $(LIBS) alphabet.test: $(alphabet_test_OBJECTS) $(alphabet_test_DEPENDENCIES) @rm -f alphabet.test $(LINK) $(alphabet_test_LDFLAGS) $(alphabet_test_OBJECTS) $(alphabet_test_LDADD) $(LIBS) codonsubmat.test: $(codonsubmat_test_OBJECTS) $(codonsubmat_test_DEPENDENCIES) @rm -f codonsubmat.test $(LINK) $(codonsubmat_test_LDFLAGS) $(codonsubmat_test_OBJECTS) $(codonsubmat_test_LDADD) $(LIBS) tags: TAGS ID: $(HEADERS) $(SOURCES) $(LISP) list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ here=`pwd` && cd 
$(srcdir) \ && mkid -f$$here/ID $$unique $(LISP) TAGS: $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) $(LISP) tags=; \ here=`pwd`; \ list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(ETAGS_ARGS)$$unique$(LISP)$$tags" \ || (cd $(srcdir) && etags -o $$here/TAGS $(ETAGS_ARGS) $$tags $$unique $(LISP)) mostlyclean-tags: clean-tags: distclean-tags: -rm -f TAGS ID maintainer-clean-tags: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = src/sequence distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done check-TESTS: $(TESTS) @failed=0; all=0; \ srcdir=$(srcdir); export srcdir; \ for tst in $(TESTS); do \ if test -f $$tst; then dir=.; \ else dir="$(srcdir)"; fi; \ if $(TESTS_ENVIRONMENT) $$dir/$$tst; then \ all=`expr $$all + 1`; \ echo "PASS: $$tst"; \ elif test $$? 
-ne 77; then \ all=`expr $$all + 1`; \ failed=`expr $$failed + 1`; \ echo "FAIL: $$tst"; \ fi; \ done; \ if test "$$failed" -eq 0; then \ banner="All $$all tests passed"; \ else \ banner="$$failed of $$all tests failed"; \ fi; \ dashes=`echo "$$banner" | sed s/./=/g`; \ echo "$$dashes"; \ echo "$$banner"; \ echo "$$dashes"; \ test "$$failed" -eq 0 info-am: info: info-am dvi-am: dvi: dvi-am check-am: all-am $(MAKE) $(AM_MAKEFLAGS) check-TESTS check: check-am installcheck-am: installcheck: installcheck-am install-exec-am: install-exec: install-exec-am install-data-am: install-data: install-data-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-am uninstall-am: uninstall: uninstall-am all-am: Makefile $(PROGRAMS) $(HEADERS) all-redirect: all-am install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-noinstPROGRAMS mostlyclean-compile \ mostlyclean-tags mostlyclean-generic mostlyclean: mostlyclean-am clean-am: clean-noinstPROGRAMS clean-compile clean-tags clean-generic \ mostlyclean-am clean: clean-am distclean-am: distclean-noinstPROGRAMS distclean-compile distclean-tags \ distclean-generic clean-am distclean: distclean-am maintainer-clean-am: maintainer-clean-noinstPROGRAMS \ maintainer-clean-compile maintainer-clean-tags \ maintainer-clean-generic distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." 
maintainer-clean: maintainer-clean-am .PHONY: mostlyclean-noinstPROGRAMS distclean-noinstPROGRAMS \ clean-noinstPROGRAMS maintainer-clean-noinstPROGRAMS \ mostlyclean-compile distclean-compile clean-compile \ maintainer-clean-compile tags mostlyclean-tags distclean-tags \ clean-tags maintainer-clean-tags distdir check-TESTS info-am info \ dvi-am dvi check check-am installcheck-am installcheck install-exec-am \ install-exec install-data-am install-data install-am install \ uninstall-am uninstall all-redirect all-am all installdirs \ mostlyclean-generic distclean-generic clean-generic \ maintainer-clean-generic clean mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: exonerate-2.4.0/src/sequence/sequence.test.c0000644000175000017500000000355611162714340015771 00000000000000/****************************************************************\ * * * Simple Sequence Object * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "sequence.h" gint Argument_main(Argument *arg){ register Alphabet *alphabet = Alphabet_create(Alphabet_Type_DNA, TRUE); register gchar *seq = "CGTATGACGtagctagctGCAGATAGC"; register Sequence *s = Sequence_create("testseq", NULL, seq, 0, Sequence_Strand_FORWARD, alphabet); register Sequence *s2, *s3, *s4; register gchar *result; register Translate *translate = Translate_create(FALSE); s2 = Sequence_revcomp(s); s3 = Sequence_translate(s2, translate, 1); s4 = Sequence_mask(s3); /**/ result = Sequence_get_str(s4); g_message("result [%s]", result); g_free(result); /**/ Sequence_destroy(s); Sequence_destroy(s2); Sequence_destroy(s3); Sequence_destroy(s4); Alphabet_destroy(alphabet); Translate_destroy(translate); return 0; } exonerate-2.4.0/src/sequence/sequence.c0000644000175000017500000007165311222343603015013 00000000000000/****************************************************************\ * * * Simple Sequence Object * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/

#include <stdio.h>  /* For fopen() */
#include <string.h> /* For strlen() */
#include <ctype.h>  /* For toupper() */
#include <stdlib.h> /* For atoi() */

#include "sequence.h"
#include "lineparse.h"
#include "translate.h"

static Sequence_Annotation *Sequence_Annotation_create(gchar *id,
              Sequence_Strand strand, gint cds_start, gint cds_length){
    register Sequence_Annotation *annotation
        = g_new(Sequence_Annotation, 1);
    annotation->id = g_strdup(id);
    annotation->strand = strand;
    annotation->cds_start = cds_start;
    annotation->cds_length = cds_length;
    return annotation;
}

static void Sequence_Annotation_destroy(
            Sequence_Annotation *annotation){
    g_free(annotation->id);
    g_free(annotation);
    return;
}

static gint Sequence_Annotation_compare(gconstpointer a,
                                        gconstpointer b){
    register gchar *id_a = (gchar*)a, *id_b = (gchar*)b;
    return strcmp(id_a, id_b);
}

/* Each annotation line holds either "id strand" or
 * "id strand cds_start cds_length"; strand is '+' or '-',
 * and cds_start is 1-based in the file (stored 0-based here).
 */
static GTree *Sequence_create_annotation_tree(gchar *path){
    register GTree *tree = g_tree_new(Sequence_Annotation_compare);
    register FILE *fp = fopen(path, "r");
    register LineParse *lp;
    register gchar *id, strand_char;
    register gint cds_start = 0, cds_length = 0;
    register Sequence_Strand strand;
    register Sequence_Annotation *annotation;
    /**/
    if(!fp)
        g_error("Could not open annotation file [%s]", path);
    lp = LineParse_create(fp);
    while(LineParse_word(lp) != EOF){
        if((lp->word->len == 2) || (lp->word->len == 4)){
            id = lp->word->pdata[0];
            strand_char = *((gchar*)(lp->word->pdata[1]));
            strand = (strand_char == '+') ? Sequence_Strand_FORWARD
                   : ((strand_char == '-') ?
Sequence_Strand_REVCOMP : Sequence_Strand_UNKNOWN); if(strand == Sequence_Strand_UNKNOWN) g_error("Strand unknown for [%s]", id); if(lp->word->len == 4){ cds_start = atoi(lp->word->pdata[2]) - 1; cds_length = atoi(lp->word->pdata[3]); } annotation = Sequence_Annotation_create(id, strand, cds_start, cds_length); g_assert(!g_tree_lookup(tree, annotation->id)); g_tree_insert(tree, annotation->id, annotation); } } LineParse_destroy(lp); fclose(fp); return tree; } static gint Sequence_destroy_annotation_data(gpointer key, gpointer value, gpointer data){ register Sequence_Annotation *annotation = value; Sequence_Annotation_destroy(annotation); return FALSE; } static void Sequence_destroy_annotation_tree(GTree *tree){ g_tree_traverse(tree, Sequence_destroy_annotation_data, G_IN_ORDER, NULL); g_tree_destroy(tree); return; } static void Sequence_Argument_cleanup(gpointer user_data){ register Sequence_ArgumentSet *sas = user_data; if(sas->annotation_tree){ Sequence_destroy_annotation_tree(sas->annotation_tree); sas->annotation_tree = NULL; } return; } Sequence_ArgumentSet *Sequence_ArgumentSet_create(Argument *arg){ register ArgumentSet *as; static Sequence_ArgumentSet sas = {NULL}; if(arg){ as = ArgumentSet_create("Sequence Options"); ArgumentSet_add_option(as, 'A', "annotation", "path", "Path to sequence annotation file", "none", Argument_parse_string, &sas.annotation_path); Argument_absorb_ArgumentSet(arg, as); Argument_add_cleanup(arg, Sequence_Argument_cleanup, &sas); } else { if(sas.annotation_path && strcmp(sas.annotation_path, "none") && (!sas.annotation_tree)) sas.annotation_tree = Sequence_create_annotation_tree(sas.annotation_path); } return &sas; } /**/ static gboolean Sequence_is_valid(Sequence *seq){ register gint i, ch; g_assert(seq); g_assert(seq->id); if(seq->alphabet->type == Alphabet_Type_PROTEIN){ g_assert(seq->strand == Sequence_Strand_UNKNOWN); } else if(seq->alphabet->type == Alphabet_Type_UNKNOWN){ g_assert(seq->strand != Sequence_Strand_FORWARD); 
g_assert(seq->strand != Sequence_Strand_REVCOMP); } if(seq->data) for(i = 0; i < seq->len; i++){ ch = Sequence_get_symbol(seq, i); if(!Alphabet_symbol_is_valid(seq->alphabet, ch)) g_warning("Invalid symbol [%c](%d) at [%s][%d]", ch, ch, seq->id, i); g_assert(Alphabet_symbol_is_valid(seq->alphabet, ch)); } return TRUE; } static void Sequence_IntMemory_data_destroy(gpointer data){ g_free(data); return; } static gint Sequence_IntMemory_get_symbol(gpointer data, gint pos){ register gchar *seq = data; return seq[pos]; } static Sequence *Sequence_create_internal(gchar *id, gchar *def, guint len, Sequence_Strand strand, Alphabet *alphabet){ register Sequence *s = g_new0(Sequence, 1); register Sequence_ArgumentSet *sas = Sequence_ArgumentSet_create(NULL); s->ref_count = 1; if(id) s->id = g_strdup(id); if(def) s->def = g_strdup(def); s->strand = strand; if(alphabet) s->alphabet = Alphabet_share(alphabet); else s->alphabet = Alphabet_create(Alphabet_Type_UNKNOWN, FALSE); s->annotation = sas->annotation_tree ? g_tree_lookup(sas->annotation_tree, s->id) : NULL; s->len = len; #ifdef USE_PTHREADS pthread_mutex_init(&s->seq_lock, NULL); #endif /* USE_PTHREADS */ return s; } static void Sequence_ExtMemory_data_destroy(gpointer data){ register SparseCache *cache = data; SparseCache_destroy(cache); return; } static gint Sequence_ExtMemory_get_symbol(gpointer data, gint pos){ register SparseCache *cache = data; return GPOINTER_TO_INT(SparseCache_get(cache, pos)); } Sequence *Sequence_create(gchar *id, gchar *def, gchar *seq, guint len, Sequence_Strand strand, Alphabet *alphabet){ register Sequence *s = Sequence_create_internal(id, def, len, strand, alphabet); s->type = Sequence_Type_INTMEM; s->get_symbol = Sequence_IntMemory_get_symbol; if(seq){ s->len = len ? 
len : strlen(seq); g_assert((!len) || (s->len == strlen(seq))); s->data = g_strndup(seq, s->len); } else { s->len = len; s->data = NULL; } g_assert(Sequence_is_valid(s)); return s; } Sequence *Sequence_create_extmem(gchar *id, gchar *def, guint len, Sequence_Strand strand, Alphabet *alphabet, SparseCache *cache){ register Sequence *s = Sequence_create_internal(id, def, len, strand, alphabet); s->type = Sequence_Type_EXTMEM; s->get_symbol = Sequence_ExtMemory_get_symbol; s->ref_count = 1; s->data = SparseCache_share(cache); return s; } void Sequence_preload_extmem(Sequence *s){ register gint i; if(s->type != Sequence_Type_EXTMEM) return; for(i = 0; i < s->len; i += SparseCache_PAGE_SIZE) Sequence_get_symbol(s, i); return; } Sequence *Sequence_share(Sequence *s){ g_assert(s); Sequence_lock(s); g_assert(s->ref_count); s->ref_count++; Sequence_unlock(s); return s; } typedef struct { Sequence *sequence; guint start; } Sequence_Subseq; static void Sequence_Subseq_data_destroy(gpointer data){ register Sequence_Subseq *subseq = data; Sequence_destroy(subseq->sequence); g_free(subseq); return; } static gint Sequence_Subseq_get_symbol(gpointer data, gint pos){ register Sequence_Subseq *subseq = data; return Sequence_get_symbol(subseq->sequence, subseq->start+pos); } Sequence *Sequence_subseq(Sequence *s, guint start, guint length){ register Sequence *ns; register Sequence_Subseq *subseq; g_assert(s); g_assert(s->data); g_assert(s->len); g_assert(start < s->len); g_assert((start+length) <= s->len); if((start == 0) && (s->len == length)) return Sequence_share(s); ns = Sequence_create(s->id, s->def, NULL, 0, s->strand, s->alphabet); g_free(ns->id); ns->id = g_strdup_printf("%s:subseq(%d,%d)", s->id, start, length); subseq = g_new(Sequence_Subseq, 1); subseq->sequence = Sequence_share(s); subseq->start = start; ns->type = Sequence_Type_SUBSEQ; ns->get_symbol = Sequence_Subseq_get_symbol; ns->data = subseq; ns->len = length; return ns; } /**/ gint 
Sequence_print_fasta_block(Sequence *s, FILE *fp){ register gint i, pos = 0, pause, width = 70, total = s->len; if(s->len){ pos = 0; pause = pos+s->len-width; while(pos < pause){ for(i = 0; i < width; i++) fputc(Sequence_get_symbol(s, pos+i), fp); fputc('\n', fp); pos += width; total++; } for(i = pos; i < s->len; i++) fputc(Sequence_get_symbol(s, i), fp); fputc('\n', fp); total++; } #if 0 /* FIXME: optimisation: use this for Sequence_Mode_IntMemory */ register gchar *ptr, *pause; /**/ if(s->seq){ ptr = s->seq; pause = ptr+s->len-width; while(ptr < pause){ ptr += fwrite(ptr, sizeof(gchar), width, fp); fputc('\n', fp); } ptr += fwrite(ptr, sizeof(gchar), s->seq+s->len-ptr, fp); fputc('\n', fp); } #endif /* 0 */ return total; } /* FIXME: use fast version where possible * (Sequence_strncpy each buffer, then write() * */ void Sequence_print_fasta(Sequence *s, FILE *fp, gboolean show_info){ fprintf(fp, ">%s", s->id?s->id:"[unknown]"); if(s->def) fprintf(fp, " %s", s->def); if(show_info){ if(s->strand != Sequence_Strand_UNKNOWN) fprintf(fp, " [%s]", (s->strand == Sequence_Strand_FORWARD) ?"forward":"revcomp"); if(s->alphabet->type != Alphabet_Type_UNKNOWN) fprintf(fp, " [%s]", Alphabet_Type_get_name(s->alphabet->type)); if(s->len) fprintf(fp, " [length %d]", s->len); } fprintf(fp, "\n"); Sequence_print_fasta_block(s, fp); return; } static void Sequence_revcomp_data_destroy(gpointer data){ register Sequence *sequence = data; Sequence_destroy(sequence); return; } static gint Sequence_revcomp_get_symbol(gpointer data, gint pos){ register Sequence *sequence = data; register gint ch = Sequence_get_symbol(sequence, sequence->len-pos-1); return sequence->alphabet->complement[ch]; } Sequence_Strand Sequence_Strand_revcomp(Sequence_Strand strand){ g_assert((strand == Sequence_Strand_FORWARD) ||(strand == Sequence_Strand_REVCOMP)); return (strand == Sequence_Strand_FORWARD) ? 
Sequence_Strand_REVCOMP : Sequence_Strand_FORWARD; } void Sequence_revcomp_in_place(gchar *seq, guint length){ register guchar *a, *z, swap; register gint pos; register Alphabet *alphabet = Alphabet_create(Alphabet_Type_DNA, FALSE); register guchar *complement = Alphabet_get_filter_by_type(alphabet, Alphabet_Filter_Type_COMPLEMENT); for(a = (guchar*)seq, z = (guchar*)seq+length-1; a < z; a++, z--){ swap = complement[*a]; *a = complement[*z]; *z = swap; } if(length & 1){ /* If odd length, complement the central base */ pos = length >> 1; seq[pos] = complement[(guchar)seq[pos]]; } Alphabet_destroy(alphabet); return; } /* FIXME: optimisation: should avoid repeated creation of alphabet. */ void Sequence_reverse_in_place(gchar *seq, guint length){ register guchar *a, *z, swap; for(a = (guchar*)seq, z = (guchar*)seq+length-1; a < z; a++, z--){ swap = *a; *a = *z; *z = swap; } return; } Sequence *Sequence_revcomp(Sequence *s){ register Sequence *ns; register Sequence_Strand strand; /* Prevent creation of revcomp(revcomp(seq)) */ if(s->type == Sequence_Type_REVCOMP) return Sequence_share((Sequence*)s->data); strand = Sequence_Strand_revcomp(s->strand); ns = Sequence_create(s->id, s->def, NULL, s->len, strand, s->alphabet); if(ns->def){ g_free(ns->def); ns->def = g_strdup_printf("%s:[revcomp]", s->def); } else { ns->def = g_strdup("[revcomp]"); } ns->data = Sequence_share(s); ns->type = Sequence_Type_REVCOMP; ns->get_symbol = Sequence_revcomp_get_symbol; return ns; } /**/ void Sequence_filter_in_place(gchar *seq, guint length, Alphabet *alphabet, Alphabet_Filter_Type filter_type){ register gint i; register const guchar *filter = Alphabet_get_filter_by_type(alphabet, filter_type); g_assert(filter); for(i = 0; i < length; i++){ g_assert(filter[(guchar)seq[i]] != '-'); seq[i] = filter[(guchar)seq[i]]; } return; } typedef struct { Sequence *sequence; guchar *filter; Alphabet_Filter_Type filter_type; } Sequence_Filter; static void Sequence_Filter_data_destroy(gpointer data){ 
    register Sequence_Filter *sequence_filter = data;
    Sequence_destroy(sequence_filter->sequence);
    g_free(sequence_filter);
    return;
}

static gint Sequence_Filter_get_symbol(gpointer data, gint pos){
    register Sequence_Filter *sequence_filter = data;
    return sequence_filter->filter
           [Sequence_get_symbol(sequence_filter->sequence, pos)];
}

Sequence *Sequence_filter(Sequence *s,
                          Alphabet_Filter_Type filter_type){
    register Sequence *ns = Sequence_create(s->id, s->def, NULL, s->len,
                                            s->strand, s->alphabet);
    register Sequence_Filter *sequence_filter
        = g_new(Sequence_Filter, 1);
    g_assert(Alphabet_get_filter_by_type(s->alphabet, filter_type));
    g_free(ns->id);
    ns->id = g_strdup_printf("%s:filter(%s)", s->id,
                 Alphabet_Filter_Type_get_name(filter_type));
    ns->data = sequence_filter;
    ns->type = Sequence_Type_FILTER;
    ns->get_symbol = Sequence_Filter_get_symbol;
    sequence_filter->sequence = Sequence_share(s);
    sequence_filter->filter = Alphabet_get_filter_by_type(s->alphabet,
                                                          filter_type);
    sequence_filter->filter_type = filter_type;
    return ns;
}
/* FIXME: optimisation : if sequence is already filtered,
 *        use merged filter with original seq
 */

gint Sequence_checksum(Sequence *s){
    register guint64 check = 0;
    register gint i, ch;
    register gchar *index =
    "--------------------------------------&---*---.-----------------"
    "@ABCDEFGHIJKLMNOPQRSTUVWXYZ------ABCDEFGHIJKLMNOPQRSTUVWXYZ---~-"
    "----------------------------------------------------------------"
    "----------------------------------------------------------------";
    for(i = 0; i < s->len; i++){
        ch = index[Sequence_get_symbol(s, i)];
        if(ch != '-')
            check += ((i % 57) + 1) * ch;
    }
    return check % 10000; /* gcg checksum */
}

/**/

typedef struct {
     Sequence *sequence;
         gint  frame;
    Translate *translate;
} Sequence_Translation;

static void Sequence_Translation_data_destroy(gpointer data){
    register Sequence_Translation *translation = data;
    Sequence_destroy(translation->sequence);
    Translate_destroy(translation->translate);
    g_free(translation);
    return;
}
static gint Sequence_translate_get_symbol(gpointer data, gint pos){ register Sequence_Translation *translation = data; register gint p = (pos*3)+(translation->frame-1); return Translate_base(translation->translate, Sequence_get_symbol(translation->sequence, p), Sequence_get_symbol(translation->sequence, p+1), Sequence_get_symbol(translation->sequence, p+2)); } Sequence *Sequence_translate(Sequence *s, Translate *translate, gint frame){ register Alphabet *protein_alphabet = Alphabet_create(Alphabet_Type_PROTEIN, FALSE); register Sequence *ts = Sequence_create(s->id, s->def, NULL, 0, Sequence_Strand_UNKNOWN, protein_alphabet); register Sequence_Translation *translation = g_new(Sequence_Translation, 1); g_assert((frame >= 1) && (frame <= 3)); if(ts->def){ g_free(ts->def); ts->def = g_strdup_printf("%s:[translate(%d)]", s->def, frame); } else { ts->def = g_strdup_printf("[translate(%d)]", frame); } translation->sequence = Sequence_share(s); translation->frame = frame; translation->translate = Translate_share(translate); ts->len = (s->len-(frame-1))/3; ts->data = translation; ts->type = Sequence_Type_TRANSLATE; ts->get_symbol = Sequence_translate_get_symbol; Alphabet_destroy(protein_alphabet); return ts; } /**/ #if 0 static void Sequence_print_type(Sequence *s){ register Sequence_Subseq *subseq; register Sequence *revcomp; register Sequence_Filter *filter; register Sequence_Translation *translation; switch(s->type){ case Sequence_Type_INTMEM: g_print("intmem"); /* s->data is seq */ break; case Sequence_Type_EXTMEM: g_print("extmem"); /* s->data is SparseCache */ break; case Sequence_Type_SUBSEQ: g_print("subseq:"); subseq = s->data; Sequence_print_type(subseq->sequence); break; case Sequence_Type_REVCOMP: g_print("revcomp:"); revcomp = s->data; Sequence_print_type(revcomp); break; case Sequence_Type_FILTER: g_print("filter:"); filter = s->data; Sequence_print_type(filter->sequence); break; case Sequence_Type_TRANSLATE: g_print("translate:"); translation = s->data; 
Sequence_print_type(translation->sequence); break; default: g_error("unknown sequence type [%d]", s->type); break; } return; } #endif /* 0 */ void Sequence_strncpy(Sequence *s, gint start, gint length, gchar *dst){ register gint i; register gchar *str; register Sequence_Subseq *subseq; register SparseCache *cache; g_assert(start >= 0); g_assert(s->len > 0); g_assert(start < s->len); g_assert((start+length) <= s->len); /* FIXME: g_warning("using slow implementation of [%s]", __FUNCTION__); */ #if 0 g_print("Sequence_strncpy ["); Sequence_print_type(s); g_print("] [%d] [%d] [%s]\n", start, length, s->id); #endif /* 0 */ switch(s->type){ case Sequence_Type_INTMEM: str = s->data; for(i = 0; i < length; i++) dst[i] = str[start+i]; return; case Sequence_Type_EXTMEM: cache = s->data; SparseCache_copy(cache, start, length, dst); break; case Sequence_Type_SUBSEQ: subseq = s->data; Sequence_strncpy(subseq->sequence, start+subseq->start, length, dst); return; default: for(i = 0; i < length; i++) dst[i] = Sequence_get_symbol(s, start+i); break; } return; } /* FIXME: optimisation : implement fast versions * for specific sequence types * target types: * subseq:extmem * intmem */ void Sequence_strcpy(Sequence *s, gchar *dst){ Sequence_strncpy(s, 0, s->len, dst); return; } gchar *Sequence_get_substr(Sequence *s, gint start, gint length){ register gchar *str = g_new(gchar, length+1); Sequence_strncpy(s, start, length, str); str[length] = '\0'; return str; } gchar *Sequence_get_str(Sequence *s){ return Sequence_get_substr(s, 0, s->len); } /* FIXME: optimisation: implement fast version * returning shared String for simple sequences */ /**/ void Sequence_destroy(Sequence *s){ g_assert(s); #ifdef USE_PTHREADS pthread_mutex_lock(&s->seq_lock); #endif /* USE_PTHREADS */ if(--s->ref_count){ #ifdef USE_PTHREADS pthread_mutex_unlock(&s->seq_lock); #endif /* USE_PTHREADS */ return; } #ifdef USE_PTHREADS pthread_mutex_unlock(&s->seq_lock); #endif /* USE_PTHREADS */ if(s->id) g_free(s->id); 
if(s->def) g_free(s->def); if(s->data){ switch(s->type){ case Sequence_Type_INTMEM: Sequence_IntMemory_data_destroy(s->data); break; case Sequence_Type_EXTMEM: Sequence_ExtMemory_data_destroy(s->data); break; case Sequence_Type_SUBSEQ: Sequence_Subseq_data_destroy(s->data); break; case Sequence_Type_REVCOMP: Sequence_revcomp_data_destroy(s->data); break; case Sequence_Type_FILTER: Sequence_Filter_data_destroy(s->data); break; case Sequence_Type_TRANSLATE: Sequence_Translation_data_destroy(s->data); break; default: g_error("Unknown Sequence type [%d]", s->type); break; } } Alphabet_destroy(s->alphabet); #ifdef USE_PTHREADS pthread_mutex_destroy(&s->seq_lock); #endif /* USE_PTHREADS */ g_free(s); return; } Sequence *Sequence_mask(Sequence *s){ register gboolean ok = TRUE; register Sequence *curr_seq = s, *new_seq, *prev_seq; register gint i; register GPtrArray *seq_list = g_ptr_array_new(); register Sequence_Subseq *seq_subseq; register Sequence_Filter *seq_filter; register Sequence_Translation *seq_translation; /* Find the base sequence */ do { switch(curr_seq->type){ case Sequence_Type_INTMEM: case Sequence_Type_EXTMEM: ok = FALSE; break; case Sequence_Type_SUBSEQ: seq_subseq = curr_seq->data; g_ptr_array_add(seq_list, curr_seq); curr_seq = seq_subseq->sequence; break; case Sequence_Type_REVCOMP: g_ptr_array_add(seq_list, curr_seq); curr_seq = curr_seq->data; break; case Sequence_Type_FILTER: seq_filter = curr_seq->data; g_ptr_array_add(seq_list, curr_seq); curr_seq = seq_filter->sequence; break; case Sequence_Type_TRANSLATE: seq_translation = curr_seq->data; g_ptr_array_add(seq_list, curr_seq); curr_seq = seq_translation->sequence; break; default: g_error("Unknown Sequence Type [%d]", curr_seq->type); break; } } while(ok); /* Apply masking filter to base sequence */ new_seq = Sequence_filter(curr_seq, Alphabet_Filter_Type_MASKED); /* Apply other transformations to filtered sequence copy */ for(i = seq_list->len-1; i >= 0; i--){ curr_seq = seq_list->pdata[i]; 
        prev_seq = new_seq;
        switch(curr_seq->type){
            case Sequence_Type_SUBSEQ:
                seq_subseq = curr_seq->data;
                /* Use the subsequence's own length, not its parent's:
                 * the parent length would fail the bounds assertion
                 * in Sequence_subseq() whenever start > 0.
                 */
                new_seq = Sequence_subseq(prev_seq, seq_subseq->start,
                                          curr_seq->len);
                break;
            case Sequence_Type_REVCOMP:
                new_seq = Sequence_revcomp(prev_seq);
                break;
            case Sequence_Type_FILTER:
                seq_filter = curr_seq->data;
                new_seq = Sequence_filter(prev_seq,
                                          seq_filter->filter_type);
                break;
            case Sequence_Type_TRANSLATE:
                seq_translation = curr_seq->data;
                new_seq = Sequence_translate(prev_seq,
                                             seq_translation->translate,
                                             seq_translation->frame);
                break;
            case Sequence_Type_INTMEM:
            case Sequence_Type_EXTMEM:
                g_error("impossible");
                break;
            default:
                g_error("Unknown Sequence type");
                break;
        }
        Sequence_destroy(prev_seq);
    }
    g_ptr_array_free(seq_list, TRUE);
    return new_seq;
}

gsize Sequence_memory_usage(Sequence *s){
    register SparseCache *cache;
    register gsize data_memory = 0;
    switch(s->type){
        case Sequence_Type_INTMEM:
            data_memory = sizeof(gchar)*s->len;
            break;
        case Sequence_Type_EXTMEM:
            cache = s->data;
            data_memory = SparseCache_memory_usage(cache);
            break;
        default:
            data_memory = 0;
            break;
    }
    return sizeof(Sequence)
         + sizeof(gchar)*strlen(s->id)
         + sizeof(gchar)*(s->def?strlen(s->def):0)
         + sizeof(Alphabet)
         + data_memory;
}

void Sequence_lock(Sequence *s){
    register Sequence_Subseq *subseq;
    register Sequence_Filter *filter;
    register Sequence_Translation *translation;
    register Sequence *revcomp;
    g_assert(s);
#ifdef USE_PTHREADS
    pthread_mutex_lock(&s->seq_lock);
    if(s->data){
        switch(s->type){
            case Sequence_Type_INTMEM:
                break;
            case Sequence_Type_EXTMEM:
                break;
            case Sequence_Type_SUBSEQ:
                subseq = s->data;
                Sequence_lock(subseq->sequence);
                break;
            case Sequence_Type_REVCOMP:
                revcomp = s->data;
                Sequence_lock(revcomp);
                break;
            case Sequence_Type_FILTER:
                filter = s->data;
                Sequence_lock(filter->sequence);
                break;
            case Sequence_Type_TRANSLATE:
                translation = s->data;
                Sequence_lock(translation->sequence);
                break;
            default:
                g_error("Unknown Sequence type [%d]", s->type);
                break;
        }
    }
#endif /* USE_PTHREADS
*/ return; } void Sequence_unlock(Sequence *s){ register Sequence_Subseq *subseq; register Sequence_Filter *filter; register Sequence_Translation *translation; register Sequence *revcomp; g_assert(s); #ifdef USE_PTHREADS pthread_mutex_unlock(&s->seq_lock); if(s->data){ switch(s->type){ case Sequence_Type_INTMEM: break; case Sequence_Type_EXTMEM: break; case Sequence_Type_SUBSEQ: subseq = s->data; Sequence_unlock(subseq->sequence); break; case Sequence_Type_REVCOMP: revcomp = s->data; Sequence_unlock(revcomp); break; case Sequence_Type_FILTER: filter = s->data; Sequence_unlock(filter->sequence); break; case Sequence_Type_TRANSLATE: translation = s->data; Sequence_unlock(translation->sequence); break; default: g_error("Unknown Sequence type [%d]", s->type); break; } } #endif /* USE_PTHREADS */ return; } exonerate-2.4.0/src/sequence/alphabet.c0000644000175000017500000003775611176125725015004 00000000000000/****************************************************************\ * * * Simple Alphabet Object * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/

#include <string.h> /* For strlen() */
#include <ctype.h>  /* For toupper() */

#include "alphabet.h"

gchar *Alphabet_Argument_parse_alphabet_type(gchar *arg_string,
                                             gpointer data){
    register Alphabet_Type *dst_type = (Alphabet_Type*)data;
    gchar *type_str;
    register gchar *ret_val = Argument_parse_string(arg_string,
                                                    &type_str);
    if(ret_val)
        return ret_val;
    (*dst_type) = Alphabet_name_get_type(type_str);
    return NULL;
}

Alphabet_Type Alphabet_Type_guess(gchar *str){
    static const guchar *index = (guchar*)
    "----------------------------------------------------------------"
    "-A-C---G------N-----T------------A-C---G------N-----T-----------"
    "----------------------------------------------------------------"
    "----------------------------------------------------------------";
    register gint count = 0, len;
    register Alphabet_Type type;
    for(len = 0; str[len]; len++)
        if(index[(guchar)str[len]] != '-')
            count++;
    /* Guess DNA when at least 17/20 (85%) of the symbols are ACGTN */
    if((20.0*(gdouble)count) >= (17.0*(gdouble)len))
        type = Alphabet_Type_DNA;
    else
        type = Alphabet_Type_PROTEIN;
    return type;
}

Alphabet_ArgumentSet *Alphabet_ArgumentSet_create(Argument *arg){
    register ArgumentSet *as;
    static Alphabet_ArgumentSet aas = {TRUE};
    if(arg){
        as = ArgumentSet_create("Alphabet Options");
        ArgumentSet_add_option(as, '\0', "useaatla", NULL,
            "Use three-letter abbreviation for AA names", "TRUE",
            Argument_parse_boolean, &aas.use_aa_tla);
        Argument_absorb_ArgumentSet(arg, as);
    }
    return &aas;
}

/**/

#define Alphabet_Filter_macro(asterisk, upper_case, lower_case) \
    (guchar*) ("------------------------------------------" \
    asterisk "---------------------" \
    "-" upper_case "------" lower_case "-----" \
    "----------------------------------------------------------------" \
    "----------------------------------------------------------------")

typedef enum {
    Alphabet_Filter_VALID_SEQ_SYMBOL,
    Alphabet_Filter_VALID_DNA_IUPAC,
    Alphabet_Filter_VALID_DNA_ACGTN,
    Alphabet_Filter_VALID_PROTEIN,
    Alphabet_Filter_CLEAN_DNA_ACGTN,
Alphabet_Filter_CLEAN_DNA_IUPAC, Alphabet_Filter_CLEAN_PROTEIN, Alphabet_Filter_TO_UPPER, Alphabet_Filter_TO_LOWER, Alphabet_Filter_COMPLEMENT, Alphabet_Filter_SOFTMASK_DNA, Alphabet_Filter_SOFTMASK_PROTEIN, Alphabet_Filter_DNA_NOAMBIG, Alphabet_Filter_PROTEIN_NOAMBIG_SOFTMASK, Alphabet_Filter_PROTEIN_NOAMBIG, Alphabet_Filter_DNA_NOAMBIG_SOFTMASK, Alphabet_Filter_NUMBER_OF_FILTERS } Alphabet_Filter; static guchar *Alphabet_Filter_get(Alphabet_Filter type){ static guchar *filter_set[Alphabet_Filter_NUMBER_OF_FILTERS] = { /* Alphabet_Filter_VALID_SEQ_SYMBOL: */ Alphabet_Filter_macro("*", "ABCDEFGHI-KLMN-PQRSTUVWXYZ", "abcdefghi-klmn-pqrstuvwxyz"), /* Alphabet_Filter_VALID_DNA_IUPAC: */ Alphabet_Filter_macro("-", "ABCD--GH--K-MN---RST-VW-Y-", "abcd--gh--k-mn---rst-vw-y-"), /* Alphabet_Filter_DNA_ACGTN: */ Alphabet_Filter_macro("-", "A-C---G------N-----T------", "a-c---g------n-----t------"), /* Alphabet_Filter_VALID_PROTEIN: */ Alphabet_Filter_macro("*", "ABCDEFGHI-KLMN-PQRSTUVWXYZ", "abcdefghi-klmn-pqrstuvwxyz"), /* Alphabet_Filter_CLEAN_DNA_ACGTN: */ Alphabet_Filter_macro("-", "ANCNNNGNNNNNNNNNNNNTNNNNNN", "ancnnngnnnnnnnnnnnntnnnnnn"), /* Alphabet_Filter_CLEAN_DNA_IUPAC: */ Alphabet_Filter_macro("-", "ABCDNNGHNNKNMNNNNRSTNVWNYN", "abcdnnghnnknmnnnnrstnvwnyn"), /* Alphabet_Filter_CLEAN_PROTEIN: */ Alphabet_Filter_macro("*", "AXCDEFGHIXKLMNXPQRSTUVWXYX", "axcdefghixklmnxpqrstuvwxyx"), /* Alphabet_Filter_TO_UPPER: */ Alphabet_Filter_macro("*", "ABCDEFGHI-KLMN-PQRSTUVWXY-", "ABCDEFGHI-KLMN-PQRSTUVWXY-"), /* Alphabet_Filter_TO_LOWER: */ Alphabet_Filter_macro("*", "abcdefghi-klmn-pqrstuvwxy-", "abcdefghi-klmn-pqrstuvwxy-"), /* Alphabet_Filter_COMPLEMENT: */ Alphabet_Filter_macro("-", "TVGH--CD--M-KN---YSA-BW-R-", "tvgh--cd--m-kn---ysa-bw-r-"), /* Alphabet_Filter_SOFTMASK_DNA: */ Alphabet_Filter_macro("-", "ABCD--GH--K-MN---RST-VW-Y-", "NNNN--NN--N-NN---NNN-NN-N-"), /* Alphabet_Filter_SOFTMASK_PROTEIN: */ Alphabet_Filter_macro("*", "ABCDEFGHI-KLMN-PQRSTUVWXYZ", 
"XXXXXXXXX-XXXX-XXXXXXXXXXX"), /* Alphabet_Filter_DNA_NOAMBIG: */ Alphabet_Filter_macro("-", "A-C---G------------T------", "A-C---G------------T------"), /* Alphabet_Filter_PROTEIN_NOAMBIG_SOFTMASK: */ Alphabet_Filter_macro("*", "A-CDEFGHI-KLMN-PQRSTUVW-Y-", "--------------------------"), /* Alphabet_Filter_PROTEIN_NOAMBIG: */ Alphabet_Filter_macro("*", "A-CDEFGHI-KLMN-PQRSTUVW-Y-", "A-CDEFGHI-KLMN-PQRSTUVW-Y-"), /* Alphabet_Filter_DNA_NOAMBIG_SOFTMASK: */ Alphabet_Filter_macro("-", "A-C---G------------T------", "--------------------------") }; /* The protein alphabets include U for selenocysteine */ g_assert(type >= 0); g_assert(type < Alphabet_Filter_NUMBER_OF_FILTERS); return filter_set[type]; } /**/ Alphabet *Alphabet_create(Alphabet_Type type, gboolean is_soft_masked){ register Alphabet *a = g_new0(Alphabet, 1); a->ref_count = 1; a->type = type; a->is_soft_masked = is_soft_masked; /* Set look-up tables */ a->unmasked = Alphabet_Filter_get(Alphabet_Filter_TO_UPPER); if(is_soft_masked){ if(type == Alphabet_Type_PROTEIN){ a->masked = Alphabet_Filter_get (Alphabet_Filter_SOFTMASK_PROTEIN); a->non_ambig = Alphabet_Filter_get (Alphabet_Filter_PROTEIN_NOAMBIG_SOFTMASK); } else { a->masked = Alphabet_Filter_get (Alphabet_Filter_SOFTMASK_DNA); a->non_ambig = Alphabet_Filter_get (Alphabet_Filter_DNA_NOAMBIG_SOFTMASK); } } else { a->masked = Alphabet_Filter_get(Alphabet_Filter_TO_UPPER); if(type == Alphabet_Type_PROTEIN){ a->non_ambig = Alphabet_Filter_get (Alphabet_Filter_PROTEIN_NOAMBIG); } else { a->non_ambig = Alphabet_Filter_get (Alphabet_Filter_DNA_NOAMBIG); } } switch(type){ case Alphabet_Type_DNA: a->member = (guchar*)"ACGT"; a->is_valid = Alphabet_Filter_get(Alphabet_Filter_VALID_DNA_IUPAC); a->complement = Alphabet_Filter_get(Alphabet_Filter_COMPLEMENT); a->clean = Alphabet_Filter_get(Alphabet_Filter_CLEAN_DNA_IUPAC); break; case Alphabet_Type_PROTEIN: a->member = (guchar*)"ARNDCQEGHILKMFPSTWYUV*"; a->is_valid = 
Alphabet_Filter_get(Alphabet_Filter_VALID_PROTEIN); a->complement = NULL; a->clean = Alphabet_Filter_get(Alphabet_Filter_CLEAN_PROTEIN); break; case Alphabet_Type_UNKNOWN: a->member = NULL; a->is_valid = Alphabet_Filter_get(Alphabet_Filter_VALID_SEQ_SYMBOL); a->complement = NULL; a->clean = NULL; /* Cannot clean unknown sequence type */ break; default: g_error("Unknown alphabet type [%d]", type); break; } return a; } void Alphabet_destroy(Alphabet *a){ g_assert(a); if(--a->ref_count) return; g_free(a); return; } Alphabet *Alphabet_share(Alphabet *a){ g_assert(a); g_assert(a->ref_count); a->ref_count++; return a; } /**/ guchar *Alphabet_get_filter_by_type(Alphabet *alphabet, Alphabet_Filter_Type filter_type){ register guchar *filter = NULL; g_assert(filter_type >= 0); g_assert(filter_type < Alphabet_Filter_Type_NUMBER_OF_TYPES); switch(filter_type){ case Alphabet_Filter_Type_UNMASKED: filter = alphabet->unmasked; break; case Alphabet_Filter_Type_MASKED: filter = alphabet->masked; break; case Alphabet_Filter_Type_IS_VALID: filter = alphabet->is_valid; break; case Alphabet_Filter_Type_COMPLEMENT: filter = alphabet->complement; break; case Alphabet_Filter_Type_CLEAN: filter = alphabet->clean; break; case Alphabet_Filter_Type_CLEAN_ACGTN: filter = Alphabet_Filter_get( Alphabet_Filter_CLEAN_DNA_ACGTN); break; case Alphabet_Filter_Type_NON_AMBIG: filter = alphabet->non_ambig; break; default: g_error("Alphabet_Filter_Type unknown [%d]", filter_type); break; } return filter; } gchar *Alphabet_Filter_Type_get_name(Alphabet_Filter_Type filter_type){ register gchar *name = NULL; g_assert(filter_type >= 0); g_assert(filter_type < Alphabet_Filter_Type_NUMBER_OF_TYPES); switch(filter_type){ case Alphabet_Filter_Type_UNMASKED: name = "unmasked"; break; case Alphabet_Filter_Type_MASKED: name = "masked"; break; case Alphabet_Filter_Type_IS_VALID: name = "valid"; break; case Alphabet_Filter_Type_COMPLEMENT: name = "complement"; break; case Alphabet_Filter_Type_CLEAN: name = "clean"; 
break; case Alphabet_Filter_Type_CLEAN_ACGTN: name = "clean_acgtn"; break; case Alphabet_Filter_Type_NON_AMBIG: name = "non_ambig"; break; default: g_error("Alphabet_Filter_Type unknown [%d]", filter_type); break; } return name; } /**/ gchar *Alphabet_Type_get_name(Alphabet_Type type){ switch(type){ case Alphabet_Type_DNA: return "DNA"; case Alphabet_Type_PROTEIN: return "Protein"; case Alphabet_Type_UNKNOWN: return "Unknown"; } g_error("Unknown Alphabet Type [%d]", type); return NULL; } Alphabet_Type Alphabet_name_get_type(gchar *name){ register gint i; gchar *dna_string[5] = {"d", "dna", "n", "nt", "nucleotide"}, *protein_string[4] = {"p", "protein", "aa", "aminoacid"}, *unknown_string[3] = {"?", "unknown", "-"}; for(i = 0; i < 5; i++) if(!g_strcasecmp(name, dna_string[i])) return Alphabet_Type_DNA; for(i = 0; i < 4; i++) if(!g_strcasecmp(name, protein_string[i])) return Alphabet_Type_PROTEIN; for(i = 0; i < 3; i++) if(!g_strcasecmp(name, unknown_string[i])) return Alphabet_Type_UNKNOWN; g_error("Unknown sequence type [%s]", name); return Alphabet_Type_UNKNOWN; /* not reached */ } gchar *Alphabet_aa2tla(gchar aa){ register Alphabet_ArgumentSet *aas = Alphabet_ArgumentSet_create(NULL); static gchar *tla_names[25] = { "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile", "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val", "Asx", "Zed", "Unk", "***", "Sec" }; static gchar *short_names[25] = { "^A^", "^R^", "^N^", "^D^", "^C^", "^Q^", "^E^", "^G^", "^H^", "^I^", "^L^", "^K^", "^M^", "^F^", "^P^", "^S^", "^T^", "^W^", "^Y^", "^V^", "^B^", "^Z^", "^X^", "^*^", "^U^"}; static guchar index[(1<<8)] = /* ARNDCQEGHILKMFPSTWYV */ { 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, /* 32 */ 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 23, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, /* 64 */ 25, 0, 20, 4, 3, 6, 13, 7, 8, 9, 25, 11, 
10, 12, 2, 25, 14, 5, 1, 15, 16, 25, 19, 17, 22, 18, 21, 25, 25, 25, 25, 25, 25, 0, 20, 4, 3, 6, 13, 7, 8, 9, 25, 11, 10, 12, 2, 25, 14, 5, 1, 15, 16, 25, 19, 17, 22, 18, 21, 25, 25, 25, 25, 25, /* 128 */ 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, /* 192 */ 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25 /* 256 */ }; if(index[(guchar)aa] == 25) g_error("Unknown amino acid [%c]", aa); g_assert(short_names[index[(guchar)aa]][1] == toupper(aa)); if(aas->use_aa_tla) return tla_names[index[(guchar)aa]]; return short_names[index[(guchar)aa]]; } gchar *Alphabet_nt2ambig(gchar nt){ static gchar *ambig[16] = { /* - */ NULL, /* G */ "G", /* A */ "A", /* R */ "AG", /* T */ "T", /* K */ "GT", /* W */ "AT", /* D */ "AGT", /* C */ "C", /* S */ "CG", /* M */ "AC", /* V */ "ACG", /* Y */ "CT", /* B */ "CGT", /* H */ "ACT", /* N */ "ACGT" }; static guchar index[(1<<8)] = { /* GARTKWDCSMVYBHN */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 13, 8, 7, 0, 0, 1, 14, 0, 0, 5, 0, 10, 15, 0, 0, 0, 3, 9, 4, 0, 11, 6, 0, 12, 0, 0, 0, 0, 0, 0, 0, 2, 13, 8, 7, 0, 0, 1, 14, 0, 0, 5, 0, 10, 15, 0, 0, 0, 3, 9, 4, 0, 11, 6, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    };
    g_assert(index[(guchar)nt]);
    return ambig[index[(guchar)nt]];
    }

exonerate-2.4.0/src/sequence/translate.c0000644000175000017500000003107111162714341015172 00000000000000
/****************************************************************\
*                                                                *
*  Nucleotide Translation Code                                   *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#include <string.h> /* For strlen() */
#include <ctype.h>  /* For tolower() */
#include <stdlib.h> /* For atoi() */

#include "translate.h"

/*
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
         CTAG
 0  -    0000   -  blank     Representation of Nucleotides
 1  G    0001   C            -----------------------------
 2  A    0010   T
 3  R    0011   Y  (AG)      The standard (IUB) nucleotides
 4  T    0100   A            are used in the array "-GARTKWDCSMVYBHN"
 5  K    0101   M  (GT)      so that:
 6  W    0110   W  (AT)
 7  D    0111   H  (AGT)     1: Four bits per base are used.
 8  C    1000   G            2: A bit is set for each base represented.
 9  S    1001   S  (CG)      3: Any base with its bits reversed is its complement.
10  M    1010   K  (AC)
11  V    1011   B  (ACG)
12  Y    1100   R  (CT)
13  B    1101   V  (CGT)
14  H    1110   D  (ACT)
15  N    1111   N  (ATGC)
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
*/

/**/

Translate_ArgumentSet *Translate_ArgumentSet_create(Argument *arg){
    register ArgumentSet *as;
    static Translate_ArgumentSet tas = {NULL};
    if(arg){
        as = ArgumentSet_create("Translation Options");
        ArgumentSet_add_option(as, '\0', "geneticcode", NULL,
                "Use built-in or custom genetic code", "1",
                Argument_parse_string, &tas.genetic_code);
        Argument_absorb_ArgumentSet(arg, as);
        }
    return &tas;
    }

/**/

static void Translate_initialise_nucleotide_data(Translate *t){
    register gint i;
    for(i = 0; i < Translate_NT_SET_SIZE; i++)
        t->nt2d[t->nt[i]] = t->nt2d[tolower(t->nt[i])] = i;
    t->nt2d['X'] = t->nt2d['x'] = t->nt2d['N'];
    t->nt2d['U'] = t->nt2d['u'] = t->nt2d['T'];
    return;
    }

static void Translate_initialise_peptide_data(Translate *t){
    register gint i, j;
    guchar pimagrp[Translate_PIMA_SET_SIZE][6] = {
        "aIV", "bLM", "dFWY", "lND", "kDE", "oEQ",
        "nKR", "iST", "hAG", "cab", "edH", "mlk",
        "pon", "jihP", "fCcd", "rHmpi", "xfrj", "Xx*" };
    for(i = 0; i < Translate_AA_SET_SIZE; i++)
        t->aa2d[t->aa[i]] = i;
    for(i = 1; i < 23; i++)
        t->aamask[i] = (1L<<(i-1)); /* First is zero */
    for(i = 0; i < Translate_PIMA_SET_SIZE; i++){
        t->aamask[t->aa2d[pimagrp[i][0]]]
            = t->aamask[t->aa2d[pimagrp[i][1]]];
        for(j = 2; pimagrp[i][j]; j++)
            t->aamask[t->aa2d[pimagrp[i][0]]]
                |= t->aamask[t->aa2d[pimagrp[i][j]]];
        }
    return;
    }

static void Translate_initialise_translation_data(Translate *t){
    register gchar a, b, c, x, y, z;
    register gint i, tmp;
    for(x = 0; x < 16; x++){
        for(y = 0; y < 16; y++){
            for(z = 0; z < 16; z++){
                tmp = 0;
                /* Collect the amino acids coded by every unambiguous
                 * codon (a,b,c) compatible with the ambiguity
                 * masks (x,y,z).
                 */
                for(a = 0; a < 4; a++){
                    if(x != (x|1<<a))
                        continue;
                    for(b = 0; b < 4; b++){
                        if(y != (y|1<<b))
                            continue;
                        for(c = 0; c < 4; c++){
                            if(z != (z|1<<c))
                                continue;
                            tmp |= (t->aamask[
                                t->aa2d[t->code[((a<<4)|(b<<2)|c)]]]);
                            }
                        }
                    }
                for(i = 0; t->aamask[i] != (tmp|t->aamask[i]); i++);
                t->trans[x|(y<<4)|(z<<8)] = i;
                }
            }
        }
    return;
    }

static void Translate_initialise_reverse_translate_data(Translate *t){
    register gint i;
    for(i = 0; t->code[i]; i++){
        if(!t->revtrans[t->code[i]])
            t->revtrans[t->code[i]] = g_ptr_array_new();
        g_ptr_array_add(t->revtrans[t->code[i]], GINT_TO_POINTER(i));
        }
    return;
    }

/**/

static gchar *Translate_convert_genetic_code(gchar *code){
    register gint a, b, c, n = 0;
    gint table[4] = {3, 2, 0, 1};
    register gchar *result;
    g_assert(code && (strlen(code) == 64));
    result = g_new(gchar, 65);
    result[64] = '\0';
    for(a = 0; a < 4; a++)
        for(b = 0; b < 4; b++)
            for(c = 0; c < 4; c++)
                result[n++] = code[(table[a] << 4)
                                 | (table[b] << 2)
                                 | table[c]];
    return result;
    }
/* Converts a genetic code from NCBI format (TCAG order)
 * to the format used internally by exonerate (GATC order)
 */

typedef struct {
    gint id;
    gchar *name;
    gchar *code;
} Translate_GeneticCode;

static gchar *Translate_get_builtin_genetic_code(gint id){
    register gint i;
    Translate_GeneticCode code[17] = {
        { 1, "The Standard Code",
         "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"},
        { 2, "The Vertebrate Mitochondrial Code",
         "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"},
        { 3, "The Yeast Mitochondrial Code",
         "FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG"},
        { 4, "The Mold, Protozoan, and Coelenterate Mitochondrial Code"
             " and the Mycoplasma/Spiroplasma Code",
         "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"},
        { 5, "The Invertebrate Mitochondrial Code",
         "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"},
        { 6, "The Ciliate, Dasycladacean and Hexamita Nuclear Code",
         "FFLLSSSSYYQQCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"},
        { 9, "The Echinoderm and Flatworm Mitochondrial Code",
         "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG"},
        {10, "The Euplotid Nuclear Code",
         "FFLLSSSSYY**CCCWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"},
        {11, "The Bacterial and Plant Plastid Code",
         "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"},
        {12, "The Alternative Yeast Nuclear Code",
"FFLLSSSSYY**CC*WLLLSPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"}, {13, "The Ascidian Mitochondrial Code", "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSGGVVVVAAAADDEEGGGG"}, {14, "The Alternative Flatworm Mitochondrial Code", "FFLLSSSSYYY*CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG"}, {15, "Blepharisma Nuclear Code", "FFLLSSSSYY*QCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"}, {16, "Chlorophycean Mitochondrial Code", "FFLLSSSSYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"}, {21, "Trematode Mitochondrial Code", "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNNKSSSSVVVVAAAADDEEGGGG"}, {22, "Scenedesmus obliquus mitochondrial Code", "FFLLSS*SYY*LCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"}, {23, "Thraustochytrium Mitochondrial Code", "FF*LSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"} }; for(i = 0; i < 17; i++){ if(code[i].id == id) return Translate_convert_genetic_code(code[i].code); } g_error("No built in genetic code corresponding to id [%d]", id); return NULL; } /* Built in genetic codes taken from: * http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi */ static gchar *Translate_get_genetic_code(gchar *code){ register gint len, id; if(!code) return g_strdup( "GGGGEEDDVVVVAAAARRSSKKNNMIIITTTTW*CC**YYLLFFSSSSRRRRQQHHLLLLPPPP"); len = strlen(code); if(len == 64) return Translate_convert_genetic_code(code); if((len == 1) || (len == 2)){ id = atoi(code); return Translate_get_builtin_genetic_code(id); } g_error("Could not use genetic code [%s]", code); return NULL; } /**/ Translate *Translate_create(gboolean use_pima){ register Translate *t = g_new0(Translate, 1); register Translate_ArgumentSet *tas = Translate_ArgumentSet_create(NULL); t->ref_count = 1; t->nt = (guchar*)"-GARTKWDCSMVYBHN"; t->aa = (guchar*)"-ARNDCQEGHILKMFPSTWYV*ablkonihdmcepjfrxX"; t->code = (guchar*)Translate_get_genetic_code(tas->genetic_code); Translate_initialise_nucleotide_data(t); Translate_initialise_peptide_data(t); 
Translate_initialise_translation_data(t); Translate_initialise_reverse_translate_data(t); if(!use_pima) t->aa = (guchar*)"-ARNDCQEGHILKMFPSTWYV*XXXXXXXXXXXXXXXXXX"; return t; } Translate *Translate_share(Translate *t){ g_assert(t); t->ref_count++; return t; } void Translate_destroy(Translate *t){ register gint i; g_assert(t); if(--t->ref_count) return; for(i = 0; t->code[i]; i++){ if(t->revtrans[t->code[i]]){ g_ptr_array_free(t->revtrans[t->code[i]], TRUE); t->revtrans[t->code[i]] = NULL; } } g_free(t->code); g_free(t); return; } gint Translate_sequence(Translate *t, gchar *dna, gint dna_length, gint frame, gchar *aaseq, guchar *filter){ register gchar *dp, *ap = aaseq, *end; if((frame > 0) && (frame < 4)){ end = dna + dna_length-2; if(filter){ for(dp = dna + frame - 1; dp < end; dp += 3) *ap++ = Translate_base(t, filter[(guchar)dp[0]], filter[(guchar)dp[1]], filter[(guchar)dp[2]]); } else { for(dp = dna + frame - 1; dp < end; dp += 3) *ap++ = Translate_codon(t, dp); } *ap = '\0'; return ap - aaseq; } if((frame < 0) && (frame > -4)){ if(filter){ for(dp = dna + dna_length + frame - 2; dp >= dna; dp -= 3) *ap++ = Translate_base(t, filter[(guchar)dp[0]], filter[(guchar)dp[1]], filter[(guchar)dp[2]]); } else { for(dp = dna + dna_length + frame - 2; dp >= dna; dp -= 3) *ap++ = Translate_codon(t, dp); } *ap = '\0'; return ap - aaseq; } g_error("Invalid reading frame [%d]", frame); return 0; /* Never reached */ } /* Returns length of peptide generated */ void Translate_reverse(Translate *t, gchar *aaseq, gint length, Translate_reverse_func trf, gpointer user_data){ register gint i, j, total, *prod, id, codon; register gchar *nucleotide; g_assert(length > 0); g_assert(aaseq); g_assert(aaseq[0]); g_assert(trf); prod = g_new0(gint, length); nucleotide = g_new(gchar, (length*3)+1); nucleotide[length*3] = '\0'; total = t->revtrans[(guchar)aaseq[0]] ? 
t->revtrans[(guchar)aaseq[0]]->len : 1; for(i = 1; i < length; i++){ if(t->revtrans[(guchar)aaseq[i]]) total *= t->revtrans[(guchar)aaseq[i]]->len; } prod[0] = 1; for(i = 1; i < length; i++) prod[i] = prod[i-1] * (t->revtrans[(guchar)aaseq[i-1]] ? t->revtrans[(guchar)aaseq[i-1]]->len : 1); for(i = 0; i < total; i++){ for(j = 0; j < length; j++){ if(t->revtrans[(guchar)aaseq[j]]){ id = (i/prod[j])%t->revtrans[(guchar)aaseq[j]]->len; codon = GPOINTER_TO_INT( t->revtrans[(guchar)aaseq[j]]->pdata[id]); nucleotide[j*3] = t->nt[1<<((codon>>4)&3)]; nucleotide[(j*3)+1] = t->nt[1<<((codon>>2)&3)]; nucleotide[(j*3)+2] = t->nt[1<<(codon&3)]; } else { nucleotide[j*3] = nucleotide[(j*3)+1] = nucleotide[(j*3)+2] = 'N'; } } trf(nucleotide, length*3, user_data); } g_free(nucleotide); g_free(prod); return; } /* TODO: Add alternative genetic codes (when required) from: * http://www.ncbi.nlm.nih.gov/htbin-post/Taxonomy/wprintgc?mode=c */ exonerate-2.4.0/src/sequence/splice.c0000644000175000017500000004763211162714340014465 00000000000000/****************************************************************\ * * * Library for Splice site prediction. * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/

#include <stdio.h>  /* For fopen() */
#include <stdlib.h> /* For atof() */
#include <math.h>   /* For log() */
#include <ctype.h>  /* For toupper() */
#include <string.h> /* For strlen() */

#include "splice.h"
#include "matrix.h"
#include "lineparse.h"

#ifndef SPLICE_IMPOSSIBLY_LOW_SCORE
#define SPLICE_IMPOSSIBLY_LOW_SCORE -987654321.0
#endif /* SPLICE_IMPOSSIBLY_LOW_SCORE */

Splice_ArgumentSet *Splice_ArgumentSet_create(Argument *arg){
    register ArgumentSet *as;
    static Splice_ArgumentSet sas = {NULL, NULL, FALSE};
    if(arg){
        as = ArgumentSet_create("Splice Site Prediction Options");
        ArgumentSet_add_option(as, 0, "splice3", NULL,
            "Supply frequency matrix for 3' splice sites", "primate",
            Argument_parse_string, &sas.splice3_data_path);
        ArgumentSet_add_option(as, 0, "splice5", NULL,
            "Supply frequency matrix for 5' splice sites", "primate",
            Argument_parse_string, &sas.splice5_data_path);
        ArgumentSet_add_option(as, 0, "forcegtag", NULL,
            "Force use of gt...ag splice sites", "FALSE",
            Argument_parse_boolean, &sas.force_gtag);
        Argument_absorb_ArgumentSet(arg, as);
        }
    return &sas;
    }

/* ------------------------------------------------------------ */
/* --- Start of Splice data ----------------------------------- */

/* Splice site data from:
 *
 * "Splice Junctions, Branch Point Sites, and Exons:
 *  Sequence Statistics, Identification, and Applications
 *  to Genome Project"
 * Periannan Senapathy, Marvin B. Shapiro, and Nomi L. Harris
 * Methods in Enzymology, Vol 183, pp 252-278.
*/ static void SplicePredictor_add_data_primate_5SS(SplicePredictor *sp){ register gint i, j; gint splice_pssm_primate_5SS_data[9][4] = { /* A C G T */ /* -3 */ { 28, 40, 17, 14 }, /* -2 */ { 59, 14, 13, 14 }, /* -1 */ { 8, 5, 81, 6 }, /* <-- Splice site */ /* 1 */ { 0, 0, 100, 0 }, /* G */ /* 2 */ { 0, 0, 0, 100 }, /* T */ /* 3 */ { 54, 2, 42, 2 }, /* 4 */ { 74, 8, 11, 8 }, /* 5 */ { 5, 6, 85, 4 }, /* 6 */ { 16, 18, 21, 45 } }; sp->model_length = 9; sp->model_splice_after = 3; sp->model_data = (gfloat**)Matrix2d_create(sp->model_length, 5, sizeof(gfloat)); for(i = 0; i < sp->model_length; i++) for(j = 0; j < 4; j++) sp->model_data[i][j] = (gfloat)splice_pssm_primate_5SS_data[i][j]; return; } static void SplicePredictor_add_data_primate_3SS(SplicePredictor *sp){ register gint i, j; gint splice_pssm_primate_3SS_data[15][4] = { /* A C G T */ /* -14 */ { 10, 31, 14, 44 }, /* -13 */ { 8, 36, 14, 43 }, /* -12 */ { 6, 34, 12, 48 }, /* -11 */ { 6, 34, 8, 52 }, /* -10 */ { 9, 37, 9, 45 }, /* -9 */ { 9, 38, 10, 44 }, /* -8 */ { 8, 44, 9, 40 }, /* -7 */ { 9, 41, 8, 41 }, /* -6 */ { 6, 44, 6, 45 }, /* -5 */ { 6, 40, 6, 48 }, /* -4 */ { 23, 28, 26, 23 }, /* -3 */ { 2, 79, 1, 18 }, /* -2 */ { 100, 0, 0, 0 }, /* A */ /* -1 */ { 0, 0, 100, 0 }, /* G */ /* <-- Splice site */ /* 1 */ { 28, 14, 47, 11 } }; sp->model_length = 15; sp->model_splice_after = 14; sp->model_data = (gfloat**)Matrix2d_create(sp->model_length, 5, sizeof(gfloat)); for(i = 0; i < sp->model_length; i++) for(j = 0; j < 4; j++) sp->model_data[i][j] = (gfloat)splice_pssm_primate_3SS_data[i][j]; return; } /* --- End of Splice data ------------------------------------- */ /* ------------------------------------------------------------ */ static void SplicePredictor_parse_data(SplicePredictor *sp, gchar *path){ register FILE *fp = fopen(path, "r"); register LineParse *lp; register gint i; gfloat num; register gchar *word; register GArray *number_list = g_array_new(FALSE, FALSE, sizeof(gfloat)); if(!fp) g_error("Could 
not open splice frequency data [%s]", path); lp = LineParse_create(fp); /**/ sp->model_length = 0; while(LineParse_word(lp) != EOF){ if(lp->word->len){ word = lp->word->pdata[0]; if(word[0] == '#'){ continue; } } switch(lp->word->len){ case 0: /* ignore */ break; case 1: word = lp->word->pdata[0]; if(!g_strcasecmp(word, "splice")) sp->model_splice_after = sp->model_length; else g_error("bad line in splice data file"); break; case 4: for(i = 0; i < 4; i++){ num = atof(lp->word->pdata[i]); g_array_append_val(number_list, num); } sp->model_length++; break; default: g_error("bad line in splice data file"); break; } } sp->model_data = NULL; LineParse_destroy(lp); fclose(fp); sp->model_data = (gfloat**)Matrix2d_create(sp->model_length, 5, sizeof(gfloat)); for(i = 0; i < sp->model_length; i++){ sp->model_data[i][0] = g_array_index(number_list, gfloat, (i*4)); sp->model_data[i][1] = g_array_index(number_list, gfloat, (i*4)+1); sp->model_data[i][2] = g_array_index(number_list, gfloat, (i*4)+2); sp->model_data[i][3] = g_array_index(number_list, gfloat, (i*4)+3); } g_array_free(number_list, TRUE); return; } #if 0 static void SplicePredictor_info(SplicePredictor *sp){ register gint i; g_message("type [%d] len [%d] splice_after [%d]", sp->type, sp->model_length, sp->model_splice_after); for(i = 0; i < sp->model_length; i++) g_message(" [%d] [%2.2f][%2.2f][%2.2f][%2.2f]", i, sp->model_data[i][0], sp->model_data[i][1], sp->model_data[i][2], sp->model_data[i][3]); return; } #endif /* 0 */ static void SplicePredictor_add_5SS_data(SplicePredictor *sp, gchar *path){ if((!path) || (!strlen(path)) || (!g_strcasecmp(path, "primate"))) SplicePredictor_add_data_primate_5SS(sp); else SplicePredictor_parse_data(sp, path); return; } static void SplicePredictor_add_3SS_data(SplicePredictor *sp, gchar *path){ if((!path) || (!strlen(path)) || (!g_strcasecmp(path, "primate"))) SplicePredictor_add_data_primate_3SS(sp); else SplicePredictor_parse_data(sp, path); sp->model_splice_after -= 2; 
    return;
    }

/**/

static SplicePredictor_GTAGonly *SplicePredictor_GTAGonly_create(
                                 SplicePredictor *sp){
    register SplicePredictor_GTAGonly *spgo
        = g_new(SplicePredictor_GTAGonly, 1);
    switch(sp->type){
        case SpliceType_ss5_forward:
            spgo->expect_one = 'G';
            spgo->expect_two = 'T';
            break;
        case SpliceType_ss3_forward:
            spgo->expect_one = 'A';
            spgo->expect_two = 'G';
            break;
        case SpliceType_ss5_reverse:
            spgo->expect_one = 'A';
            spgo->expect_two = 'C';
            break;
        case SpliceType_ss3_reverse:
            spgo->expect_one = 'C';
            spgo->expect_two = 'T';
            break;
        default:
            g_error("Unknown splice type [%d]", sp->type);
            break;
        }
    return spgo;
    }

SplicePredictor *SplicePredictor_create(SpliceType type){
    register SplicePredictor *sp = g_new(SplicePredictor, 1);
    register gint i, j, a, z;
    register gfloat swap;
    register Splice_ArgumentSet *sas = Splice_ArgumentSet_create(NULL);
    sp->ref_count = 1;
    sp->type = type;
    if((type == SpliceType_ss5_forward)
    || (type == SpliceType_ss5_reverse)){
        SplicePredictor_add_5SS_data(sp, sas->splice5_data_path);
    } else { /* use SpliceType_3_PRIME */
        SplicePredictor_add_3SS_data(sp, sas->splice3_data_path);
        }
    if((type == SpliceType_ss5_reverse)
    || (type == SpliceType_ss3_reverse)){
        /* Reverse the order of the model positions */
        for(a = 0, z = sp->model_length-1; a < z; a++, z--){
            for(j = 0; j < 4; j++){
                swap = sp->model_data[a][j];
                sp->model_data[a][j] = sp->model_data[z][j];
                sp->model_data[z][j] = swap;
                }
            }
        sp->model_splice_after = sp->model_length
                               - sp->model_splice_after - 2;
        }
    /* Default every character to the extra (fifth) model column */
    for(i = 0; i < (1<<8); i++)
        sp->index[i] = 4;
    if((type == SpliceType_ss5_forward)
    || (type == SpliceType_ss3_forward)){ /* forward */
        sp->index['A'] = sp->index['a'] = 0;
        sp->index['C'] = sp->index['c'] = 1;
        sp->index['G'] = sp->index['g'] = 2;
        sp->index['T'] = sp->index['t'] = 3;
    } else { /* reverse */
        sp->index['T'] = sp->index['t'] = 0;
        sp->index['G'] = sp->index['g'] = 1;
        sp->index['C'] = sp->index['c'] = 2;
        sp->index['A'] = sp->index['a'] = 3;
        }
    for(i = 0; i < sp->model_length; i++){
        for(j = 0; j < 4; j++){
            sp->model_data[i][j] =
(((gfloat)(1+sp->model_data[i][j])) /((25.0+1.0))); sp->model_data[i][j] = log(sp->model_data[i][j]) * 1.5; } sp->model_data[i][4] = 0.0; } if(sas->force_gtag) sp->gtag_only = SplicePredictor_GTAGonly_create(sp); else sp->gtag_only = NULL; return sp; } void SplicePredictor_destroy(SplicePredictor *sp){ if(--sp->ref_count) return; if(sp->gtag_only) g_free(sp->gtag_only); g_free(sp->model_data); g_free(sp); return; } SplicePredictor *SplicePredictor_share(SplicePredictor *sp){ sp->ref_count++; return sp; } static gboolean Splice_predict_is_on_GTAG(SplicePredictor *sp, gchar *seq, guint seq_pos){ register gchar base_one = toupper(seq[seq_pos]), base_two = toupper(seq[seq_pos+1]); return ((base_one == sp->gtag_only->expect_one) && (base_two == sp->gtag_only->expect_two)); } static gfloat Splice_predict_position(SplicePredictor *sp, gchar *seq, gint seq_len, guint seq_pos){ register gfloat pos_score, score = 0.0; register gint i, seq_start = seq_pos-sp->model_splice_after, model_start = 0, calc_length = sp->model_length; if(seq_start < 0){ model_start = -seq_start; seq_start = 0; calc_length -= model_start; } if((seq_start + calc_length) > seq_len){ calc_length = seq_len-seq_start; } for(i = 0; i < calc_length; i++){ pos_score = sp->model_data[model_start+i] [sp->index[(guchar)seq[seq_start+i]]]; score += pos_score; } if(sp->gtag_only && (!Splice_predict_is_on_GTAG(sp, seq, seq_pos))) return SPLICE_IMPOSSIBLY_LOW_SCORE; return score; } gint SplicePredictor_predict(SplicePredictor *sp, gchar *seq, gint seq_len, guint start, guint length, gfloat *score){ register guint i; register guint max_pos = -1; register gfloat curr_score, max_score = SPLICE_IMPOSSIBLY_LOW_SCORE; g_assert((start+length) <= seq_len); for(i = 0; i < length; i++){ curr_score = Splice_predict_position(sp, seq, seq_len, start+i); if(max_score < curr_score){ max_score = curr_score; max_pos = i; } } if(score) *score = max_score; return start+max_pos; } void SplicePredictor_predict_array(SplicePredictor *sp, 
gchar *seq, guint seq_len, guint start, guint length, gfloat *pred){ register guint i; g_assert((start+length) <= seq_len); g_assert(seq); g_assert(pred); for(i = 0; i < length; i++){ pred[i] = Splice_predict_position(sp, seq, seq_len, start+i); } return; } static gint SplicePredictor_round(gfloat num){ return (gint)((num < 0)?(num - 0.5):(num + 0.5)); } void SplicePredictor_predict_array_int(SplicePredictor *sp, gchar *seq, guint seq_len, guint start, guint length, gint *pred){ register guint i; register gfloat score; g_assert(seq); g_assert(length?(pred?TRUE:FALSE):TRUE); g_assert((start+length) <= seq_len); for(i = 0; i < length; i++){ score = Splice_predict_position(sp, seq, seq_len, i+start); pred[i] = SplicePredictor_round(score); } return; } gfloat SplicePredictor_get_max_score(SplicePredictor *sp){ register gfloat score = 0.0, pos_score; register gint i, j; for(i = 0; i < sp->model_length; i++){ pos_score = sp->model_data[i][0]; for(j = 1; j < 4; j++) if(pos_score < sp->model_data[i][j]) pos_score = sp->model_data[i][j]; score += pos_score; } return score; } SplicePredictorSet *SplicePredictorSet_create(void){ register SplicePredictorSet *sps = g_new(SplicePredictorSet, 1); sps->ss5_forward = SplicePredictor_create(SpliceType_ss5_forward); sps->ss5_reverse = SplicePredictor_create(SpliceType_ss5_reverse); sps->ss3_forward = SplicePredictor_create(SpliceType_ss3_forward); sps->ss3_reverse = SplicePredictor_create(SpliceType_ss3_reverse); return sps; } void SplicePredictorSet_destroy(SplicePredictorSet *sps){ SplicePredictor_destroy(sps->ss5_forward); SplicePredictor_destroy(sps->ss5_reverse); SplicePredictor_destroy(sps->ss3_forward); SplicePredictor_destroy(sps->ss3_reverse); g_free(sps); return; } /* TODO: o Maybe add support for branch signal prediction * o Add support for different organisms */ /**/ static gpointer SplicePredictor_cache_get_func(gint pos, gpointer page_data, gpointer user_data){ register gint *data = page_data; return 
GINT_TO_POINTER(data[pos]); } static SparseCache_Page *SplicePrediction_cache_fill_func(gint start, gpointer user_data){ register SplicePrediction *spn = user_data; register SparseCache_Page *page = g_new(SparseCache_Page, 1); register gint seq_start, seq_length, pred_start, pred_length; register gchar *seq; page->get_func = SplicePredictor_cache_get_func; page->copy_func = NULL; pred_start = MAX(0, start); pred_length = MIN(spn->length, start+SparseCache_PAGE_SIZE) - pred_start; seq_start = MAX(0, pred_start-spn->sp->model_splice_after); seq_length = MIN(spn->length, pred_start+pred_length +(spn->sp->model_length-spn->sp->model_splice_after)) - seq_start; page->data = g_new(gint, pred_length); page->data_size = sizeof(gint)*pred_length; seq = spn->get_seq_func(seq_start, seq_length, spn->user_data); SplicePredictor_predict_array_int(spn->sp, seq, seq_length, pred_start-seq_start, pred_length, page->data); g_free(seq); return page; } SplicePrediction *SplicePrediction_create(SplicePredictor *sp, gint len, SplicePrediction_get_seq_func get_seq_func, gboolean use_single, gpointer user_data){ register SplicePrediction *spn = g_new(SplicePrediction, 1); register gchar *seq; g_assert(get_seq_func); g_assert(len > 0); spn->sp = SplicePredictor_share(sp); spn->length = len; spn->get_seq_func = get_seq_func; spn->user_data = user_data; if(use_single){ spn->single = g_new(gint, len); seq = get_seq_func(0, len, user_data); SplicePredictor_predict_array_int(sp, seq, len, 0, len, spn->single); g_free(seq); spn->cache = NULL; } else { spn->single = NULL; spn->cache = SparseCache_create(len, SplicePrediction_cache_fill_func, NULL, NULL, spn); } return spn; } void SplicePrediction_destroy(SplicePrediction *spn){ SplicePredictor_destroy(spn->sp); if(spn->single) g_free(spn->single); if(spn->cache) SparseCache_destroy(spn->cache); g_free(spn); return; } gint SplicePrediction_get(SplicePrediction *spn, gint pos){ g_assert(spn); if(spn->single){ /* Have precomputed prediction */ 
g_assert(pos >= 0); g_assert(pos < spn->length); return spn->single[pos]; } g_assert(spn->cache); return GPOINTER_TO_INT(SparseCache_get(spn->cache, pos)); } /* FIXME: optimisation: remove conditional in codegen */ /**/ SplicePrediction_Set *SplicePrediction_Set_add(SplicePrediction_Set *spns, SplicePredictorSet *sps, SpliceType type, gint len, SplicePrediction_get_seq_func get_seq_func, gboolean use_single, gpointer user_data){ if(!spns) spns = g_new0(SplicePrediction_Set, 1); switch(type){ case SpliceType_ss5_forward: if(!spns->ss5_forward) spns->ss5_forward = SplicePrediction_create(sps->ss5_forward, len, get_seq_func, use_single, user_data); break; case SpliceType_ss3_forward: if(!spns->ss3_forward) spns->ss3_forward = SplicePrediction_create(sps->ss3_forward, len, get_seq_func, use_single, user_data); break; case SpliceType_ss5_reverse: if(!spns->ss5_reverse) spns->ss5_reverse = SplicePrediction_create(sps->ss5_reverse, len, get_seq_func, use_single, user_data); break; case SpliceType_ss3_reverse: if(!spns->ss3_reverse) spns->ss3_reverse = SplicePrediction_create(sps->ss3_reverse, len, get_seq_func, use_single, user_data); break; default: g_error("Unknown SpliceType [%d]", type); break; } return spns; } void SplicePrediction_Set_destroy(SplicePrediction_Set *sps){ if(sps->ss5_forward) SplicePrediction_destroy(sps->ss5_forward); if(sps->ss5_reverse) SplicePrediction_destroy(sps->ss5_reverse); if(sps->ss3_forward) SplicePrediction_destroy(sps->ss3_forward); if(sps->ss3_reverse) SplicePrediction_destroy(sps->ss3_reverse); g_free(sps); return; } /**/ exonerate-2.4.0/src/sequence/submat.test.c0000644000175000017500000000244711162714341015453 00000000000000/****************************************************************\ * * * Substitution Matrix Object * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. 
See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/

#include
#include "submat.h"

int main(int argc, char **argv){
    register Submat *s = Submat_create(NULL);
    g_message("test AA [%d], AT [%d], AN [%d] NN [%d]",
              Submat_lookup(s, 'A', 'A'),
              Submat_lookup(s, 'A', 'T'),
              Submat_lookup(s, 'A', 'N'),
              Submat_lookup(s, 'N', 'N'));
    Submat_destroy(s);
    return 0;
    }

exonerate-2.4.0/src/sequence/submat.c0000644000175000017500000004205111162714341014470 00000000000000
/****************************************************************\
*                                                                *
*  Substitution Matrix Object                                    *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

/* Uses built in substitution tables, and can read blast format tables.
Order : "ARNDCQEGHILKMFPSTWYVBZX*" */ #include #include #include "submat.h" static guchar local_submat_index[ALPHABETSIZE] = { 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, /* 32 */ 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 23, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, /* 64 */ /* a b c d e f g h i j k l m n o */ 24, 0, 20, 4, 3, 6, 13, 7, 8, 9, 24, 11, 10, 12, 2, 24, /* 14, 5, 1, 15, 16, 24, 19, 17, 22, 18, 21, 24, 24, 24, 24, 24, */ 14, 5, 1, 15, 16, 4, 19, 17, 22, 18, 21, 24, 24, 24, 24, 24, /* p q r s t u v w x y z */ /* 96 */ /* A B C D E F G H I J K L M N O */ 24, 0, 20, 4, 3, 6, 13, 7, 8, 9, 24, 11, 10, 12, 2, 24, /* 14, 5, 1, 15, 16, 24, 19, 17, 22, 18, 21, 24, 24, 24, 24, 24, */ 14, 5, 1, 15, 16, 4, 19, 17, 22, 18, 21, 24, 24, 24, 24, 24, /* P Q R S T U V W X Y Z */ /* 128 */ 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, /* 192 */ 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24 /* 256 */ }; /* FIXME: selenocysteine (U) is just treated as a cysteine at the moment ... 
*/ /* A R N D C Q E G H I L K M F P S T W Y V B Z X * */ static SubmatMatrix local_submat_blosum62 = { { 4,-1,-2,-2, 0,-1,-1, 0,-2,-1,-1,-1,-1,-2,-1, 1, 0,-3,-2, 0,-2,-1, 0,-4}, {-1, 5, 0,-2,-3, 1, 0,-2, 0,-3,-2, 2,-1,-3,-2,-1,-1,-3,-2,-3,-1, 0,-1,-4}, {-2, 0, 6, 1,-3, 0, 0, 0, 1,-3,-3, 0,-2,-3,-2, 1, 0,-4,-2,-3, 3, 0,-1,-4}, {-2,-2, 1, 6,-3, 0, 2,-1,-1,-3,-4,-1,-3,-3,-1, 0,-1,-4,-3,-3, 4, 1,-1,-4}, { 0,-3,-3,-3, 9,-3,-4,-3,-3,-1,-1,-3,-1,-2,-3,-1,-1,-2,-2,-1,-3,-3,-2,-4}, {-1, 1, 0, 0,-3, 5, 2,-2, 0,-3,-2, 1, 0,-3,-1, 0,-1,-2,-1,-2, 0, 3,-1,-4}, {-1, 0, 0, 2,-4, 2, 5,-2, 0,-3,-3, 1,-2,-3,-1, 0,-1,-3,-2,-2, 1, 4,-1,-4}, { 0,-2, 0,-1,-3,-2,-2, 6,-2,-4,-4,-2,-3,-3,-2, 0,-2,-2,-3,-3,-1,-2,-1,-4}, {-2, 0, 1,-1,-3, 0, 0,-2, 8,-3,-3,-1,-2,-1,-2,-1,-2,-2, 2,-3, 0, 0,-1,-4}, {-1,-3,-3,-3,-1,-3,-3,-4,-3, 4, 2,-3, 1, 0,-3,-2,-1,-3,-1, 3,-3,-3,-1,-4}, {-1,-2,-3,-4,-1,-2,-3,-4,-3, 2, 4,-2, 2, 0,-3,-2,-1,-2,-1, 1,-4,-3,-1,-4}, {-1, 2, 0,-1,-3, 1, 1,-2,-1,-3,-2, 5,-1,-3,-1, 0,-1,-3,-2,-2, 0, 1,-1,-4}, {-1,-1,-2,-3,-1, 0,-2,-3,-2, 1, 2,-1, 5, 0,-2,-1,-1,-1,-1, 1,-3,-1,-1,-4}, {-2,-3,-3,-3,-2,-3,-3,-3,-1, 0, 0,-3, 0, 6,-4,-2,-2, 1, 3,-1,-3,-3,-1,-4}, {-1,-2,-2,-1,-3,-1,-1,-2,-2,-3,-3,-1,-2,-4, 7,-1,-1,-4,-3,-2,-2,-1,-2,-4}, { 1,-1, 1, 0,-1, 0, 0, 0,-1,-2,-2, 0,-1,-2,-1, 4, 1,-3,-2,-2, 0, 0, 0,-4}, { 0,-1, 0,-1,-1,-1,-1,-2,-2,-1,-1,-1,-1,-2,-1, 1, 5,-2,-2, 0,-1,-1, 0,-4}, {-3,-3,-4,-4,-2,-2,-3,-2,-2,-3,-2,-3,-1, 1,-4,-3,-2,11, 2,-3,-4,-3,-2,-4}, {-2,-2,-2,-3,-2,-1,-2,-3, 2,-1,-1,-2,-1, 3,-3,-2,-2, 2, 7,-1,-3,-2,-1,-4}, { 0,-3,-3,-3,-1,-2,-2,-3,-3, 3, 1,-2, 1,-1,-2,-2, 0,-3,-1, 4,-3,-2,-1,-4}, {-2,-1, 3, 4,-3, 0, 1,-1, 0,-3,-4, 0,-3,-3,-2, 0,-1,-4,-3,-3, 4, 1,-1,-4}, {-1, 0, 0, 1,-3, 3, 4,-2, 0,-3,-3, 1,-1,-3,-1, 0,-1,-3,-2,-2, 1, 4,-1,-4}, { 0,-1,-1,-1,-2,-1,-1,-1,-1,-1,-1,-1,-1,-1,-2, 0, 0,-2,-1,-1,-1,-1,-1,-4}, {-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4,-4, 1} }; static SubmatMatrix local_submat_pam250 = { { 2,-2, 0, 0,-2, 0, 0, 1,-1,-1,-2,-1,-1,-3, 1, 1, 
1,-6,-3, 0, 0, 0, 0,-8}, {-2, 6, 0,-1,-4, 1,-1,-3, 2,-2,-3, 3, 0,-4, 0, 0,-1, 2,-4,-2,-1, 0,-1,-8}, { 0, 0, 2, 2,-4, 1, 1, 0, 2,-2,-3, 1,-2,-3, 0, 1, 0,-4,-2,-2, 2, 1, 0,-8}, { 0,-1, 2, 4,-5, 2, 3, 1, 1,-2,-4, 0,-3,-6,-1, 0, 0,-7,-4,-2, 3, 3,-1,-8}, {-2,-4,-4,-5,12,-5,-5,-3,-3,-2,-6,-5,-5,-4,-3, 0,-2,-8, 0,-2,-4,-5,-3,-8}, { 0, 1, 1, 2,-5, 4, 2,-1, 3,-2,-2, 1,-1,-5, 0,-1,-1,-5,-4,-2, 1, 3,-1,-8}, { 0,-1, 1, 3,-5, 2, 4, 0, 1,-2,-3, 0,-2,-5,-1, 0, 0,-7,-4,-2, 3, 3,-1,-8}, { 1,-3, 0, 1,-3,-1, 0, 5,-2,-3,-4,-2,-3,-5, 0, 1, 0,-7,-5,-1, 0, 0,-1,-8}, {-1, 2, 2, 1,-3, 3, 1,-2, 6,-2,-2, 0,-2,-2, 0,-1,-1,-3, 0,-2, 1, 2,-1,-8}, {-1,-2,-2,-2,-2,-2,-2,-3,-2, 5, 2,-2, 2, 1,-2,-1, 0,-5,-1, 4,-2,-2,-1,-8}, {-2,-3,-3,-4,-6,-2,-3,-4,-2, 2, 6,-3, 4, 2,-3,-3,-2,-2,-1, 2,-3,-3,-1,-8}, {-1, 3, 1, 0,-5, 1, 0,-2, 0,-2,-3, 5, 0,-5,-1, 0, 0,-3,-4,-2, 1, 0,-1,-8}, {-1, 0,-2,-3,-5,-1,-2,-3,-2, 2, 4, 0, 6, 0,-2,-2,-1,-4,-2, 2,-2,-2,-1,-8}, {-3,-4,-3,-6,-4,-5,-5,-5,-2, 1, 2,-5, 0, 9,-5,-3,-3, 0, 7,-1,-4,-5,-2,-8}, { 1, 0, 0,-1,-3, 0,-1, 0, 0,-2,-3,-1,-2,-5, 6, 1, 0,-6,-5,-1,-1, 0,-1,-8}, { 1, 0, 1, 0, 0,-1, 0, 1,-1,-1,-3, 0,-2,-3, 1, 2, 1,-2,-3,-1, 0, 0, 0,-8}, { 1,-1, 0, 0,-2,-1, 0, 0,-1, 0,-2, 0,-1,-3, 0, 1, 3,-5,-3, 0, 0,-1, 0,-8}, {-6, 2,-4,-7,-8,-5,-7,-7,-3,-5,-2,-3,-4, 0,-6,-2,-5,17, 0,-6,-5,-6,-4,-8}, {-3,-4,-2,-4, 0,-4,-4,-5, 0,-1,-1,-4,-2, 7,-5,-3,-3, 0,10,-2,-3,-4,-2,-8}, { 0,-2,-2,-2,-2,-2,-2,-1,-2, 4, 2,-2, 2,-1,-1,-1, 0,-6,-2, 4,-2,-2,-1,-8}, { 0,-1, 2, 3,-4, 1, 3, 0, 1,-2,-3, 1,-2,-4,-1, 0, 0,-5,-3,-2, 3, 2,-1,-8}, { 0, 0, 1, 3,-5, 3, 3, 0, 2,-2,-3, 0,-2,-5, 0, 0,-1,-6,-4,-2, 2, 3,-1,-8}, { 0,-1, 0,-1,-3,-1,-1,-1,-1,-1,-1,-1,-1,-2,-1, 0, 0,-4,-2,-1,-1,-1,-1,-8}, {-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8, 1} }; static SubmatMatrix local_submat_nucleic = { { 5, 1,-2,-1,-4, 0, 0,-4,-1, 0, 0,-4, 1, 0, 0,-4,-4, 1,-4,-1,-4, 0,-2, 0}, { 1,-1,-1,-1,-4, 0, 0, 1,-3, 0, 0,-2,-2, 0, 0,-2,-4,-2,-4,-1,-3, 0,-1, 0}, {-2,-1,-1,-1,-2, 0, 0,-2,-1, 0, 0,-1,-1, 0, 
0,-1,-2,-1,-1,-1,-1, 0,-1, 0}, {-1,-1,-1,-1,-4, 0, 0,-1,-2, 0, 0,-1,-3, 0, 0,-3,-1,-1,-3,-2,-2, 0,-1, 0}, {-4,-4,-2,-4, 5, 0, 0,-4,-1, 0, 0,-4, 1, 0, 0, 1,-4,-4, 1,-1,-1, 0,-2, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, {-4, 1,-2,-1,-4, 0, 0, 5,-4, 0, 0, 1,-4, 0, 0, 1,-4,-4,-4,-1,-1, 0,-2, 0}, {-1,-3,-1,-2,-1, 0, 0,-4,-1, 0, 0,-3,-1, 0, 0,-3,-1,-1,-1,-2,-2, 0,-1, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, {-4,-2,-1,-1,-4, 0, 0, 1,-3, 0, 0,-1,-4, 0, 0,-2, 1,-2,-2,-3,-1, 0,-1, 0}, { 1,-2,-1,-3, 1, 0, 0,-4,-1, 0, 0,-4,-1, 0, 0,-2,-4,-2,-2,-1,-3, 0,-1, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, {-4,-2,-1,-3, 1, 0, 0, 1,-3, 0, 0,-2,-2, 0, 0,-1,-4,-4,-2,-1,-1, 0,-1, 0}, {-4,-4,-2,-1,-4, 0, 0,-4,-1, 0, 0, 1,-4, 0, 0,-4, 5, 1, 1,-4,-1, 0,-2, 0}, { 1,-2,-1,-1,-4, 0, 0,-4,-1, 0, 0,-2,-2, 0, 0,-4, 1,-1,-2,-3,-3, 0,-1, 0}, {-4,-4,-1,-3, 1, 0, 0,-4,-1, 0, 0,-2,-2, 0, 0,-2, 1,-2,-1,-3,-1, 0,-1, 0}, {-1,-1,-1,-2,-1, 0, 0,-1,-2, 0, 0,-3,-1, 0, 0,-1,-4,-3,-3,-1,-2, 0,-1, 0}, {-4,-3,-1,-2,-1, 0, 0,-1,-2, 0, 0,-1,-3, 0, 0,-1,-1,-3,-1,-2,-1, 0,-1, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, {-2,-1,-1,-1,-2, 0, 0,-2,-1, 0, 0,-1,-1, 0, 0,-1,-2,-1,-1,-1,-1, 0,-1, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0} }; static SubmatMatrix local_submat_edit = { { 0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1}, {-1, 0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1}, {-1,-1, 0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1}, {-1,-1,-1, 0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1}, {-1,-1,-1,-1, 
0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1}, {-1,-1,-1,-1,-1, 0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1}, {-1,-1,-1,-1,-1,-1, 0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1}, {-1,-1,-1,-1,-1,-1,-1, 0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1}, {-1,-1,-1,-1,-1,-1,-1,-1, 0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1}, {-1,-1,-1,-1,-1,-1,-1,-1,-1, 0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1}, {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, 0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1}, {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, 0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1}, {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, 0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1}, {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, 0,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1}, {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, 0,-1,-1,-1,-1,-1,-1,-1,-1,-1}, {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, 0,-1,-1,-1,-1,-1,-1,-1,-1}, {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, 0,-1,-1,-1,-1,-1,-1,-1}, {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, 0,-1,-1,-1,-1,-1,-1}, {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, 0,-1,-1,-1,-1,-1}, {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, 0,-1,-1,-1,-1}, {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, 0,-1,-1,-1}, {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, 0,-1,-1}, {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, 0,-1}, {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, 0} }; static SubmatMatrix local_submat_identity = { { 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1} }; static SubmatMatrix local_submat_iupac_identity = { { 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0}, { 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0}, { 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0}, { 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0}, { 1, 0, 
0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0}, { 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0}, { 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0}, { 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0}, { 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0}, { 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0}, { 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0}, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1} }; /* Possible identity according to IUPAC redundancy codes */ static void Submat_read_matrix(SubmatMatrix matrix, gchar *path){ register FILE *fp = fopen(path, "r"); register gint i, total, ch, prev = '\n', num = 0, pos = 0, res; register gboolean neg, seen; guchar sub_index[SUBMAT_ALPHABETSIZE]; if(!fp) g_error("Could not open substitution matrix [%s]\n", path); while((ch = getc(fp)) != EOF){ if((prev == '\n') && (ch != '#')) break; prev = ch; } for(i = 0; i < SUBMAT_ALPHABETSIZE; i++) sub_index[i] = '\0'; total = 0; do { if(ch == '\n') break; else if(ch != ' '){ if(local_submat_index[ch] == SUBMAT_ALPHABETSIZE) g_error("Unrecognised matrix element [%c]\n", ch); else sub_index[total++] = ch; } } while((ch = getc(fp)) != EOF); neg = seen = FALSE; res = 0; while((ch = getc(fp)) != EOF){ /* Read 
data */ switch(ch){ case '-': neg = TRUE; break; case '\n': case ' ': if(seen){ if(neg) num = -num; matrix[local_submat_index[res]] [local_submat_index[sub_index[pos]]] = num; neg = seen = FALSE; num = 0; pos++; } break; case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': num *= 10; num += (ch-'0'); seen = TRUE; break; default: pos = 0; res = ch; break; } } fclose(fp); return; } static void Submat_copy_matrix(SubmatMatrix dst, SubmatMatrix src){ register gint i, j; for(i = 0; i < SUBMAT_ALPHABETSIZE; i++) for(j = 0; j < SUBMAT_ALPHABETSIZE; j++) dst[i][j] = src[i][j]; return; } Submat *Submat_create(gchar *path){ register Submat *s = g_new0(Submat, 1); s->ref_count = 1; s->index = local_submat_index; if((!path) || (!strcmp(path, "nucleic"))) /* default */ Submat_copy_matrix(s->matrix, local_submat_nucleic); else if(!strcmp(path, "blosum62")) Submat_copy_matrix(s->matrix, local_submat_blosum62); else if(!strcmp(path, "pam250")) Submat_copy_matrix(s->matrix, local_submat_pam250); else if(!strcmp(path, "edit")) Submat_copy_matrix(s->matrix, local_submat_edit); else if(!strcmp(path, "identity")) Submat_copy_matrix(s->matrix, local_submat_identity); else if(!strcmp(path, "iupac-identity")) Submat_copy_matrix(s->matrix, local_submat_iupac_identity); else Submat_read_matrix(s->matrix, path); return s; } void Submat_destroy(Submat *s){ if(--s->ref_count) return; g_free(s); return; } Submat *Submat_share(Submat *s){ s->ref_count++; return s; } gint Submat_max_score(Submat *s){ register gint i, j, max; g_assert(s); max = s->matrix[0][0]; for(i = 0; i < SUBMAT_ALPHABETSIZE; i++) for(j = 0; j < SUBMAT_ALPHABETSIZE; j++) if(max < s->matrix[i][j]) max = s->matrix[i][j]; return max; } exonerate-2.4.0/src/sequence/translate.test.c0000644000175000017500000000430611162714341016151 00000000000000/****************************************************************\ * * * Nucleotide Translation Code * * * * Guy St.C. Slater.. 
mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "translate.h" static void reverse_func(gchar *dna, gint length, gpointer user_data){ register gint *rt_count = user_data; g_message("Reverse translated [%s]", dna); (*rt_count)++; return; } static void check_translate(Translate *t, gchar *codon, gchar crib){ register gchar aa = Translate_codon(t, codon); g_message("Translate [%s] -> [%c] (crib:[%c])", codon, aa, crib); g_assert(aa == crib); return; } static void check_reverse_translate(Translate *t, gchar *word, gint crib){ int rt_count = 0; g_message("reverse translate [%s]", word); Translate_reverse(t, word, 5, reverse_func, &rt_count); g_message("produced [%d] reverse translations\n", rt_count); g_assert(rt_count == crib); return; } gint Argument_main(Argument *arg){ register Translate *t; Translate_ArgumentSet_create(arg); Argument_process(arg, "translate.test", NULL, NULL); t = Translate_create(FALSE); check_translate(t, "ATG", 'M'); check_translate(t, "GGX", 'G'); check_translate(t, "NGT", 'X'); check_translate(t, "TGA", '*'); /**/ check_reverse_translate(t, "S*MXG", 72); check_reverse_translate(t, "SSSSS", 7776); check_reverse_translate(t, "MMMMM", 1); Translate_destroy(t); return 0; } exonerate-2.4.0/src/sequence/splice.test.c0000644000175000017500000000522711162714340015435 00000000000000/****************************************************************\ * * * Library for Splice site prediction. * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3.
See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "splice.h" #include <string.h> /* For strlen() */ static void check_splice_prediction(SplicePredictor *sp, gchar *seq, gint crib){ gfloat score; register gint len = strlen(seq); register gint pos = SplicePredictor_predict(sp, seq, len, 0, len, &score); g_message("pos [%d] crib [%d] score [%f]", pos, crib, score); g_assert(pos == crib); g_assert(score > 0); return; } gint Argument_main(Argument *arg){ register SplicePredictorSet *sps = SplicePredictorSet_create(); g_message("Checking splice site prediction [%f,%f,%f,%f]", SplicePredictor_get_max_score(sps->ss5_forward), SplicePredictor_get_max_score(sps->ss5_reverse), SplicePredictor_get_max_score(sps->ss3_forward), SplicePredictor_get_max_score(sps->ss3_reverse)); check_splice_prediction(sps->ss5_forward, /* v */ "TGGCGCTCCTGGCCTCTGCCCGTAAGCACTTGGTGGGACTGGG", 21); /* (21) |GT... 5ss fwd */ check_splice_prediction(sps->ss3_forward, /* v */ "CAGCCCTGCTCTTTCCTCAGGAGCTTCAGAGGCCGAGGATGCC", 18); /* 3ss fwd ...AG| (18) */ check_splice_prediction(sps->ss5_reverse, /* v */ "CCCAGTCCCACCAAGTGCTTACGGGCAGAGGCCAGGAGCGCCA", 20); /* 5ss rev ... AC| (20) */ check_splice_prediction(sps->ss3_reverse, /* v */ "GGCATCCTCGGCCTCTGAAGCTCCTGAGGAAAGAGCAGGGCTG", 23); /* |CT... 3ss rev (23) */ SplicePredictorSet_destroy(sps); return 0; } exonerate-2.4.0/src/sequence/alphabet.test.c0000644000175000017500000000217211162714330015731 00000000000000/****************************************************************\ * * * Simple Alphabet Object * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3.
See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "alphabet.h" gint Argument_main(Argument *arg){ register Alphabet *dna_alphabet = Alphabet_create(Alphabet_Type_DNA, FALSE); Alphabet_destroy(dna_alphabet); return 0; } exonerate-2.4.0/src/sequence/codonsubmat.test.c0000644000175000017500000000434111162714331016470 00000000000000/****************************************************************\ * * * Codon Substitution Matrix Object * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "argument.h" #include "codonsubmat.h" gint Argument_main(Argument *arg){ register CodonSubmat *cs = CodonSubmat_create(); register guchar *alphabet = (guchar*)"ACGTN"; register gint a, b, c, d, e, f; register gint max; guchar codon_a[4], codon_b[4]; codon_a[3] = codon_b[3] = '\0'; for(a = 0; a < 5; a++){ codon_a[0] = alphabet[a]; for(b = 0; b < 5; b++){ codon_a[1] = alphabet[b]; for(c = 0; c < 5; c++){ codon_a[2] = alphabet[c]; for(d = 0; d < 5; d++){ codon_b[0] = alphabet[d]; for(e = 0; e < 5; e++){ codon_b[1] = alphabet[e]; for(f = 0; f < 5; f++){ codon_b[2] = alphabet[f]; g_message("Score [%s]:[%s] [%d]", codon_a, codon_b, CodonSubmat_lookup(cs, codon_a, codon_b)); } } } } } } max = CodonSubmat_max_score(cs); CodonSubmat_destroy(cs); g_message("Max value = [%d]", max); g_assert(max == 19); return 0; } exonerate-2.4.0/src/sequence/codonsubmat.c0000644000175000017500000012105711162714331015516 
00000000000000/****************************************************************\ * * * Codon Substitution Matrix Object * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ /* Codon Substitution Data taken from: * "Empirical codon substitution matrix" * Adrian Schneider, Gina M Cannarozzi and Gaston H Gonnet * BMC Bioinformatics. 2005, 6:134 */ /* In alphabetical order of codon */ #include #include #include <ctype.h> /* For tolower() */ #include <math.h> /* For log() */ #include "codonsubmat.h" static gint local_codon_substitution_count [CODON_ALPHABETSIZE_ACGT] [CODON_ALPHABETSIZE_ACGT] = { /* AAA */ {192416, 5180, 139387, 5795, 4300, 1732, 1129, 1860, 20646, 2819, 12905, 2447, 1031, 476, 1993, 580, 7019, 1702, 10180, 1597, 1178, 530, 358, 689, 4714, 4108, 5511, 2693, 363, 366, 884, 398, 8048, 1636, 5765, 1836, 1979, 1019, 494, 1188, 1686, 799, 938, 654, 591, 364, 976, 468, 0, 247, 0, 261, 1159, 733, 292, 981, 0, 280, 264, 298, 341, 143, 442, 172}, /* AAC */ { 5180, 142704, 5105, 76958, 2579, 4969, 943, 3302, 1338, 19096, 1159, 9374, 309, 1115, 772, 663, 1484, 5608, 3282, 3172, 567, 801, 232, 663, 333, 942, 541, 433, 126, 446, 471, 263, 2525, 10085, 3034, 6768, 1131, 2290, 489, 1536, 1728, 4186, 1544, 1992, 242, 636, 590, 466, 0, 1308, 0, 803, 1077, 2021, 360, 1682, 0, 817, 100, 575, 139, 395, 238, 305}, /* AAG */ {139387, 5105, 246420, 5132, 3195, 1797, 1538, 1577, 15336, 2826, 19246, 2259, 649, 503, 3109, 515, 5156, 1891, 15687, 1526, 945, 524, 430, 604, 4333, 5231, 7539, 2968, 281, 370, 1292, 364, 5401, 1605, 9352, 1732, 1492, 1138, 599, 1064, 1215, 885, 1386, 628, 432, 370, 1230, 389, 0, 273, 0, 257, 919, 702, 298, 830, 0,
311, 441, 290, 263, 120, 530, 170}, /* AAT */ { 5795, 76958, 5132, 107878, 2735, 2939, 939, 4546, 1376, 11367, 1093, 13749, 351, 681, 842, 1064, 1674, 3512, 3272, 4474, 686, 603, 232, 919, 372, 550, 502, 610, 141, 272, 467, 459, 3072, 6326, 3252, 10678, 1354, 1451, 432, 2016, 1747, 2537, 1269, 2667, 267, 424, 602, 625, 0, 841, 0, 1251, 1133, 1275, 342, 1970, 0, 534, 107, 865, 181, 228, 272, 401}, /* ACA */ { 4300, 2579, 3195, 2735, 70464, 35853, 18993, 36681, 1840, 4516, 1039, 3463, 3103, 1946, 5607, 2147, 1277, 520, 2129, 547, 3628, 1634, 854, 2310, 276, 300, 383, 229, 569, 568, 1783, 601, 1993, 673, 1966, 983, 9590, 5142, 2048, 5778, 1320, 923, 844, 726, 2163, 1576, 4056, 1895, 0, 156, 0, 206, 7353, 4581, 1837, 6050, 0, 400, 152, 526, 798, 329, 964, 429}, /* ACC */ { 1732, 4969, 1797, 2939, 35853, 72268, 15062, 34881, 700, 8759, 604, 3844, 1277, 4040, 2951, 2200, 631, 812, 1389, 497, 1543, 2597, 560, 1934, 187, 386, 266, 188, 245, 898, 1240, 508, 960, 1167, 1203, 802, 4073, 10379, 1650, 5231, 772, 1616, 606, 756, 982, 2920, 2745, 1760, 0, 298, 0, 241, 3687, 7244, 1275, 5121, 0, 702, 97, 486, 329, 582, 559, 394}, /* ACG */ { 1129, 943, 1538, 939, 18993, 15062, 9680, 13523, 428, 1896, 506, 1292, 626, 757, 2625, 774, 345, 184, 912, 160, 704, 598, 346, 620, 114, 151, 190, 75, 154, 256, 787, 244, 495, 296, 748, 285, 2165, 1989, 1035, 1818, 372, 384, 366, 238, 558, 623, 1746, 707, 0, 78, 0, 91, 1860, 1833, 795, 1914, 0, 152, 75, 196, 207, 136, 397, 169}, /* ACT */ { 1860, 3302, 1577, 4546, 36681, 34881, 13523, 50168, 703, 5687, 538, 5850, 1335, 2174, 3070, 3693, 743, 527, 1270, 660, 1826, 1601, 581, 2933, 152, 270, 225, 260, 267, 532, 1170, 758, 1033, 889, 1236, 1262, 4658, 5276, 1507, 8364, 812, 1026, 650, 1014, 1034, 1677, 2847, 2631, 0, 232, 0, 311, 4237, 4379, 1355, 7128, 0, 498, 74, 697, 389, 391, 616, 571}, /* AGA */ { 20646, 1338, 15336, 1376, 1840, 700, 428, 703, 64584, 2232, 35379, 1962, 573, 252, 945, 270, 2600, 1083, 3685, 966, 402, 236, 173, 267, 15497, 
11485, 15504, 8562, 191, 208, 573, 224, 1648, 399, 1144, 455, 712, 450, 198, 463, 2666, 844, 1151, 598, 307, 176, 428, 207, 0, 145, 0, 141, 429, 331, 121, 422, 0, 282, 472, 326, 179, 71, 211, 109}, /* AGC */ { 2819, 19096, 2826, 11367, 4516, 8759, 1896, 5687, 2232, 115828, 2078, 54263, 429, 1496, 1021, 954, 1079, 2354, 2379, 1395, 793, 1227, 331, 950, 551, 1609, 766, 673, 130, 512, 632, 353, 1815, 3818, 2273, 2615, 2075, 4419, 844, 2847, 4073, 11112, 3476, 4472, 387, 1015, 920, 802, 0, 637, 0, 450, 2489, 4806, 1038, 3883, 0, 3222, 187, 2186, 179, 398, 264, 328}, /* AGG */ { 12905, 1159, 19246, 1093, 1039, 604, 506, 538, 35379, 2078, 50950, 1647, 317, 220, 1264, 223, 1629, 1036, 4284, 760, 336, 210, 181, 222, 10455, 11329, 16771, 6741, 133, 219, 629, 169, 933, 322, 1571, 353, 516, 430, 228, 344, 1226, 734, 2008, 458, 173, 190, 577, 164, 0, 123, 0, 139, 265, 262, 113, 300, 0, 282, 928, 256, 134, 58, 217, 93}, /* AGT */ { 2447, 9374, 2259, 13749, 3463, 3844, 1292, 5850, 1962, 54263, 1647, 65410, 369, 634, 792, 1164, 858, 1161, 1612, 1511, 674, 552, 217, 952, 382, 611, 555, 753, 105, 230, 416, 332, 1539, 2036, 1794, 3160, 1780, 2003, 574, 2839, 3018, 4831, 2430, 5223, 288, 490, 717, 807, 0, 353, 0, 544, 2045, 2280, 693, 3451, 0, 1656, 154, 2286, 134, 183, 235, 338}, /* ATA */ { 1031, 309, 649, 351, 3103, 1277, 626, 1335, 573, 429, 317, 369, 30638, 28175, 7711, 25576, 288, 130, 452, 155, 462, 224, 121, 301, 108, 87, 109, 75, 2138, 2486, 5627, 2186, 523, 109, 445, 168, 1415, 765, 360, 898, 314, 178, 184, 119, 5479, 4631, 9780, 4914, 0, 109, 0, 134, 571, 331, 130, 345, 0, 106, 76, 119, 2544, 701, 2496, 891}, /* ATC */ { 476, 1115, 503, 681, 1946, 4040, 757, 2174, 252, 1496, 220, 634, 28175, 122704, 7704, 65193, 264, 333, 536, 242, 339, 490, 134, 372, 64, 141, 105, 83, 1923, 8057, 8591, 4462, 335, 304, 395, 174, 1009, 2396, 467, 1451, 213, 488, 230, 194, 4773, 18081, 15695, 10382, 0, 402, 0, 282, 392, 642, 129, 507, 0, 369, 99, 269, 1904, 2777, 2626, 1832}, /* ATG */ { 
1993, 772, 3109, 842, 5607, 2951, 2625, 3070, 945, 1021, 1264, 792, 7711, 7704, 214848, 7959, 927, 401, 2499, 454, 907, 528, 366, 670, 227, 290, 444, 189, 3664, 5569, 20280, 4883, 875, 299, 1505, 324, 2756, 2046, 1059, 2026, 618, 451, 736, 327, 3533, 3494, 14027, 3874, 0, 332, 0, 338, 1381, 920, 505, 1193, 0, 251, 429, 279, 3689, 1481, 8984, 1728}, /* ATT */ { 580, 663, 515, 1064, 2147, 2200, 774, 3693, 270, 954, 223, 1164, 25576, 65193, 7959, 92314, 253, 255, 549, 359, 406, 331, 111, 591, 93, 95, 99, 102, 1876, 4633, 7578, 5795, 370, 209, 482, 404, 1252, 1323, 412, 2096, 269, 309, 196, 306, 4791, 9574, 13874, 15034, 0, 262, 0, 369, 431, 495, 137, 731, 0, 242, 105, 365, 1920, 1659, 2852, 2625}, /* CAA */ { 7019, 1484, 5156, 1674, 1277, 631, 345, 743, 2600, 1079, 1629, 858, 288, 264, 927, 253, 59834, 5592, 80759, 5231, 4070, 1546, 1134, 1926, 2527, 1646, 2395, 1298, 920, 757, 2285, 834, 6770, 1219, 5911, 1431, 1215, 741, 369, 798, 892, 441, 544, 311, 373, 281, 659, 354, 0, 567, 0, 580, 1137, 755, 264, 917, 0, 304, 312, 320, 464, 237, 669, 285}, /* CAC */ { 1702, 5608, 1891, 3512, 520, 812, 184, 527, 1083, 2354, 1036, 1161, 130, 333, 401, 255, 5592, 99998, 12592, 54421, 1206, 1967, 553, 1426, 1135, 4128, 1592, 1884, 305, 1413, 1218, 901, 1120, 2009, 1556, 1296, 426, 753, 185, 486, 368, 827, 318, 334, 139, 325, 373, 210, 0, 6304, 0, 3889, 519, 988, 190, 845, 0, 993, 208, 720, 217, 1193, 348, 849}, /* CAG */ { 10180, 3282, 15687, 3272, 2129, 1389, 912, 1270, 3685, 2379, 4284, 1612, 452, 536, 2499, 549, 80759, 12592, 295886, 10607, 5905, 3254, 2960, 3395, 3393, 3962, 8200, 2514, 1401, 1989, 7001, 1805, 10336, 2693, 18331, 2695, 1920, 1873, 919, 1588, 1203, 1053, 1424, 675, 568, 600, 1676, 652, 0, 1225, 0, 1035, 1827, 1581, 678, 1639, 0, 615, 1190, 619, 885, 482, 1801, 496}, /* CAT */ { 1597, 3172, 1526, 4474, 547, 497, 160, 660, 966, 1395, 760, 1511, 155, 242, 454, 359, 5231, 54421, 10607, 65564, 1113, 1200, 454, 1972, 954, 2015, 1215, 2269, 270, 910, 1126, 1287, 1256, 
1251, 1428, 1844, 448, 393, 166, 580, 367, 494, 278, 508, 127, 205, 338, 261, 0, 3419, 0, 4605, 492, 614, 151, 1025, 0, 617, 251, 1012, 236, 684, 352, 954}, /* CCA */ { 1178, 567, 945, 686, 3628, 1543, 704, 1826, 402, 793, 336, 674, 462, 339, 907, 406, 4070, 1206, 5905, 1113, 96410, 44179, 22483, 58526, 465, 384, 509, 283, 1421, 934, 3255, 1141, 1194, 414, 1309, 514, 4498, 2561, 1031, 2962, 672, 545, 470, 381, 683, 519, 1253, 566, 0, 139, 0, 152, 6667, 3396, 1365, 4480, 0, 155, 137, 230, 741, 245, 1030, 300}, /* CCC */ { 530, 801, 524, 603, 1634, 2597, 598, 1601, 236, 1227, 210, 552, 224, 490, 528, 331, 1546, 1967, 3254, 1200, 44179, 76760, 17395, 50743, 238, 651, 340, 293, 494, 1849, 1646, 1030, 599, 572, 843, 409, 2153, 4232, 902, 2634, 412, 684, 384, 325, 307, 695, 761, 451, 0, 298, 0, 175, 2904, 6600, 1004, 4368, 0, 290, 55, 215, 292, 503, 488, 308}, /* CCG */ { 358, 232, 430, 232, 854, 560, 346, 581, 173, 331, 181, 217, 121, 134, 366, 111, 1134, 553, 2960, 454, 22483, 17395, 12070, 18860, 162, 224, 371, 130, 347, 378, 1695, 416, 351, 191, 513, 189, 1157, 1109, 688, 970, 223, 262, 242, 131, 158, 202, 498, 201, 0, 75, 0, 61, 1485, 1287, 642, 1472, 0, 85, 94, 97, 203, 98, 442, 122}, /* CCT */ { 689, 663, 604, 919, 2310, 1934, 620, 2933, 267, 950, 222, 952, 301, 372, 670, 591, 1926, 1426, 3395, 1972, 58526, 50743, 18860, 93780, 273, 413, 312, 541, 576, 1280, 1943, 2255, 744, 431, 873, 596, 3013, 3191, 934, 4810, 493, 553, 374, 541, 358, 492, 914, 888, 0, 219, 0, 305, 3913, 4322, 1159, 8477, 0, 284, 70, 475, 371, 312, 691, 656}, /* CGA */ { 4714, 333, 4333, 372, 276, 187, 114, 152, 15497, 551, 10455, 382, 108, 64, 227, 93, 2527, 1135, 3393, 954, 465, 238, 162, 273, 15332, 12217, 13400, 9093, 207, 219, 485, 175, 414, 115, 402, 127, 189, 148, 65, 109, 488, 210, 307, 184, 73, 57, 129, 57, 0, 107, 0, 145, 177, 137, 56, 136, 0, 275, 379, 268, 78, 49, 133, 60}, /* CGC */ { 4108, 942, 5231, 550, 300, 386, 151, 270, 11485, 1609, 11329, 611, 87, 141, 290, 95, 1646, 4128, 
3962, 2015, 384, 651, 224, 413, 12217, 40282, 16178, 15122, 124, 750, 658, 364, 303, 314, 554, 219, 174, 409, 112, 191, 316, 804, 303, 252, 53, 146, 202, 100, 0, 439, 0, 280, 153, 397, 85, 239, 0, 1232, 354, 522, 95, 198, 143, 87}, /* CGG */ { 5511, 541, 7539, 502, 383, 266, 190, 225, 15504, 766, 16771, 555, 109, 105, 444, 99, 2395, 1592, 8200, 1215, 509, 340, 371, 312, 13400, 16178, 26516, 10668, 210, 321, 1172, 299, 440, 197, 905, 200, 275, 238, 165, 238, 423, 398, 697, 190, 85, 116, 306, 99, 0, 149, 0, 141, 215, 238, 114, 214, 0, 409, 1065, 335, 127, 84, 285, 88}, /* CGT */ { 2693, 433, 2968, 610, 229, 188, 75, 260, 8562, 673, 6741, 753, 75, 83, 189, 102, 1298, 1884, 2514, 2269, 283, 293, 130, 541, 9093, 15122, 10668, 14346, 98, 283, 392, 397, 288, 147, 313, 212, 164, 190, 63, 213, 260, 300, 190, 319, 48, 76, 133, 100, 0, 234, 0, 349, 143, 191, 50, 311, 0, 474, 218, 748, 73, 96, 132, 138}, /* CTA */ { 363, 126, 281, 141, 569, 245, 154, 267, 191, 130, 133, 105, 2138, 1923, 3664, 1876, 920, 305, 1401, 270, 1421, 494, 347, 576, 207, 124, 210, 98, 14232, 14222, 38640, 13070, 259, 60, 252, 71, 563, 319, 142, 352, 134, 72, 87, 59, 1125, 911, 2357, 902, 0, 183, 0, 206, 621, 278, 149, 336, 0, 84, 184, 129, 10238, 1246, 12502, 1349}, /* CTC */ { 366, 446, 370, 272, 568, 898, 256, 532, 208, 512, 219, 230, 2486, 8057, 5569, 4633, 757, 1413, 1989, 910, 934, 1849, 378, 1280, 219, 750, 321, 283, 14222, 77258, 64691, 37051, 269, 220, 393, 139, 561, 1136, 264, 687, 167, 268, 148, 108, 1260, 4169, 3973, 2309, 0, 840, 0, 512, 440, 859, 144, 563, 0, 540, 235, 339, 10893, 6333, 18007, 3656}, /* CTG */ { 884, 471, 1292, 467, 1783, 1240, 787, 1170, 573, 632, 629, 416, 5627, 8591, 20280, 7578, 2285, 1218, 7001, 1126, 3255, 1646, 1695, 1943, 485, 658, 1172, 392, 38640, 64691, 236480, 56553, 712, 261, 1205, 260, 1777, 1525, 899, 1501, 353, 352, 498, 231, 3427, 4149, 13822, 4091, 0, 819, 0, 839, 1564, 1083, 619, 1233, 0, 439, 1055, 504, 30609, 5669, 65624, 6145}, /* CTT */ { 398, 263, 
364, 459, 601, 508, 244, 758, 224, 353, 169, 332, 2186, 4462, 4883, 5795, 834, 901, 1805, 1287, 1141, 1030, 416, 2255, 175, 364, 299, 397, 13070, 37051, 56553, 56660, 285, 167, 347, 221, 646, 712, 242, 979, 172, 186, 138, 169, 1209, 2079, 3407, 3272, 0, 520, 0, 649, 516, 495, 145, 928, 0, 316, 225, 438, 10125, 3229, 17355, 5183}, /* GAA */ { 8048, 2525, 5401, 3072, 1993, 960, 495, 1033, 1648, 1815, 933, 1539, 523, 335, 875, 370, 6770, 1120, 10336, 1256, 1194, 599, 351, 744, 414, 303, 440, 288, 259, 269, 712, 285, 206378, 24935, 169440, 28911, 4616, 2196, 1162, 2462, 5785, 1997, 2920, 1599, 1082, 645, 1894, 798, 0, 234, 0, 280, 1146, 745, 302, 981, 0, 137, 126, 187, 275, 130, 423, 180}, /* GAC */ { 1636, 10085, 1605, 6326, 673, 1167, 296, 889, 399, 3818, 322, 2036, 109, 304, 299, 209, 1219, 2009, 2693, 1251, 414, 572, 191, 431, 115, 314, 197, 147, 60, 220, 261, 167, 24935, 179326, 32603, 109888, 1258, 2641, 589, 1529, 2157, 4864, 1555, 2004, 237, 701, 547, 465, 0, 608, 0, 406, 570, 971, 216, 755, 0, 263, 45, 221, 87, 175, 116, 114}, /* GAG */ { 5765, 3034, 9352, 3252, 1966, 1203, 748, 1236, 1144, 2273, 1571, 1794, 445, 395, 1505, 482, 5911, 1556, 18331, 1428, 1309, 843, 513, 873, 402, 554, 905, 313, 252, 393, 1205, 347, 169440, 32603, 306080, 32897, 4089, 2938, 2087, 2798, 3762, 2387, 4836, 1598, 976, 823, 3117, 849, 0, 290, 0, 328, 1111, 994, 398, 1118, 0, 214, 240, 202, 279, 167, 507, 182}, /* GAT */ { 1836, 6768, 1732, 10678, 983, 802, 285, 1262, 455, 2615, 353, 3160, 168, 174, 324, 404, 1431, 1296, 2695, 1844, 514, 409, 189, 596, 127, 219, 200, 212, 71, 139, 260, 221, 28911, 109888, 32897, 182052, 1528, 1598, 520, 2698, 2256, 2810, 1611, 3626, 295, 412, 647, 868, 0, 379, 0, 709, 680, 670, 209, 1178, 0, 186, 54, 334, 86, 98, 156, 206}, /* GCA */ { 1979, 1131, 1492, 1354, 9590, 4073, 2165, 4658, 712, 2075, 516, 1780, 1415, 1009, 2756, 1252, 1215, 426, 1920, 448, 4498, 2153, 1157, 3013, 189, 174, 275, 164, 563, 561, 1777, 646, 4616, 1258, 4089, 1528, 71446, 42285, 
15562, 48702, 4400, 2448, 2627, 1992, 3498, 2533, 7151, 3146, 0, 171, 0, 203, 7531, 4721, 1802, 6059, 0, 480, 146, 584, 774, 328, 948, 484}, /* GCC */ { 1019, 2290, 1138, 1451, 5142, 10379, 1989, 5276, 450, 4419, 430, 2003, 765, 2396, 2046, 1323, 741, 753, 1873, 393, 2561, 4232, 1109, 3191, 148, 409, 238, 190, 319, 1136, 1525, 712, 2196, 2641, 2938, 1598, 42285, 109722, 16086, 57041, 2770, 5134, 2293, 2068, 1890, 6486, 5502, 3573, 0, 351, 0, 255, 4869, 9986, 1800, 6924, 0, 1065, 147, 697, 421, 792, 700, 509}, /* GCG */ { 494, 489, 599, 432, 2048, 1650, 1035, 1507, 198, 844, 228, 574, 360, 467, 1059, 412, 369, 185, 919, 166, 1031, 902, 688, 934, 65, 112, 165, 63, 142, 264, 899, 242, 1162, 589, 2087, 520, 15562, 16086, 9674, 15220, 1135, 1096, 1337, 655, 841, 1072, 3562, 1117, 0, 103, 0, 73, 1755, 1851, 959, 2061, 0, 229, 72, 215, 164, 169, 447, 166}, /* GCT */ { 1188, 1536, 1064, 2016, 5778, 5231, 1818, 8364, 463, 2847, 344, 2839, 898, 1451, 2026, 2096, 798, 486, 1588, 580, 2962, 2634, 970, 4810, 109, 191, 238, 213, 352, 687, 1501, 979, 2462, 1529, 2798, 2698, 48702, 57041, 15220, 87906, 2987, 2955, 2050, 3068, 2160, 3308, 5558, 5958, 0, 277, 0, 360, 5339, 5936, 1674, 9628, 0, 697, 92, 943, 476, 502, 774, 796}, /* GGA */ { 1686, 1728, 1215, 1747, 1320, 772, 372, 812, 2666, 4073, 1226, 3018, 314, 213, 618, 269, 892, 368, 1203, 367, 672, 412, 223, 493, 488, 316, 423, 260, 134, 167, 353, 172, 5785, 2157, 3762, 2256, 4400, 2770, 1135, 2987, 107006, 49273, 52181, 36526, 861, 482, 1260, 577, 0, 73, 0, 116, 1063, 876, 290, 1030, 0, 415, 258, 494, 149, 113, 209, 168}, /* GGC */ { 799, 4186, 885, 2537, 923, 1616, 384, 1026, 844, 11112, 734, 4831, 178, 488, 451, 309, 441, 827, 1053, 494, 545, 684, 262, 553, 210, 804, 398, 300, 72, 268, 352, 186, 1997, 4864, 2387, 2810, 2448, 5134, 1096, 2955, 49273, 97496, 36303, 38034, 351, 1106, 910, 657, 0, 251, 0, 162, 815, 1434, 371, 1134, 0, 1362, 192, 854, 90, 226, 180, 184}, /* GGG */ { 938, 1544, 1386, 1269, 844, 606, 366, 650, 1151, 
3476, 2008, 2430, 184, 230, 736, 196, 544, 318, 1424, 278, 470, 384, 242, 374, 307, 303, 697, 190, 87, 148, 498, 138, 2920, 1555, 4836, 1611, 2627, 2293, 1337, 2050, 52181, 36303, 57212, 24188, 525, 471, 1707, 502, 0, 81, 0, 97, 758, 688, 323, 720, 0, 381, 636, 380, 116, 105, 291, 110}, /* GGT */ { 654, 1992, 628, 2667, 726, 756, 238, 1014, 598, 4472, 458, 5223, 119, 194, 327, 306, 311, 334, 675, 508, 381, 325, 131, 541, 184, 252, 190, 319, 59, 108, 231, 169, 1599, 2004, 1598, 3626, 1992, 2068, 655, 3068, 36526, 38034, 24188, 45116, 310, 419, 741, 822, 0, 105, 0, 183, 609, 662, 219, 1035, 0, 566, 110, 917, 83, 95, 117, 171}, /* GTA */ { 591, 242, 432, 267, 2163, 982, 558, 1034, 307, 387, 173, 288, 5479, 4773, 3533, 4791, 373, 139, 568, 127, 683, 307, 158, 358, 73, 53, 85, 48, 1125, 1260, 3427, 1209, 1082, 237, 976, 295, 3498, 1890, 841, 2160, 861, 351, 525, 310, 18232, 13787, 36807, 15198, 0, 85, 0, 96, 728, 403, 163, 534, 0, 131, 64, 157, 1632, 440, 1673, 557}, /* GTC */ { 364, 636, 370, 424, 1576, 2920, 623, 1677, 176, 1015, 190, 490, 4631, 18081, 3494, 9574, 281, 325, 600, 205, 519, 695, 202, 492, 57, 146, 116, 76, 911, 4169, 4149, 2079, 645, 701, 823, 412, 2533, 6486, 1072, 3308, 482, 1106, 471, 419, 13787, 54440, 44453, 27897, 0, 332, 0, 208, 589, 988, 175, 715, 0, 377, 77, 272, 972, 1927, 1567, 1115}, /* GTG */ { 976, 590, 1230, 602, 4056, 2745, 1746, 2847, 428, 920, 577, 717, 9780, 15695, 14027, 13874, 659, 373, 1676, 338, 1253, 761, 498, 914, 129, 202, 306, 133, 2357, 3973, 13822, 3407, 1894, 547, 3117, 647, 7151, 5502, 3562, 5558, 1260, 910, 1707, 741, 36807, 44453, 162686, 46454, 0, 303, 0, 278, 1444, 1089, 539, 1354, 0, 391, 294, 478, 2622, 1431, 5885, 1606}, /* GTT */ { 468, 466, 389, 625, 1895, 1760, 707, 2631, 207, 802, 164, 807, 4914, 10382, 3874, 15034, 354, 210, 652, 261, 566, 451, 201, 888, 57, 100, 99, 100, 902, 2309, 4091, 3272, 798, 465, 849, 868, 3146, 3573, 1117, 5958, 577, 657, 502, 822, 15198, 27897, 46454, 48518, 0, 181, 0, 279, 726, 727, 
192, 1127, 0, 318, 112, 377, 1102, 1049, 1825, 1770}, /* TAA */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, #ifdef INCLUDE_STOP_CODON_FREQUENCIES 2564, 0, 732, 0, 0, 0, 0, 0, 1164, 0, 0, 0, 0, 0, 0, 0}, #else /* INCLUDE_STOP_CODON_FREQUENCIES */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, #endif /* INCLUDE_STOP_CODON_FREQUENCIES */ /* TAC */ { 247, 1308, 273, 841, 156, 298, 78, 232, 145, 637, 123, 353, 109, 402, 332, 262, 567, 6304, 1225, 3419, 139, 298, 75, 219, 107, 439, 149, 234, 183, 840, 819, 520, 234, 608, 290, 379, 171, 351, 103, 277, 73, 251, 81, 105, 85, 332, 303, 181, 0, 138828, 0, 75342, 368, 1128, 149, 794, 0, 2235, 555, 1461, 349, 11866, 429, 8131}, /* TAG */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, #ifdef INCLUDE_STOP_CODON_FREQUENCIES 732, 0, 1132, 0, 0, 0, 0, 0, 534, 0, 0, 0, 0, 0, 0, 0}, #else /* INCLUDE_STOP_CODON_FREQUENCIES */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, #endif /* INCLUDE_STOP_CODON_FREQUENCIES */ /* TAT */ { 261, 803, 257, 1251, 206, 241, 91, 311, 141, 450, 139, 544, 134, 282, 338, 369, 580, 3889, 1035, 4605, 152, 175, 61, 305, 145, 280, 141, 349, 206, 512, 839, 649, 280, 406, 328, 709, 203, 255, 73, 360, 116, 162, 97, 183, 96, 208, 278, 279, 0, 75342, 0, 95904, 406, 708, 137, 1247, 0, 1327, 535, 2349, 352, 7329, 417, 10375}, /* TCA */ { 1159, 1077, 919, 1133, 7353, 3687, 1860, 4237, 429, 2489, 265, 2045, 571, 392, 1381, 431, 1137, 519, 1827, 492, 6667, 2904, 1485, 3913, 177, 153, 215, 143, 621, 440, 1564, 516, 1146, 570, 1111, 680, 7531, 4869, 1755, 5339, 1063, 815, 758, 609, 728, 589, 1444, 726, 0, 368, 0, 406, 53222, 30645, 12204, 36223, 0, 868, 374, 1086, 1891, 603, 1941, 840}, /* TCC */ { 733, 2021, 702, 1275, 4581, 7244, 1833, 4379, 331, 4806, 262, 2280, 331, 642, 920, 495, 755, 988, 1581, 614, 3396, 6600, 
1287, 4322, 137, 397, 238, 191, 278, 859, 1083, 495, 745, 971, 994, 670, 4721, 9986, 1851, 5936, 876, 1434, 688, 662, 403, 988, 1089, 727, 0, 1128, 0, 708, 30645, 79396, 11832, 45278, 0, 2263, 247, 1381, 655, 1997, 1030, 1070}, /* TCG */ { 292, 360, 298, 342, 1837, 1275, 795, 1355, 121, 1038, 113, 693, 130, 129, 505, 137, 264, 190, 678, 151, 1365, 1004, 642, 1159, 56, 85, 114, 50, 149, 144, 619, 145, 302, 216, 398, 209, 1802, 1800, 959, 1674, 290, 371, 323, 219, 163, 175, 539, 192, 0, 149, 0, 137, 12204, 11832, 6594, 12964, 0, 325, 220, 343, 275, 202, 772, 285}, /* TCT */ { 981, 1682, 830, 1970, 6050, 5121, 1914, 7128, 422, 3883, 300, 3451, 345, 507, 1193, 731, 917, 845, 1639, 1025, 4480, 4368, 1472, 8477, 136, 239, 214, 311, 336, 563, 1233, 928, 981, 755, 1118, 1178, 6059, 6924, 2061, 9628, 1030, 1134, 720, 1035, 534, 715, 1354, 1127, 0, 794, 0, 1247, 36223, 45278, 12964, 78934, 0, 1651, 257, 2645, 849, 1209, 1299, 2209}, /* TGA */ { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, #ifdef INCLUDE_STOP_CODON_FREQUENCIES 1164, 0, 534, 0, 0, 0, 0, 0, 3580, 0, 0, 0, 0, 0, 0, 0}, #else /* INCLUDE_STOP_CODON_FREQUENCIES */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, #endif /* INCLUDE_STOP_CODON_FREQUENCIES */ /* TGC */ { 280, 817, 311, 534, 400, 702, 152, 498, 282, 3222, 282, 1656, 106, 369, 251, 242, 304, 993, 615, 617, 155, 290, 85, 284, 275, 1232, 409, 474, 84, 540, 439, 316, 137, 263, 214, 186, 480, 1065, 229, 697, 415, 1362, 381, 566, 131, 377, 391, 318, 0, 2235, 0, 1327, 868, 2263, 325, 1651, 0, 103986, 771, 57765, 166, 1430, 279, 1086}, /* TGG */ { 264, 100, 441, 107, 152, 97, 75, 74, 472, 187, 928, 154, 76, 99, 429, 105, 312, 208, 1190, 251, 137, 55, 94, 70, 379, 354, 1065, 218, 184, 235, 1055, 225, 126, 45, 240, 54, 146, 147, 72, 92, 258, 192, 636, 110, 64, 77, 294, 112, 0, 555, 0, 535, 374, 247, 220, 257, 0, 771, 189124, 755, 243, 629, 1053, 730}, /* TGT */ { 
298, 575, 290, 865, 526, 486, 196, 697, 326, 2186, 256, 2286, 119, 269, 279, 365, 320, 720, 619, 1012, 230, 215, 97, 475, 268, 522, 335, 748, 129, 339, 504, 438, 187, 221, 202, 334, 584, 697, 215, 943, 494, 854, 380, 917, 157, 272, 478, 377, 0, 1461, 0, 2349, 1086, 1381, 343, 2645, 0, 57765, 755, 88088, 203, 944, 334, 1600}, /* TTA */ { 341, 139, 263, 181, 798, 329, 207, 389, 179, 179, 134, 134, 2544, 1904, 3689, 1920, 464, 217, 885, 236, 741, 292, 203, 371, 78, 95, 127, 73, 10238, 10893, 30609, 10125, 275, 87, 279, 86, 774, 421, 164, 476, 149, 90, 116, 83, 1632, 972, 2622, 1102, 0, 349, 0, 352, 1891, 655, 275, 849, 0, 166, 243, 203, 25300, 2266, 18809, 2526}, /* TTC */ { 143, 395, 120, 228, 329, 582, 136, 391, 71, 398, 58, 183, 701, 2777, 1481, 1659, 237, 1193, 482, 684, 245, 503, 98, 312, 49, 198, 84, 96, 1246, 6333, 5669, 3229, 130, 175, 167, 98, 328, 792, 169, 502, 113, 226, 105, 95, 440, 1927, 1431, 1049, 0, 11866, 0, 7329, 603, 1997, 202, 1209, 0, 1430, 629, 944, 2266, 167022, 3062, 90000}, /* TTG */ { 442, 238, 530, 272, 964, 559, 397, 616, 211, 264, 217, 235, 2496, 2626, 8984, 2852, 669, 348, 1801, 352, 1030, 488, 442, 691, 133, 143, 285, 132, 12502, 18007, 65624, 17355, 423, 116, 507, 156, 948, 700, 447, 774, 209, 180, 291, 117, 1673, 1567, 5885, 1825, 0, 429, 0, 417, 1941, 1030, 772, 1299, 0, 279, 1053, 334, 18809, 3062, 44874, 3690}, /* TTT */ { 172, 305, 170, 401, 429, 394, 169, 571, 109, 328, 93, 338, 891, 1832, 1728, 2625, 285, 849, 496, 954, 300, 308, 122, 656, 60, 87, 88, 138, 1349, 3656, 6145, 5183, 180, 114, 182, 206, 484, 509, 166, 796, 168, 184, 110, 171, 557, 1115, 1606, 1770, 0, 8131, 0, 10375, 840, 1070, 285, 2209, 0, 1086, 730, 1600, 2526, 90000, 3690, 156062} }; static gint CodonSubmat_round(gdouble num){ return ( num < 0 ) ? 
                                num - 0.5 : num + 0.5;
    } /* Not perfect, but portable */

CodonSubmat *CodonSubmat_create(){
    register CodonSubmat *cs = g_new(CodonSubmat, 1);
    register guchar *alphabet = (guchar*)"ACGTN";
    register gint i, j, a, b, c, freq, m, n, total = 0;
    register gdouble ratio, logbase = log(10.0), prob, chance;
    gint codon_freq_acgtn[CODON_ALPHABETSIZE_ACGTN] = {0};
    gint red_codon[CODON_ALPHABETSIZE_ACGT][8];
    gint count_acgtn[CODON_ALPHABETSIZE_ACGTN]
                    [CODON_ALPHABETSIZE_ACGTN] = {{0}};
    cs->ref_count = 1;
    /* Fill the base index */
    for(i = 0; i < ALPHABETSIZE; i++)
        cs->base_index[i] = 4;
    for(i = 0; i < 4; i++)
        cs->base_index[alphabet[i]]
            = cs->base_index[tolower(alphabet[i])] = i;
    /* Fill the codon index */
    i = 0;
    for(a = 0; a < 5; a++)
        for(b = 0; b < 5; b++)
            for(c = 0; c < 5; c++)
                cs->codon_index[a][b][c] = i++;
    /* Fill the redundant codon table
     * eg. [AAA] -> {AAN,ANA,NAA,ANN,NAN,NNA,NNN}
     */
    i = 0;
    for(a = 0; a < 4; a++)
        for(b = 0; b < 4; b++)
            for(c = 0; c < 4; c++){
                red_codon[i][0] = cs->codon_index[a][b][c];
                red_codon[i][1] = cs->codon_index[a][b][4];
                red_codon[i][2] = cs->codon_index[a][4][c];
                red_codon[i][3] = cs->codon_index[4][b][c];
                red_codon[i][4] = cs->codon_index[a][4][4];
                red_codon[i][5] = cs->codon_index[4][b][4];
                red_codon[i][6] = cs->codon_index[4][4][c];
                red_codon[i][7] = cs->codon_index[4][4][4];
                i++;
                }
    /* Calculate codon frequencies */
    for(i = 0; i < CODON_ALPHABETSIZE_ACGT; i++){
        for(j = 0; j < CODON_ALPHABETSIZE_ACGT; j++){
            codon_freq_acgtn[red_codon[i][0]]
                += local_codon_substitution_count[i][j];
            total += local_codon_substitution_count[i][j];
            }
        }
    /* Calculate redundant codon frequencies */
    for(i = 0; i < CODON_ALPHABETSIZE_ACGT; i++)
        for(j = 1; j < 8; j++)
            codon_freq_acgtn[red_codon[i][j]]
                += codon_freq_acgtn[red_codon[i][0]];
    /* Fill the combined count matrix including redundancies */
    for(i = 0; i < CODON_ALPHABETSIZE_ACGT; i++){
        for(j = 0; j < CODON_ALPHABETSIZE_ACGT; j++){
            freq = local_codon_substitution_count[i][j];
            for(m = 0; m < 8; m++)
                for(n = 0; n <
                               8; n++)
                    count_acgtn[red_codon[i][m]]
                               [red_codon[j][n]] += freq;
            }
        }
    /* Calculate log odds scores */
    for(i = 0; i < CODON_ALPHABETSIZE_ACGTN; i++){
        for(j = 0; j < CODON_ALPHABETSIZE_ACGTN; j++){
            freq = count_acgtn[i][j];
            if(freq){
                prob = (gdouble)(freq << 1)
                     / ((gdouble) codon_freq_acgtn[i]
                                + codon_freq_acgtn[j]);
                chance = ((gdouble) codon_freq_acgtn[i]
                                  + codon_freq_acgtn[j])
                       / (gdouble)(total << 1);
                ratio = prob / chance;
                cs->codon_submat[i][j]
                    = CodonSubmat_round(10.0 * (log(ratio)/logbase));
            } else {
                cs->codon_submat[i][j] = -50;
                }
            }
        }
    return cs;
    }

void CodonSubmat_destroy(CodonSubmat *cs){
    if(--cs->ref_count)
        return;
    g_free(cs);
    return;
    }

CodonSubmat *CodonSubmat_share(CodonSubmat *cs){
    cs->ref_count++;
    return cs;
    }

gint CodonSubmat_max_score(CodonSubmat *cs){
    register gint i, j, max;
    g_assert(cs);
    max = cs->codon_submat[0][0];
    for(i = 0; i < CODON_ALPHABETSIZE_ACGTN; i++)
        for(j = 0; j < CODON_ALPHABETSIZE_ACGTN; j++)
            if(max < cs->codon_submat[i][j])
                max = cs->codon_submat[i][j];
    return max;
    }

void CodonSubmat_add_nucleic(CodonSubmat *cs, Submat *nucleic){
    register gint a, b, c, d, e, f, codon_a, codon_b;
    register gchar *alphabet = "ACGTN";
    register gint score;
    for(a = 0; a < 5; a++){
     for(b = 0; b < 5; b++){
      for(c = 0; c < 5; c++){
       for(d = 0; d < 5; d++){
        for(e = 0; e < 5; e++){
         for(f = 0; f < 5; f++){
             score = Submat_lookup(nucleic, alphabet[a], alphabet[d])
                   + Submat_lookup(nucleic, alphabet[b], alphabet[e])
                   + Submat_lookup(nucleic, alphabet[c], alphabet[f]);
             codon_a = cs->codon_index[a][b][c];
             codon_b = cs->codon_index[d][e][f];
             cs->codon_submat[codon_a][codon_b] /= 2;
             cs->codon_submat[codon_a][codon_b] += score;
             }
            }
           }
          }
         }
        }
    return;
    }

exonerate-2.4.0/src/sequence/sequence.h0000644000175000017500000001175511163171272015022 00000000000000
/****************************************************************\
*                                                                *
*  Simple Sequence Object                                        *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.
* * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_SEQUENCE_H #define INCLUDED_SEQUENCE_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include #include #ifdef USE_PTHREADS #include #endif /* USE_PTHREADS */ #include "alphabet.h" #include "translate.h" #include "sparsecache.h" typedef enum { Sequence_Type_INTMEM, Sequence_Type_EXTMEM, Sequence_Type_SUBSEQ, Sequence_Type_REVCOMP, Sequence_Type_FILTER, Sequence_Type_TRANSLATE } Sequence_Type; typedef enum { Sequence_Strand_FORWARD, Sequence_Strand_REVCOMP, Sequence_Strand_UNKNOWN } Sequence_Strand; typedef struct { gchar *id; Sequence_Strand strand; gint cds_start; gint cds_length; } Sequence_Annotation; typedef struct { gchar *annotation_path; GTree *annotation_tree; } Sequence_ArgumentSet; Sequence_ArgumentSet *Sequence_ArgumentSet_create(Argument *arg); /**/ Sequence_Strand Sequence_Strand_revcomp(Sequence_Strand strand); #define Sequence_Strand_get_as_char(strand) \ (((strand) == Sequence_Strand_UNKNOWN)?'.' 
\ :((strand) == Sequence_Strand_FORWARD)?'+':'-') /* Shows strand of sequence as [-.+] */ #define Sequence_Strand_get_as_string(strand) \ (((strand) == Sequence_Strand_UNKNOWN)?"unknown" \ :((strand) == Sequence_Strand_FORWARD)?"forward":"revcomp") #define Sequence_get_strand_as_char(seq) \ Sequence_Strand_get_as_char((seq)->strand) typedef struct { guint ref_count; gchar *id; gchar *def; guint len; Sequence_Strand strand; Alphabet *alphabet; Sequence_Annotation *annotation; gpointer data; Sequence_Type type; gint (*get_symbol)(gpointer data, gint pos); #ifdef USE_PTHREADS pthread_mutex_t seq_lock; #endif /* USE_PTHREADS */ } Sequence; #define Sequence_get_symbol(sequence, pos) \ ((sequence)->get_symbol((sequence)->data, pos)) #define Sequence_get_symbol_TEMP(sequence, pos) \ ((sequence)->get_symbol((sequence)->data, pos)) Sequence *Sequence_create(gchar *id, gchar *def, gchar *seq, guint len, Sequence_Strand strand, Alphabet *alphabet); /* Any of the Sequence_create arguments, id, def and seq * may be NULL, and len may also be zero. * * If len is non-zero, the seq argument need not be NULL terminated. * (However it will be NULL terminated in the Sequence object). * * If alphabet is NULL, (UNKNOWN|UNMASKED) is assumed. 
*/ Sequence *Sequence_create_extmem(gchar *id, gchar *def, guint len, Sequence_Strand strand, Alphabet *alphabet, SparseCache *cache); void Sequence_preload_extmem(Sequence *s); void Sequence_destroy(Sequence *s); Sequence *Sequence_share(Sequence *s); Sequence *Sequence_subseq(Sequence *s, guint start, guint length); void Sequence_print_fasta(Sequence *s, FILE *fp, gboolean show_info); gint Sequence_print_fasta_block(Sequence *s, FILE *fp); void Sequence_revcomp_in_place(gchar *seq, guint length); Sequence *Sequence_revcomp(Sequence *s); void Sequence_reverse_in_place(gchar *seq, guint length); gint Sequence_checksum(Sequence *s); void Sequence_strncpy(Sequence *s, gint start, gint length, gchar *dst); void Sequence_strcpy(Sequence *s, gchar *dst); gchar *Sequence_get_substr(Sequence *s, gint start, gint length); gchar *Sequence_get_str(Sequence *s); gsize Sequence_memory_usage(Sequence *s); /**/ void Sequence_filter_in_place(gchar *seq, guint length, Alphabet *alphabet, Alphabet_Filter_Type filter_type); Sequence *Sequence_filter(Sequence *s, Alphabet_Filter_Type filter_type); Sequence *Sequence_translate(Sequence *s, Translate *translate, gint frame); Sequence *Sequence_mask(Sequence *s); void Sequence_lock(Sequence *s); void Sequence_unlock(Sequence *s); /**/ #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_SEQUENCE_H */ exonerate-2.4.0/src/sequence/submat.h0000644000175000017500000000370711162714341014502 00000000000000/****************************************************************\ * * * Substitution Matrix Object * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #ifndef INCLUDED_SUBMAT_H #define INCLUDED_SUBMAT_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include #include #ifndef ALPHABETSIZE #define ALPHABETSIZE (1<matrix[(submat)->index[(guchar)(symbol_a)]] \ [(submat)->index[(guchar)(symbol_b)]]) #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_SUBMAT_H */ exonerate-2.4.0/src/sequence/translate.h0000644000175000017500000000624611162714341015205 00000000000000/****************************************************************\ * * * Nucleotide Translation Code * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_TRANSLATE_H #define INCLUDED_TRANSLATE_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ /**/ #include #include "argument.h" typedef struct { gchar *genetic_code; } Translate_ArgumentSet; Translate_ArgumentSet *Translate_ArgumentSet_create(Argument *arg); #define Translate_AA_SET_SIZE 40 #define Translate_NT_SET_SIZE (1<<4) #define Translate_PIMA_SET_SIZE 18 #define Translate_ALPHABET_SIZE (1<<8) #define Translate_TRANSLATION_SIZE 4096 typedef struct { guint ref_count; guchar *nt; guchar *aa; guchar *code; guchar nt2d[Translate_ALPHABET_SIZE]; guchar aa2d[Translate_ALPHABET_SIZE]; gint aamask[Translate_AA_SET_SIZE]; guchar trans[Translate_TRANSLATION_SIZE]; GPtrArray *revtrans[Translate_ALPHABET_SIZE]; } Translate; Translate *Translate_create(gboolean use_pima); Translate *Translate_share(Translate *t); void Translate_destroy(Translate *t); gint Translate_sequence(Translate *t, gchar *dna, gint dna_length, gint frame, gchar *aaseq, 
                        guchar *filter);
/* Sequence_translate frame is 1, 2, 3 for forward,
 * or -1,-2,-3 for reverse.
 * Sufficient space for translated sequence must be provided in aaseq
 * Translated seq is NULL terminated.
 * Length of translated sequence is returned
 */

typedef void (*Translate_reverse_func)(gchar *dna, gint length,
                                       gpointer user_data);

void Translate_reverse(Translate *t, gchar *aaseq, gint length,
                       Translate_reverse_func trf, gpointer user_data);

#define Translate_codon(t, codon)                                  \
    ((gchar)((t)->aa[(t)->trans[((t)->nt2d[(guchar)(codon)[0]])    \
                               |((t)->nt2d[(guchar)(codon)[1]]<<4) \
                               |((t)->nt2d[(guchar)(codon)[2]]<<8)]]))

#define Translate_base(t, a, b, c)                       \
    ((t)->aa[(t)->trans[((t)->nt2d[(guchar)(a)])         \
                       |((t)->nt2d[(guchar)(b)]<<4)      \
                       |((t)->nt2d[(guchar)(c)]<<8)]])

/**/

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_TRANSLATE_H */

exonerate-2.4.0/src/sequence/splice.h0000644000175000017500000001173011162714340014460 00000000000000
/****************************************************************\
*                                                                *
*  Library for Splice site prediction.                           *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#ifndef INCLUDED_SPLICE_H
#define INCLUDED_SPLICE_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include
#include
#include
#include "sparsecache.h"
#include "argument.h"

#ifndef ALPHABETSIZE
#define ALPHABETSIZE (1<<(CHAR_BIT))
#endif /* ALPHABETSIZE */

typedef struct {
       gchar *splice3_data_path;
       gchar *splice5_data_path;
    gboolean  force_gtag;
} Splice_ArgumentSet;
/* When force_gtag is TRUE, only standard GT...AG splice sites
 * will be assigned a good prediction score.
 */

Splice_ArgumentSet *Splice_ArgumentSet_create(Argument *arg);

/**/

typedef enum {
    SpliceType_ss5_forward,
    SpliceType_ss3_forward,
    SpliceType_ss5_reverse,
    SpliceType_ss3_reverse
} SpliceType;

typedef struct {
    gchar expect_one;
    gchar expect_two;
} SplicePredictor_GTAGonly;

typedef struct {
                        gint   ref_count;
                  SpliceType   type;
                        gint   model_length;
                        gint   model_splice_after;
                      gfloat **model_data;
                      guchar   index[ALPHABETSIZE];
    SplicePredictor_GTAGonly  *gtag_only;
} SplicePredictor;

SplicePredictor *SplicePredictor_create(SpliceType type);
void SplicePredictor_destroy(SplicePredictor *sp);
SplicePredictor *SplicePredictor_share(SplicePredictor *sp);

gint SplicePredictor_predict(SplicePredictor *sp,
                             gchar *seq, gint seq_len,
                             guint start, guint length, gfloat *score);
/* Returns the position, and fills the score when provided */

void SplicePredictor_predict_array(SplicePredictor *sp,
                                   gchar *seq, guint seq_len,
                                   guint start, guint length,
                                   gfloat *pred);
/* Prediction scores will be written to pred,
 * which must contain space for length predictions.
 */

void SplicePredictor_predict_array_int(SplicePredictor *sp,
                                       gchar *seq, guint seq_len,
                                       guint start, guint length,
                                       gint *pred);
/* As SplicePredictor_predict_array(),
 * but results are cast as (int)(score*factor)
 */

gfloat SplicePredictor_get_max_score(SplicePredictor *sp);
/* Returns maximum prediction score for any sequence */

typedef struct {
    SplicePredictor *ss5_forward;
    SplicePredictor *ss5_reverse;
    SplicePredictor *ss3_forward;
    SplicePredictor *ss3_reverse;
} SplicePredictorSet;

SplicePredictorSet *SplicePredictorSet_create(void);
void SplicePredictorSet_destroy(SplicePredictorSet *sps);

/**/

typedef gchar *(*SplicePrediction_get_seq_func)
               (gint start, gint len, gpointer user_data);

typedef struct {
                  SplicePredictor *sp;
                             gint *single;
                             gint  length;
                      SparseCache *cache;
    SplicePrediction_get_seq_func  get_seq_func;
                         gpointer  user_data;
} SplicePrediction;

SplicePrediction *SplicePrediction_create(SplicePredictor *sp, gint len,
                      SplicePrediction_get_seq_func get_seq_func,
                      gboolean use_single, gpointer user_data);
void SplicePrediction_destroy(SplicePrediction *spn);
gint SplicePrediction_get(SplicePrediction *spn, gint pos);

typedef struct {
    SplicePrediction *ss5_forward;
    SplicePrediction *ss5_reverse;
    SplicePrediction *ss3_forward;
    SplicePrediction *ss3_reverse;
} SplicePrediction_Set;

SplicePrediction_Set *SplicePrediction_Set_add(SplicePrediction_Set *spns,
                          SplicePredictorSet *sps,
                          SpliceType type, gint len,
                          SplicePrediction_get_seq_func get_seq_func,
                          gboolean use_single, gpointer user_data);
void SplicePrediction_Set_destroy(SplicePrediction_Set *sps);

/**/

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_SPLICE_H */

exonerate-2.4.0/src/sequence/alphabet.h0000644000175000017500000000750611176125722014774 00000000000000
/****************************************************************\
*                                                                *
*  Simple Alphabet Object                                        *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.
 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#ifndef INCLUDED_ALPHABET_H
#define INCLUDED_ALPHABET_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include
#include
#include "argument.h"

#ifndef ALPHABET_SIZE
#define ALPHABET_SIZE (CHAR_BIT << 8)
#endif /* ALPHABET_SIZE */

typedef struct {
    gboolean use_aa_tla;
} Alphabet_ArgumentSet;

Alphabet_ArgumentSet *Alphabet_ArgumentSet_create(Argument *arg);

/**/

typedef enum {
    Alphabet_Type_DNA,
    Alphabet_Type_PROTEIN,
    Alphabet_Type_UNKNOWN
} Alphabet_Type;

Alphabet_Type Alphabet_Type_guess(gchar *str);

gchar *Alphabet_Argument_parse_alphabet_type(gchar *arg_string,
                                             gpointer data);

typedef struct {
             gint  ref_count;
         gboolean  is_soft_masked;
           guchar *member;
    Alphabet_Type  type;
           guchar *is_member;
           guchar *unmasked;
           guchar *masked;
           guchar *is_valid;
           guchar *complement;
           guchar *clean;
           guchar *non_ambig;
} Alphabet;

Alphabet *Alphabet_create(Alphabet_Type type, gboolean is_soft_masked);
void Alphabet_destroy(Alphabet *alphabet);
Alphabet *Alphabet_share(Alphabet *alphabet);

/**/

typedef enum {
    Alphabet_Filter_Type_UNMASKED,
    Alphabet_Filter_Type_MASKED,
    Alphabet_Filter_Type_IS_VALID,
    Alphabet_Filter_Type_COMPLEMENT,
    Alphabet_Filter_Type_CLEAN,
    Alphabet_Filter_Type_CLEAN_ACGTN,
    Alphabet_Filter_Type_NON_AMBIG,
    Alphabet_Filter_Type_NUMBER_OF_TYPES
} Alphabet_Filter_Type;

guchar *Alphabet_get_filter_by_type(Alphabet *alphabet,
                                    Alphabet_Filter_Type filter_type);
gchar *Alphabet_Filter_Type_get_name(Alphabet_Filter_Type filter_type);

#define Alphabet_is_masked(alphabet, symbol) \
    ((alphabet)->unmasked[(symbol)] != (alphabet)->masked[(symbol)])

#define Alphabet_get_unmasked(alphabet, symbol) \
    ((alphabet)->unmasked[(symbol)])

#define Alphabet_get_masked(alphabet,
symbol) \ ((alphabet)->masked[(symbol)]) #define Alphabet_symbol_is_valid(alphabet, symbol) \ ((alphabet)->is_valid[(symbol)] != '-') #define Alphabet_symbol_is_member(alphabet, symbol) \ ((alphabet)->non_ambig[(symbol)] != '-') #define Alphabet_symbol_complement(alphabet, symbol) \ ((alphabet)->complement[(symbol)]) #define Alphabet_symbol_clean(alphabet, symbol) \ ((alphabet)->clean[(symbol)]) #define Alphabet_is_DNA(alphabet) \ ((alphabet)->type == Alphabet_Type_DNA) /**/ gchar *Alphabet_Type_get_name(Alphabet_Type type); Alphabet_Type Alphabet_name_get_type(gchar *name); gchar *Alphabet_aa2tla(gchar aa); /* Give three-letter-abbreviation for amino acid. * eg 'M' -> "Met" etc */ gchar *Alphabet_nt2ambig(gchar nt); /* Returns the ambiguity bases for the given nucleotide * The returned string does not need to be freed. */ /**/ #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_ALPHABET_H */ exonerate-2.4.0/src/sequence/codonsubmat.h0000644000175000017500000000700411162714331015516 00000000000000/****************************************************************\ * * * Codon Substitution Matrix Object * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #ifndef INCLUDED_CODONSUBMAT_H #define INCLUDED_CODONSUBMAT_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include #include #include #include #ifndef ALPHABETSIZE #define ALPHABETSIZE (1 << CHAR_BIT) #endif /* ALPHABETSIZE */ #define CODON_ALPHABETSIZE_ACGT (4*4*4) #define CODON_ALPHABETSIZE_ACGTN (5*5*5) typedef struct { gint ref_count; guchar base_index[ALPHABETSIZE]; gint codon_index[5][5][5]; gint codon_submat[CODON_ALPHABETSIZE_ACGTN] [CODON_ALPHABETSIZE_ACGTN]; } CodonSubmat; CodonSubmat *CodonSubmat_create(void); void CodonSubmat_destroy(CodonSubmat *cs); CodonSubmat *CodonSubmat_share(CodonSubmat *cs); gint CodonSubmat_max_score(CodonSubmat *cs); void CodonSubmat_add_nucleic(CodonSubmat *cs, Submat *nucleic); #define CodonSubmat_get_base(codonsubmat, base) \ ((codonsubmat)->base_index[(base)]) #define CodonSubmat_get_codon(codonsubmat, codon) \ ((codonsubmat)->codon_index \ [(CodonSubmat_get_base(codonsubmat, codon[0]))] \ [(CodonSubmat_get_base(codonsubmat, codon[1]))] \ [(CodonSubmat_get_base(codonsubmat, codon[2]))] ) #define CodonSubmat_get_codon_base(codonsubmat, base_a, base_b, base_c) \ ((codonsubmat)->codon_index \ [(CodonSubmat_get_base(codonsubmat, base_a))] \ [(CodonSubmat_get_base(codonsubmat, base_b))] \ [(CodonSubmat_get_base(codonsubmat, base_c))] ) #define CodonSubmat_lookup_codon(codonsubmat, codon_a, codon_b) \ ((codonsubmat)->codon_submat[(codon_a)][(codon_b)]) #define CodonSubmat_lookup(codonsubmat, codon_a, codon_b) \ ((codonsubmat)->codon_submat \ [(CodonSubmat_get_codon((codonsubmat), (codon_a)))] \ [(CodonSubmat_get_codon((codonsubmat), (codon_b)))] ) #define CodonSubmat_lookup_base(codonsubmat, base_a1, base_a2, base_a3, \ base_b1, base_b2, base_b3) \ ((codonsubmat)->codon_submat \ [(CodonSubmat_get_codon_base((codonsubmat), \ (base_a1), (base_a2), (base_a3)))] \ [(CodonSubmat_get_codon_base((codonsubmat), \ (base_b1), (base_b2), (base_b3)))]) 
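/* Illustrative sketch, not part of the original header: the index
 * tables behind CodonSubmat_get_base() and CodonSubmat_get_codon()
 * can be rebuilt in isolation roughly as below. The Sketch* names are
 * hypothetical, plain C types stand in for the glib ones, and the fill
 * order is assumed to match CodonSubmat_create() in codonsubmat.c,
 * so a codon (a,b,c) over ACGTN lands at index a*25 + b*5 + c.
 */

```c
#include <ctype.h>
#include <limits.h>

#define SKETCH_ALPHABETSIZE (1 << CHAR_BIT)

/* Hypothetical stand-in for the two lookup tables in CodonSubmat */
typedef struct {
    unsigned char base_index[SKETCH_ALPHABETSIZE];
              int codon_index[5][5][5];
} SketchCodonIndex;

/* Map A,C,G,T (either case) to 0..3 and every other symbol to 4 (N),
 * then number the 125 ACGTN codons in nested-loop order.
 */
static void SketchCodonIndex_fill(SketchCodonIndex *sci){
    const unsigned char *alphabet = (const unsigned char*)"ACGT";
    int i, a, b, c, n = 0;
    for(i = 0; i < SKETCH_ALPHABETSIZE; i++)
        sci->base_index[i] = 4;
    for(i = 0; i < 4; i++){
        sci->base_index[alphabet[i]] = (unsigned char)i;
        sci->base_index[tolower(alphabet[i])] = (unsigned char)i;
        }
    for(a = 0; a < 5; a++)
        for(b = 0; b < 5; b++)
            for(c = 0; c < 5; c++)
                sci->codon_index[a][b][c] = n++;
    return;
    }

/* Counterpart of CodonSubmat_get_codon() for a 3-character codon;
 * the cast keeps high-bit characters from giving a negative index.
 */
static int SketchCodonIndex_get_codon(const SketchCodonIndex *sci,
                                      const char *codon){
    return sci->codon_index
               [sci->base_index[(unsigned char)codon[0]]]
               [sci->base_index[(unsigned char)codon[1]]]
               [sci->base_index[(unsigned char)codon[2]]];
    }
```

/* With these tables "ACG" and "acg" both map to 0*25 + 1*5 + 2 = 7,
 * and any codon made only of unrecognised symbols maps to 124 (NNN).
 */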

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_CODONSUBMAT_H */

exonerate-2.4.0/src/comparison/0000777000175000017500000000000011222703703013452 500000000000000
exonerate-2.4.0/src/comparison/Makefile.am0000644000175000017500000000614511222226024015424 00000000000000

TESTS = wordhood.test hspset.test pcr.test \
        comparison.test match.test seeder.test

noinst_PROGRAMS = $(TESTS)

INCLUDES = -I$(top_srcdir)/src/sequence \
           -I$(top_srcdir)/src/struct \
           -I$(top_srcdir)/src/general

noinst_HEADERS = wordhood.h hspset.h pcr.h \
                 comparison.h match.h seeder.h

SEQUENCE_OBJ = $(top_srcdir)/src/struct/sparsecache.o \
               $(top_srcdir)/src/struct/matrix.o \
               $(top_srcdir)/src/sequence/sequence.o \
               $(top_srcdir)/src/sequence/alphabet.o \
               $(top_srcdir)/src/sequence/splice.o \
               $(top_srcdir)/src/sequence/translate.o \
               $(top_srcdir)/src/general/argument.o \
               $(top_srcdir)/src/general/lineparse.o \
               -lm

wordhood_test_SOURCES = wordhood.test.c wordhood.c
wordhood_test_LDADD = $(top_srcdir)/src/sequence/submat.o \
                      $(top_srcdir)/src/sequence/codonsubmat.o \
                      $(SEQUENCE_OBJ)

hspset_test_SOURCES = hspset.test.c hspset.c match.c wordhood.c
hspset_test_LDADD = $(top_srcdir)/src/sequence/submat.o \
                    $(top_srcdir)/src/sequence/codonsubmat.o \
                    $(top_srcdir)/src/struct/pqueue.o \
                    $(top_srcdir)/src/struct/recyclebin.o \
                    $(top_srcdir)/src/general/threadref.o \
                    $(SEQUENCE_OBJ)

pcr_test_SOURCES = pcr.test.c pcr.c wordhood.c
pcr_test_LDADD = $(top_srcdir)/src/sequence/submat.o \
                 $(top_srcdir)/src/sequence/codonsubmat.o \
                 $(top_srcdir)/src/struct/fsm.o \
                 $(top_srcdir)/src/struct/slist.o \
                 $(top_srcdir)/src/struct/recyclebin.o \
                 $(SEQUENCE_OBJ)

comparison_test_SOURCES = comparison.test.c comparison.c hspset.c \
                          match.c wordhood.c
comparison_test_LDADD = $(top_srcdir)/src/sequence/submat.o \
                        $(top_srcdir)/src/sequence/codonsubmat.o \
                        $(top_srcdir)/src/struct/recyclebin.o \
                        $(top_srcdir)/src/struct/pqueue.o \
                        $(top_srcdir)/src/general/threadref.o \
                        $(SEQUENCE_OBJ)

match_test_SOURCES = match.test.c
match.c match_test_LDADD = $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(SEQUENCE_OBJ) seeder_test_SOURCES = seeder.test.c seeder.c wordhood.c hspset.c \ match.c comparison.c seeder_test_LDADD = $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/fsm.o \ $(top_srcdir)/src/struct/vfsm.o \ $(top_srcdir)/src/struct/pqueue.o \ $(top_srcdir)/src/general/threadref.o \ $(SEQUENCE_OBJ) # Files to clear away MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/src/comparison/Makefile.in0000644000175000017500000003606411222703647015453 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../.. 
ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ TESTS = wordhood.test hspset.test pcr.test comparison.test match.test seeder.test noinst_PROGRAMS = $(TESTS) INCLUDES = -I$(top_srcdir)/src/sequence -I$(top_srcdir)/src/struct -I$(top_srcdir)/src/general noinst_HEADERS = wordhood.h hspset.h pcr.h comparison.h match.h seeder.h SEQUENCE_OBJ = $(top_srcdir)/src/struct/sparsecache.o $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o $(top_srcdir)/src/sequence/alphabet.o $(top_srcdir)/src/sequence/splice.o $(top_srcdir)/src/sequence/translate.o $(top_srcdir)/src/general/argument.o $(top_srcdir)/src/general/lineparse.o -lm wordhood_test_SOURCES = wordhood.test.c wordhood.c wordhood_test_LDADD = $(top_srcdir)/src/sequence/submat.o $(top_srcdir)/src/sequence/codonsubmat.o $(SEQUENCE_OBJ) hspset_test_SOURCES = hspset.test.c hspset.c match.c wordhood.c hspset_test_LDADD = $(top_srcdir)/src/sequence/submat.o $(top_srcdir)/src/sequence/codonsubmat.o $(top_srcdir)/src/struct/pqueue.o $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/general/threadref.o 
$(SEQUENCE_OBJ) pcr_test_SOURCES = pcr.test.c pcr.c wordhood.c pcr_test_LDADD = $(top_srcdir)/src/sequence/submat.o $(top_srcdir)/src/sequence/codonsubmat.o $(top_srcdir)/src/struct/fsm.o $(top_srcdir)/src/struct/slist.o $(top_srcdir)/src/struct/recyclebin.o $(SEQUENCE_OBJ) comparison_test_SOURCES = comparison.test.c comparison.c hspset.c match.c wordhood.c comparison_test_LDADD = $(top_srcdir)/src/sequence/submat.o $(top_srcdir)/src/sequence/codonsubmat.o $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/pqueue.o $(top_srcdir)/src/general/threadref.o $(SEQUENCE_OBJ) match_test_SOURCES = match.test.c match.c match_test_LDADD = $(top_srcdir)/src/sequence/submat.o $(top_srcdir)/src/sequence/codonsubmat.o $(SEQUENCE_OBJ) seeder_test_SOURCES = seeder.test.c seeder.c wordhood.c hspset.c match.c comparison.c seeder_test_LDADD = $(top_srcdir)/src/sequence/submat.o $(top_srcdir)/src/sequence/codonsubmat.o $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/fsm.o $(top_srcdir)/src/struct/vfsm.o $(top_srcdir)/src/struct/pqueue.o $(top_srcdir)/src/general/threadref.o $(SEQUENCE_OBJ) # Files to clear away MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = PROGRAMS = $(noinst_PROGRAMS) DEFS = @DEFS@ -I. 
-I$(srcdir) CPPFLAGS = @CPPFLAGS@ LDFLAGS = @LDFLAGS@ LIBS = @LIBS@ wordhood_test_OBJECTS = wordhood.test.o wordhood.o wordhood_test_DEPENDENCIES = $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o wordhood_test_LDFLAGS = hspset_test_OBJECTS = hspset.test.o hspset.o match.o wordhood.o hspset_test_DEPENDENCIES = $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/struct/pqueue.o $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/general/threadref.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o hspset_test_LDFLAGS = pcr_test_OBJECTS = pcr.test.o pcr.o wordhood.o pcr_test_DEPENDENCIES = $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o $(top_srcdir)/src/struct/fsm.o \ $(top_srcdir)/src/struct/slist.o $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o pcr_test_LDFLAGS = comparison_test_OBJECTS = comparison.test.o comparison.o hspset.o \ match.o wordhood.o comparison_test_DEPENDENCIES = $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ 
$(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/pqueue.o \ $(top_srcdir)/src/general/threadref.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o comparison_test_LDFLAGS = match_test_OBJECTS = match.test.o match.o match_test_DEPENDENCIES = $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o match_test_LDFLAGS = seeder_test_OBJECTS = seeder.test.o seeder.o wordhood.o hspset.o \ match.o comparison.o seeder_test_DEPENDENCIES = $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/fsm.o \ $(top_srcdir)/src/struct/vfsm.o $(top_srcdir)/src/struct/pqueue.o \ $(top_srcdir)/src/general/threadref.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o seeder_test_LDFLAGS = CFLAGS = @CFLAGS@ COMPILE = $(CC) $(DEFS) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) CCLD = $(CC) LINK = $(CCLD) $(AM_CFLAGS) $(CFLAGS) $(LDFLAGS) -o $@ HEADERS = $(noinst_HEADERS) DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best SOURCES = 
$(wordhood_test_SOURCES) $(hspset_test_SOURCES) $(pcr_test_SOURCES) $(comparison_test_SOURCES) $(match_test_SOURCES) $(seeder_test_SOURCES) OBJECTS = $(wordhood_test_OBJECTS) $(hspset_test_OBJECTS) $(pcr_test_OBJECTS) $(comparison_test_OBJECTS) $(match_test_OBJECTS) $(seeder_test_OBJECTS) all: all-redirect .SUFFIXES: .SUFFIXES: .S .c .o .s $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps src/comparison/Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status mostlyclean-noinstPROGRAMS: clean-noinstPROGRAMS: -test -z "$(noinst_PROGRAMS)" || rm -f $(noinst_PROGRAMS) distclean-noinstPROGRAMS: maintainer-clean-noinstPROGRAMS: .c.o: $(COMPILE) -c $< .s.o: $(COMPILE) -c $< .S.o: $(COMPILE) -c $< mostlyclean-compile: -rm -f *.o core *.core clean-compile: distclean-compile: -rm -f *.tab.c maintainer-clean-compile: wordhood.test: $(wordhood_test_OBJECTS) $(wordhood_test_DEPENDENCIES) @rm -f wordhood.test $(LINK) $(wordhood_test_LDFLAGS) $(wordhood_test_OBJECTS) $(wordhood_test_LDADD) $(LIBS) hspset.test: $(hspset_test_OBJECTS) $(hspset_test_DEPENDENCIES) @rm -f hspset.test $(LINK) $(hspset_test_LDFLAGS) $(hspset_test_OBJECTS) $(hspset_test_LDADD) $(LIBS) pcr.test: $(pcr_test_OBJECTS) $(pcr_test_DEPENDENCIES) @rm -f pcr.test $(LINK) $(pcr_test_LDFLAGS) $(pcr_test_OBJECTS) $(pcr_test_LDADD) $(LIBS) comparison.test: $(comparison_test_OBJECTS) $(comparison_test_DEPENDENCIES) @rm -f comparison.test $(LINK) $(comparison_test_LDFLAGS) $(comparison_test_OBJECTS) $(comparison_test_LDADD) $(LIBS) match.test: $(match_test_OBJECTS) $(match_test_DEPENDENCIES) @rm -f match.test $(LINK) $(match_test_LDFLAGS) $(match_test_OBJECTS) $(match_test_LDADD) $(LIBS) seeder.test: $(seeder_test_OBJECTS) $(seeder_test_DEPENDENCIES) @rm -f seeder.test $(LINK) $(seeder_test_LDFLAGS) $(seeder_test_OBJECTS) 
$(seeder_test_LDADD) $(LIBS) tags: TAGS ID: $(HEADERS) $(SOURCES) $(LISP) list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ here=`pwd` && cd $(srcdir) \ && mkid -f$$here/ID $$unique $(LISP) TAGS: $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) $(LISP) tags=; \ here=`pwd`; \ list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(ETAGS_ARGS)$$unique$(LISP)$$tags" \ || (cd $(srcdir) && etags -o $$here/TAGS $(ETAGS_ARGS) $$tags $$unique $(LISP)) mostlyclean-tags: clean-tags: distclean-tags: -rm -f TAGS ID maintainer-clean-tags: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = src/comparison distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done check-TESTS: $(TESTS) @failed=0; all=0; \ srcdir=$(srcdir); export srcdir; \ for tst in $(TESTS); do \ if test -f $$tst; then dir=.; \ else dir="$(srcdir)"; fi; \ if $(TESTS_ENVIRONMENT) $$dir/$$tst; then \ all=`expr $$all + 1`; \ echo "PASS: $$tst"; \ elif test $$? 
-ne 77; then \ all=`expr $$all + 1`; \ failed=`expr $$failed + 1`; \ echo "FAIL: $$tst"; \ fi; \ done; \ if test "$$failed" -eq 0; then \ banner="All $$all tests passed"; \ else \ banner="$$failed of $$all tests failed"; \ fi; \ dashes=`echo "$$banner" | sed s/./=/g`; \ echo "$$dashes"; \ echo "$$banner"; \ echo "$$dashes"; \ test "$$failed" -eq 0 info-am: info: info-am dvi-am: dvi: dvi-am check-am: all-am $(MAKE) $(AM_MAKEFLAGS) check-TESTS check: check-am installcheck-am: installcheck: installcheck-am install-exec-am: install-exec: install-exec-am install-data-am: install-data: install-data-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-am uninstall-am: uninstall: uninstall-am all-am: Makefile $(PROGRAMS) $(HEADERS) all-redirect: all-am install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-noinstPROGRAMS mostlyclean-compile \ mostlyclean-tags mostlyclean-generic mostlyclean: mostlyclean-am clean-am: clean-noinstPROGRAMS clean-compile clean-tags clean-generic \ mostlyclean-am clean: clean-am distclean-am: distclean-noinstPROGRAMS distclean-compile distclean-tags \ distclean-generic clean-am distclean: distclean-am maintainer-clean-am: maintainer-clean-noinstPROGRAMS \ maintainer-clean-compile maintainer-clean-tags \ maintainer-clean-generic distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." 
maintainer-clean: maintainer-clean-am .PHONY: mostlyclean-noinstPROGRAMS distclean-noinstPROGRAMS \ clean-noinstPROGRAMS maintainer-clean-noinstPROGRAMS \ mostlyclean-compile distclean-compile clean-compile \ maintainer-clean-compile tags mostlyclean-tags distclean-tags \ clean-tags maintainer-clean-tags distdir check-TESTS info-am info \ dvi-am dvi check check-am installcheck-am installcheck install-exec-am \ install-exec install-data-am install-data install-am install \ uninstall-am uninstall all-redirect all-am all installdirs \ mostlyclean-generic distclean-generic clean-generic \ maintainer-clean-generic clean mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: exonerate-2.4.0/src/comparison/wordhood.test.c0000644000175000017500000000446711162714277016363 00000000000000/****************************************************************\ * * * Library for word-neighbourhood generation * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include <string.h> /* For strlen() */ #include "wordhood.h" #include "argument.h" static gboolean wordhood_test_func(gchar *word, gint score, gpointer user_data){ register gint *count = user_data; g_message("id [%d] word [%s] score [%d]", ++(*count), word, score); return FALSE; } int Argument_main(Argument *arg){ register gchar *seq = "AAACCCGGGTTT"; register Submat *s = Submat_create("nucleic"); register CodonSubmat *cs = CodonSubmat_create(); register WordHood_Alphabet *wha; register WordHood *wh; gint count; /**/ g_message("using nucleic submat"); count = 0; wha = WordHood_Alphabet_create_from_Submat("ACGT", "ACGT", s, FALSE); wh = WordHood_create(wha, 9, TRUE); WordHood_info(wh); WordHood_traverse(wh, wordhood_test_func, seq, strlen(seq), &count); WordHood_destroy(wh); WordHood_Alphabet_destroy(wha); /**/ g_message("using codon submat"); count = 0; wha = WordHood_Alphabet_create_from_CodonSubmat(cs, FALSE); wh = WordHood_create(wha, 9, TRUE); WordHood_info(wh); g_message("begin trav"); WordHood_traverse(wh, wordhood_test_func, seq, strlen(seq), &count); g_message("done trav"); WordHood_destroy(wh); WordHood_Alphabet_destroy(wha); /**/ Submat_destroy(s); CodonSubmat_destroy(cs); return 0; } exonerate-2.4.0/src/comparison/wordhood.c0000644000175000017500000003070211162714277015374 00000000000000/****************************************************************\ * * * Library for word-neighbourhood generation * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include <string.h> /* For strlen() */ #include <ctype.h> /* For toupper() */ #include "wordhood.h" /* Uses pruned DFS on the implicit wordhood trie. */ static gint WordHood_Submat_score_func(WordHood_Alphabet *wha, gchar *seq_a, gchar *seq_b){ return Submat_lookup(wha->submat, *seq_a, *seq_b); } static gboolean WordHood_Submat_is_valid_func(WordHood_Alphabet *wha, gchar *seq){ return (wha->input_index[(guchar)*seq] != -1); } static gint WordHood_Submat_index_func(WordHood_Alphabet *wha, gchar *seq){ return wha->output_index[(guchar)*seq]; } static void WordHood_Alphabet_fill_index(gint *index, gchar *alphabet, gboolean case_sensitive_input){ register gint i; for(i = 0; i < ALPHABETSIZE; i++) index[i] = -1; for(i = 0; alphabet[i]; i++){ if(case_sensitive_input){ index[(guchar)alphabet[i]] = i; } else { index[tolower(alphabet[i])] = i; index[toupper(alphabet[i])] = i; } } return; } static WordHood_Alphabet *WordHood_Alphabet_create(gint advance, gchar *input_alphabet, gchar *output_alphabet, gboolean case_sensitive_input){ register WordHood_Alphabet *wha = g_new(WordHood_Alphabet, 1); g_assert(input_alphabet); wha->ref_count = 1; wha->advance = advance; wha->member_list = g_ptr_array_new(); WordHood_Alphabet_fill_index(wha->input_index, input_alphabet, case_sensitive_input); WordHood_Alphabet_fill_index(wha->output_index, output_alphabet, FALSE); return wha; } WordHood_Alphabet *WordHood_Alphabet_create_from_Submat( gchar *input_alphabet, gchar *output_alphabet, Submat *submat, gboolean case_sensitive_input){ register WordHood_Alphabet *wha = WordHood_Alphabet_create(1, input_alphabet, output_alphabet, case_sensitive_input); register gint i; g_assert(submat); g_assert(output_alphabet); wha->submat = Submat_share(submat); wha->codon_submat = NULL; for(i = 0; output_alphabet[i]; i++) g_ptr_array_add(wha->member_list, g_strndup(output_alphabet+i, 1)); wha->score_func = WordHood_Submat_score_func; wha->is_valid_func = 
WordHood_Submat_is_valid_func; wha->index_func = WordHood_Submat_index_func; return wha; } static gint WordHood_CodonSubmat_score_func(WordHood_Alphabet *wha, gchar *seq_a, gchar *seq_b){ /* g_message("Codon score [%c%c%c][%c%c%c] = [%d]", seq_a[0], seq_a[1], seq_a[2], seq_b[0], seq_b[1], seq_b[2], CodonSubmat_lookup(wha->codon_submat, (guchar*)seq_a, (guchar*)seq_b)); */ return CodonSubmat_lookup(wha->codon_submat, (guchar*)seq_a, (guchar*)seq_b); } static gboolean WordHood_CodonSubmat_is_valid_func( WordHood_Alphabet *wha, gchar *seq){ if((wha->input_index[(guchar)seq[0]] == -1) || (wha->input_index[(guchar)seq[1]] == -1) || (wha->input_index[(guchar)seq[2]] == -1)) return FALSE; return TRUE; } static gint WordHood_CodonSubmat_index_func(WordHood_Alphabet *wha, gchar *seq){ g_assert(wha->output_index[(guchar)seq[0]] != -1); g_assert(wha->output_index[(guchar)seq[1]] != -1); g_assert(wha->output_index[(guchar)seq[2]] != -1); return (wha->output_index[(guchar)seq[0]] << 4) | (wha->output_index[(guchar)seq[1]] << 2) | wha->output_index[(guchar)seq[2]]; } WordHood_Alphabet *WordHood_Alphabet_create_from_CodonSubmat( CodonSubmat *cs, gboolean case_sensitive_input){ register gint a, b, c; register gchar *input_alphabet = "ABCDGHKMNRSTVWY", *output_alphabet = "ACGT"; register WordHood_Alphabet *wha = WordHood_Alphabet_create(3, input_alphabet, output_alphabet, case_sensitive_input); wha->submat = NULL; wha->codon_submat = CodonSubmat_share(cs); for(a = 0; a < 4; a++) for(b = 0; b < 4; b++) for(c = 0; c < 4; c++) g_ptr_array_add(wha->member_list, g_strdup_printf("%c%c%c", output_alphabet[a], output_alphabet[b], output_alphabet[c])); wha->score_func = WordHood_CodonSubmat_score_func; wha->is_valid_func = WordHood_CodonSubmat_is_valid_func; wha->index_func = WordHood_CodonSubmat_index_func; return wha; } WordHood_Alphabet *WordHood_Alphabet_share(WordHood_Alphabet *wha){ wha->ref_count++; return wha; } void WordHood_Alphabet_destroy(WordHood_Alphabet *wha){ register gint 
i; if(--wha->ref_count) return; if(wha->submat) Submat_destroy(wha->submat); if(wha->codon_submat) CodonSubmat_destroy(wha->codon_submat); for(i = 0; i < wha->member_list->len; i++){ g_assert(wha->member_list->pdata[i]); g_free(wha->member_list->pdata[i]); } g_ptr_array_free(wha->member_list, TRUE); g_free(wha); return; } /**/ WordHood *WordHood_create(WordHood_Alphabet *wha, gint threshold, gboolean use_dropoff){ register WordHood *wh = g_new(WordHood, 1); wh->wha = WordHood_Alphabet_share(wha); wh->threshold = threshold; wh->use_dropoff = use_dropoff; wh->orig_word = NULL; wh->alloc_len = 64; wh->curr_word = g_new0(gchar, wh->alloc_len); wh->depth_threshold = g_new0(gint, wh->alloc_len); wh->curr_len = 0; wh->word_pos = 0; wh->curr_score = 0; return wh; } void WordHood_destroy(WordHood *wh){ WordHood_Alphabet_destroy(wh->wha); g_free(wh->depth_threshold); g_free(wh->curr_word); g_free(wh); return; } static void WordHood_set_word(WordHood *wh, gchar *word, gint len, gint threshold){ register gint i; g_assert(len > 0); g_assert(!(len % wh->wha->advance)); wh->orig_word = word; wh->curr_len = len; if(wh->alloc_len < len){ wh->alloc_len = len; wh->curr_word = g_renew(gchar, wh->curr_word, wh->alloc_len); wh->depth_threshold = g_renew(gint, wh->depth_threshold, wh->alloc_len); } wh->depth_threshold[wh->curr_len - wh->wha->advance] = threshold; for(i = wh->curr_len-(wh->wha->advance << 1); i >= 0; i -= wh->wha->advance){ wh->depth_threshold[i] = wh->depth_threshold[i+wh->wha->advance] - wh->wha->score_func(wh->wha, word+i+wh->wha->advance, word+i+wh->wha->advance); } return; } static void WordHood_Alphabet_info(WordHood_Alphabet *wha){ g_message("WordHood_Alphabet:\n" " ref_count [%d]\n" " advance [%d]\n" " member_list->len [%d]\n" "--\n", wha->ref_count, wha->advance, wha->member_list->len); return; } void WordHood_info(WordHood *wh){ WordHood_Alphabet_info(wh->wha); g_message("WordHood:\n" " 
threshold [%d]\n" " use_dropoff [%s]\n" "--\n", wh->threshold, wh->use_dropoff?"yes":"no"); return; } /**/ static gint WordHood_score_pos(WordHood *wh){ return wh->wha->score_func(wh->wha, wh->orig_word+wh->word_pos, wh->curr_word+wh->word_pos); } static gchar *WordHood_copy_member(WordHood *wh, gint member){ g_assert(member < wh->wha->member_list->len); return strncpy(wh->curr_word+wh->word_pos, wh->wha->member_list->pdata[member], wh->wha->advance); } static gboolean WordHood_Traverser_next(WordHood *wh){ /* Ascend while at end of alphabet */ while(!strncmp(wh->curr_word+wh->word_pos, wh->wha->member_list->pdata [wh->wha->member_list->len-1], wh->wha->advance)){ wh->curr_score -= WordHood_score_pos(wh); if(!wh->word_pos) return TRUE; wh->word_pos -= wh->wha->advance; } /* traverse */ wh->curr_score -= WordHood_score_pos(wh); WordHood_copy_member(wh, wh->wha->index_func(wh->wha, wh->curr_word+wh->word_pos)+1); wh->curr_score += WordHood_score_pos(wh); return FALSE; } static void WordHood_traverse_word(WordHood *wh, WordHood_Traverse_Func whtf, gpointer user_data){ WordHood_copy_member(wh, 0); /* Copy first member */ wh->word_pos = 0; wh->curr_score = WordHood_score_pos(wh); do { if(wh->curr_score < wh->depth_threshold[wh->word_pos]){ if(WordHood_Traverser_next(wh)) break; } else if(wh->word_pos == (wh->curr_len - wh->wha->advance)){ /* is at leaf */ if(whtf(wh->curr_word, wh->curr_score, user_data)) break; if(WordHood_Traverser_next(wh)) break; } else { /* descend */ wh->word_pos += wh->wha->advance; WordHood_copy_member(wh, 0); wh->curr_score += WordHood_score_pos(wh); } } while(TRUE); return; } /**/ static gboolean WordHood_word_is_valid(WordHood *wh){ register gint i; g_assert(wh->orig_word); for(i = 0; i < wh->curr_len; i += wh->wha->advance) if(!wh->wha->is_valid_func(wh->wha, wh->orig_word+i)) return FALSE; return TRUE; } /* This can reject softmasked words when the index is case-sensitive */ static gint WordHood_score_word(WordHood *wh){ register gint i, 
score = 0; for(i = 0; i < wh->curr_len; i += wh->wha->advance) score += wh->wha->score_func(wh->wha, wh->orig_word+i, wh->orig_word+i); return score; } void WordHood_traverse(WordHood *wh, WordHood_Traverse_Func whtf, gchar *word, gint len, gpointer user_data){ register gint actual_threshold; g_assert(wh); g_assert(whtf); g_assert(word); wh->orig_word = word; wh->curr_len = len; if(!WordHood_word_is_valid(wh)) return; if(wh->use_dropoff){ /* FIXME: supply word_score to obviate this */ actual_threshold = WordHood_score_word(wh) - wh->threshold; } else { actual_threshold = wh->threshold; } WordHood_set_word(wh, word, len, actual_threshold); WordHood_traverse_word(wh, whtf, user_data); wh->orig_word = NULL; return; } /* FIXME: dropoff wordhood: * change to only exclude negatively scoring words * from the wordhood, rather than any ones containing * non-alphabet words. */ /* FIXME: optimisations * o use submat ordering * (this will make it run much faster for proteins) * o detect redundant words * o detect mappable words in context of the submat * (will probably only work well with nucleotides) */ exonerate-2.4.0/src/comparison/hspset.test.c0000644000175000017500000000613611162714274016034 00000000000000/****************************************************************\ * * * Library for HSP sets (high-scoring segment pairs) * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "hspset.h" #include "submat.h" #include "translate.h" typedef struct { guint query_start; guint target_start; } TestHSPseed; static void test_hsp_set(Match_Type match_type, gchar *qy_seq, gchar *tg_seq, TestHSPseed *seed, gint seed_total){ register Sequence *query = Sequence_create("qy", NULL, qy_seq, 0, Sequence_Strand_UNKNOWN, NULL), *target = Sequence_create("tg", NULL, tg_seq, 0, Sequence_Strand_UNKNOWN, NULL); register Match *match = Match_find(match_type); register HSP_Param *hsp_param = HSP_Param_create(match, TRUE); register HSPset *hsp_set = HSPset_create(query, target, hsp_param); register gint i; for(i = 0; i < seed_total; i++) HSPset_seed_hsp(hsp_set, seed[i].query_start, seed[i].target_start); HSPset_finalise(hsp_set); HSPset_print(hsp_set); HSPset_destroy(hsp_set); HSP_Param_destroy(hsp_param); Sequence_destroy(query); Sequence_destroy(target); return; } gint Argument_main(Argument *arg){ register gchar *ntnt_qy = "AAAAGTGAGAGAGAGAGAGAGGCGAAAAAAAAAACCCCCCCCCCACCCCGCGA", /* ||||||| | |||||||||| |||||||||| ||||||| */ *ntnt_tg = "TTTTGTGAGAGTGTGAGAGAGGCGTTTTTTTTTTCCCCCCCCCCTCCCCGCCT"; /* 0 1 2 3 4 5 */ /* 01234567890123456789012345678901234567890123456789012*/ register gchar *aant_qy = "PNKDEGSCPIECDFLCRHQYISDP", *aant_tg = "ACGTACGTACGTACGAGTGCGTGCCCCCTTNNNTGTGACTACATCTGCAAAACGTACGTACGT"; HSPset_ArgumentSet_create(arg); Match_ArgumentSet_create(arg); Argument_process(arg, "hspset.test", NULL, NULL); TestHSPseed ntnt_seed[2] = { { 8, 8}, { 36, 36} }, ntaa_seed[1] = { { 8, 24} }; g_message("d2d:"); test_hsp_set(Match_Type_DNA2DNA, ntnt_qy, ntnt_tg, (TestHSPseed*)ntnt_seed, 2); g_message("p2d:"); test_hsp_set(Match_Type_PROTEIN2DNA, aant_qy, aant_tg, (TestHSPseed*)ntaa_seed, 1); return 0; } exonerate-2.4.0/src/comparison/hspset.c0000644000175000017500000015131611222476322015053 00000000000000/****************************************************************\ * * * Library for 
HSP sets (high-scoring segment pairs) * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include <string.h> /* For strlen() */ #include <ctype.h> /* For tolower() */ #include <stdlib.h> /* For qsort() */ #include "hspset.h" HSPset_ArgumentSet *HSPset_ArgumentSet_create(Argument *arg){ register ArgumentSet *as; static HSPset_ArgumentSet has = {0}; if(arg){ as = ArgumentSet_create("HSP creation options"); ArgumentSet_add_option(as, '\0', "hspfilter", NULL, "Aggressive HSP filtering level", "0", Argument_parse_int, &has.filter_threshold); ArgumentSet_add_option(as, '\0', "useworddropoff", NULL, "Use word neighbourhood dropoff", "TRUE", Argument_parse_boolean, &has.use_wordhood_dropoff); ArgumentSet_add_option(as, '\0', "seedrepeat", NULL, "Seeds per diagonal required for HSP seeding", "1", Argument_parse_int, &has.seed_repeat); /**/ ArgumentSet_add_option(as, 0, "dnawordlen", "bp", "Wordlength for DNA words", "12", Argument_parse_int, &has.dna_wordlen); ArgumentSet_add_option(as, 0, "proteinwordlen", "aa", "Wordlength for protein words", "6", Argument_parse_int, &has.protein_wordlen); ArgumentSet_add_option(as, 0, "codonwordlen", "bp", "Wordlength for codon words", "12", Argument_parse_int, &has.codon_wordlen); /**/ ArgumentSet_add_option(as, 0, "dnahspdropoff", "score", "DNA HSP dropoff score", "30", Argument_parse_int, &has.dna_hsp_dropoff); ArgumentSet_add_option(as, 0, "proteinhspdropoff", "score", "Protein HSP dropoff score", "20", Argument_parse_int, &has.protein_hsp_dropoff); ArgumentSet_add_option(as, 0, "codonhspdropoff", "score", "Codon HSP dropoff score", "40", Argument_parse_int, &has.codon_hsp_dropoff); /**/ 
ArgumentSet_add_option(as, 0, "dnahspthreshold", "score", "DNA HSP threshold score", "75", Argument_parse_int, &has.dna_hsp_threshold); ArgumentSet_add_option(as, 0, "proteinhspthreshold", "score", "Protein HSP threshold score", "30", Argument_parse_int, &has.protein_hsp_threshold); ArgumentSet_add_option(as, 0, "codonhspthreshold", "score", "Codon HSP threshold score", "50", Argument_parse_int, &has.codon_hsp_threshold); /**/ ArgumentSet_add_option(as, 0, "dnawordlimit", "score", "Score limit for dna word neighbourhood", "0", Argument_parse_int, &has.dna_word_limit); ArgumentSet_add_option(as, 0, "proteinwordlimit", "score", "Score limit for protein word neighbourhood", "4", Argument_parse_int, &has.protein_word_limit); ArgumentSet_add_option(as, 0, "codonwordlimit", "score", "Score limit for codon word neighbourhood", "4", Argument_parse_int, &has.codon_word_limit); /**/ ArgumentSet_add_option(as, '\0', "geneseed", "threshold", "Geneseed Threshold", "0", Argument_parse_int, &has.geneseed_threshold); ArgumentSet_add_option(as, '\0', "geneseedrepeat", "number", "Seeds per diagonal required for geneseed HSP seeding", "3", Argument_parse_int, &has.geneseed_repeat); /**/ Argument_absorb_ArgumentSet(arg, as); } return &has; } /**/ static gboolean HSP_check_positions(HSPset *hsp_set, gint query_pos, gint target_pos){ g_assert(query_pos >= 0); g_assert(target_pos >= 0); g_assert(query_pos < hsp_set->query->len); g_assert(target_pos < hsp_set->target->len); return TRUE; } static gboolean HSP_check(HSP *hsp){ g_assert(hsp); g_assert(hsp->hsp_set); g_assert(hsp->length); g_assert(HSP_query_end(hsp) <= hsp->hsp_set->query->len); g_assert(HSP_target_end(hsp) <= hsp->hsp_set->target->len); return TRUE; } void HSP_Param_set_wordlen(HSP_Param *hsp_param, gint wordlen){ if(wordlen <= 0) g_error("Wordlength must be greater than zero"); hsp_param->wordlen = wordlen; hsp_param->seedlen = hsp_param->wordlen / hsp_param->match->query->advance; return; } /**/ void 
HSP_Param_set_dna_hsp_threshold(HSP_Param *hsp_param, gint dna_hsp_threshold){ if(hsp_param->match->type == Match_Type_DNA2DNA) hsp_param->threshold = dna_hsp_threshold; return; } void HSP_Param_set_protein_hsp_threshold(HSP_Param *hsp_param, gint protein_hsp_threshold){ if((hsp_param->match->type == Match_Type_PROTEIN2PROTEIN) || (hsp_param->match->type == Match_Type_PROTEIN2DNA) || (hsp_param->match->type == Match_Type_DNA2PROTEIN)) hsp_param->threshold = protein_hsp_threshold; return; } void HSP_Param_set_codon_hsp_threshold(HSP_Param *hsp_param, gint codon_hsp_threshold){ if(hsp_param->match->type == Match_Type_CODON2CODON) hsp_param->threshold = codon_hsp_threshold; return; } /**/ static void HSP_Param_refresh_wordhood(HSP_Param *hsp_param){ register WordHood_Alphabet *wha = NULL; register Submat *submat; if(hsp_param->wordhood) WordHood_destroy(hsp_param->wordhood); if(hsp_param->has->use_wordhood_dropoff && (!hsp_param->wordlimit)){ hsp_param->wordhood = NULL; } else { submat = (hsp_param->match->type == Match_Type_DNA2DNA) ? 
               hsp_param->match->mas->dna_submat
             : hsp_param->match->mas->protein_submat;
        wha = WordHood_Alphabet_create_from_Submat(
                  (gchar*)hsp_param->match->comparison_alphabet->member,
                  (gchar*)hsp_param->match->comparison_alphabet->member,
                  submat, FALSE);
        g_assert(wha);
        hsp_param->wordhood = WordHood_create(wha, hsp_param->wordlimit,
                                  hsp_param->has->use_wordhood_dropoff);
        WordHood_Alphabet_destroy(wha);
        }
    return;
    }
/* FIXME: optimisation:
 *        do not free/recreate wordhood when unnecessary
 */

void HSP_Param_set_dna_word_limit(HSP_Param *hsp_param,
                                  gint dna_word_limit){
    if(hsp_param->match->type == Match_Type_DNA2DNA){
        hsp_param->wordlimit = dna_word_limit;
        HSP_Param_refresh_wordhood(hsp_param);
        }
    return;
    }

void HSP_Param_set_protein_word_limit(HSP_Param *hsp_param,
                                      gint protein_word_limit){
    if((hsp_param->match->type == Match_Type_PROTEIN2PROTEIN)
    || (hsp_param->match->type == Match_Type_PROTEIN2DNA)
    || (hsp_param->match->type == Match_Type_DNA2PROTEIN)){
        hsp_param->wordlimit = protein_word_limit;
        HSP_Param_refresh_wordhood(hsp_param);
        }
    return;
    }

void HSP_Param_set_codon_word_limit(HSP_Param *hsp_param,
                                    gint codon_word_limit){
    if(hsp_param->match->type == Match_Type_CODON2CODON){
        /* The wordhood is rebuilt from wordlimit, not threshold */
        hsp_param->wordlimit = codon_word_limit;
        HSP_Param_refresh_wordhood(hsp_param);
        }
    return;
    }

/**/

void HSP_Param_set_dna_hsp_dropoff(HSP_Param *hsp_param,
                                   gint dna_hsp_dropoff){
    if(hsp_param->match->type == Match_Type_DNA2DNA)
        hsp_param->dropoff = dna_hsp_dropoff;
    return;
    }

void HSP_Param_set_protein_hsp_dropoff(HSP_Param *hsp_param,
                                       gint protein_hsp_dropoff){
    if((hsp_param->match->type == Match_Type_PROTEIN2PROTEIN)
    || (hsp_param->match->type == Match_Type_PROTEIN2DNA)
    || (hsp_param->match->type == Match_Type_DNA2PROTEIN))
        hsp_param->dropoff = protein_hsp_dropoff;
    return;
    }

void HSP_Param_set_codon_hsp_dropoff(HSP_Param *hsp_param,
                                     gint codon_hsp_dropoff){
    if(hsp_param->match->type == Match_Type_CODON2CODON)
        hsp_param->dropoff = codon_hsp_dropoff;
    return;
    }

/**/

void HSP_Param_set_hsp_threshold(HSP_Param
                                 *hsp_param, gint hsp_threshold){
    hsp_param->threshold = hsp_threshold;
    return;
    }

void HSP_Param_set_seed_repeat(HSP_Param *hsp_param,
                               gint seed_repeat){
    hsp_param->seed_repeat = seed_repeat;
    return;
    }

HSP_Param *HSP_Param_create(Match *match, gboolean use_horizon){
    register HSP_Param *hsp_param = g_new(HSP_Param, 1);
    hsp_param->thread_ref = ThreadRef_create();
    hsp_param->has = HSPset_ArgumentSet_create(NULL);
    hsp_param->match = match;
    hsp_param->seed_repeat = hsp_param->has->seed_repeat;
    switch(match->type){
        case Match_Type_DNA2DNA:
            hsp_param->dropoff = hsp_param->has->dna_hsp_dropoff;
            hsp_param->threshold = hsp_param->has->dna_hsp_threshold;
            HSP_Param_set_wordlen(hsp_param, hsp_param->has->dna_wordlen);
            hsp_param->wordlimit = hsp_param->has->dna_word_limit;
            break;
        case Match_Type_PROTEIN2PROTEIN: /*fallthrough*/
        case Match_Type_PROTEIN2DNA:     /*fallthrough*/
        case Match_Type_DNA2PROTEIN:
            hsp_param->dropoff = hsp_param->has->protein_hsp_dropoff;
            hsp_param->threshold = hsp_param->has->protein_hsp_threshold;
            HSP_Param_set_wordlen(hsp_param, hsp_param->has->protein_wordlen);
            hsp_param->wordlimit = hsp_param->has->protein_word_limit;
            break;
        case Match_Type_CODON2CODON:
            hsp_param->dropoff = hsp_param->has->codon_hsp_dropoff;
            hsp_param->threshold = hsp_param->has->codon_hsp_threshold;
            HSP_Param_set_wordlen(hsp_param, hsp_param->has->codon_wordlen);
            hsp_param->wordlimit = hsp_param->has->codon_word_limit;
            g_assert(!(hsp_param->wordlen % 3));
            break;
        default:
            g_error("Bad Match_Type [%d]", match->type);
            break;
        }
    hsp_param->use_horizon = use_horizon;
#ifdef USE_PTHREADS
    pthread_mutex_init(&hsp_param->hsp_recycle_lock, NULL);
#endif /* USE_PTHREADS */
    hsp_param->hsp_recycle = RecycleBin_create("HSP", sizeof(HSP), 64);
    hsp_param->wordhood = NULL;
    HSP_Param_refresh_wordhood(hsp_param);
    return hsp_param;
    }

void HSP_Param_destroy(HSP_Param *hsp_param){
    g_assert(hsp_param);
    if(ThreadRef_destroy(hsp_param->thread_ref))
        return;
    if(hsp_param->wordhood)
        WordHood_destroy(hsp_param->wordhood);
#ifdef USE_PTHREADS
    pthread_mutex_lock(&hsp_param->hsp_recycle_lock);
#endif /* USE_PTHREADS */
    RecycleBin_destroy(hsp_param->hsp_recycle);
#ifdef USE_PTHREADS
    pthread_mutex_unlock(&hsp_param->hsp_recycle_lock);
    pthread_mutex_destroy(&hsp_param->hsp_recycle_lock);
#endif /* USE_PTHREADS */
    g_free(hsp_param);
    return;
    }

HSP_Param *HSP_Param_share(HSP_Param *hsp_param){
    g_assert(hsp_param);
    ThreadRef_share(hsp_param->thread_ref);
    return hsp_param;
    }

HSP_Param *HSP_Param_swap(HSP_Param *hsp_param){
    g_assert(hsp_param);
    return HSP_Param_create(Match_swap(hsp_param->match),
                            hsp_param->use_horizon);
    }

HSPset *HSPset_create(Sequence *query, Sequence *target,
                      HSP_Param *hsp_param){
    register HSPset *hsp_set = g_new(HSPset, 1);
    g_assert(query);
    g_assert(target);
    hsp_set->ref_count = 1;
    hsp_set->query = Sequence_share(query);
    hsp_set->target = Sequence_share(target);
    hsp_set->param = HSP_Param_share(hsp_param);
    /**/
    if(hsp_param->use_horizon){
        hsp_set->horizon = (gint****)Matrix4d_create(
                               1 + ((hsp_param->seed_repeat > 1)?2:0),
                               query->len,
                               hsp_param->match->query->advance,
                               hsp_param->match->target->advance,
                               sizeof(gint));
    } else {
        hsp_set->horizon = NULL;
        }
    hsp_set->hsp_list = g_ptr_array_new();
    hsp_set->is_finalised = FALSE;
    hsp_set->param->has = HSPset_ArgumentSet_create(NULL);
    if(hsp_set->param->has->filter_threshold){
        hsp_set->filter = g_new0(PQueue*, query->len);
        hsp_set->pqueue_set = PQueueSet_create();
    } else {
        hsp_set->filter = NULL;
        hsp_set->pqueue_set = NULL;
        }
    hsp_set->is_empty = TRUE;
    return hsp_set;
    }
/* horizon[score(,repeat_count,diag)]
 *        [query_len]
 *        [query_advance]
 *        [target_advance]
 */

/**/

HSPset *HSPset_share(HSPset *hsp_set){
    g_assert(hsp_set);
    hsp_set->ref_count++;
    return hsp_set;
    }

void HSPset_destroy(HSPset *hsp_set){
    register gint i;
    register HSP *hsp;
    g_assert(hsp_set);
    if(--hsp_set->ref_count)
        return;
    HSP_Param_destroy(hsp_set->param);
    if(hsp_set->filter)
        g_free(hsp_set->filter);
    if(hsp_set->pqueue_set)
PQueueSet_destroy(hsp_set->pqueue_set); Sequence_destroy(hsp_set->query); Sequence_destroy(hsp_set->target); for(i = 0; i < hsp_set->hsp_list->len; i++){ hsp = hsp_set->hsp_list->pdata[i]; HSP_destroy(hsp); } g_ptr_array_free(hsp_set->hsp_list, TRUE); if(hsp_set->horizon) g_free(hsp_set->horizon); g_free(hsp_set); return; } void HSPset_swap(HSPset *hsp_set, HSP_Param *hsp_param){ register Sequence *query; register gint i, query_start; register HSP *hsp; g_assert(hsp_set->ref_count == 1); /* Swap query and target */ query = hsp_set->query; hsp_set->query = hsp_set->target; hsp_set->target = query; /* Switch parameters */ HSP_Param_destroy(hsp_set->param); hsp_set->param = HSP_Param_share(hsp_param); /* Swap HSPs coordinates */ for(i = 0; i < hsp_set->hsp_list->len; i++){ hsp = hsp_set->hsp_list->pdata[i]; query_start = hsp->query_start; hsp->query_start = hsp->target_start; hsp->target_start = query_start; } return; } void HSPset_revcomp(HSPset *hsp_set){ register Sequence *rc_query = Sequence_revcomp(hsp_set->query), *rc_target = Sequence_revcomp(hsp_set->target); register gint i; register HSP *hsp; g_assert(hsp_set); g_assert(hsp_set->is_finalised); g_assert(hsp_set->ref_count == 1); /**/ Sequence_destroy(hsp_set->query); Sequence_destroy(hsp_set->target); hsp_set->query = rc_query; hsp_set->target = rc_target; for(i = 0; i < hsp_set->hsp_list->len; i++){ hsp = hsp_set->hsp_list->pdata[i]; hsp->query_start = hsp_set->query->len - HSP_query_end(hsp); hsp->target_start = hsp_set->target->len - HSP_target_end(hsp); } return; } static gint HSP_find_cobs(HSP *hsp){ register gint i, query_pos = hsp->query_start, target_pos = hsp->target_start; register Match_Score score = 0; /* Find the HSP centre offset by score */ for(i = 0; i < hsp->length; i++){ g_assert(HSP_check_positions(hsp->hsp_set, query_pos, target_pos)); score += HSP_get_score(hsp, query_pos, target_pos); if(score >= (hsp->score>>1)) break; query_pos += HSP_query_advance(hsp); target_pos += 
HSP_target_advance(hsp); } return i; } /**/ static void HSP_print_info(HSP *hsp){ g_print("HSP info (%p)\n" " query_start = [%d]\n" " target_start = [%d]\n" " length = [%d]\n" " score = [%d]\n" " cobs = [%d]\n", (gpointer)hsp, hsp->query_start, hsp->target_start, hsp->length, hsp->score, hsp->cobs); return; } typedef struct { HSP *hsp; /* not freed */ gint padding; gint max_advance; gint query_display_pad; gint target_display_pad; GString *top; GString *mid; GString *low; } HSP_Display; static HSP_Display *HSP_Display_create(HSP *hsp, gint padding){ register HSP_Display *hd = g_new(HSP_Display, 1); register gint approx_length; hd->hsp = hsp; hd->padding = padding; hd->max_advance = MAX(HSP_query_advance(hsp), HSP_target_advance(hsp)); hd->query_display_pad = (hd->max_advance -HSP_query_advance(hsp))>>1; hd->target_display_pad = (hd->max_advance -HSP_target_advance(hsp))>>1; approx_length = hd->max_advance * (hsp->length + 2); hd->top = g_string_sized_new(approx_length); hd->mid = g_string_sized_new(approx_length); hd->low = g_string_sized_new(approx_length); return hd; } static void HSP_Display_destroy(HSP_Display *hd){ g_assert(hd); g_string_free(hd->top, TRUE); g_string_free(hd->mid, TRUE); g_string_free(hd->low, TRUE); g_free(hd); return; } static void HSP_Display_add(HSP_Display *hd, gchar *top, gchar *mid, gchar *low){ g_assert(hd); g_assert(top); g_assert(mid); g_assert(low); g_string_append(hd->top, top); g_string_append(hd->mid, mid); g_string_append(hd->low, low); g_assert(hd->top->len == hd->mid->len); g_assert(hd->mid->len == hd->low->len); return; } static gchar HSP_Display_get_ruler_char(HSP_Display *hd, gint pos, gint advance){ register gint stop; register gint pad_length = 3; stop = hd->padding * hd->max_advance; if(pos >= stop){ if(pos < (stop+pad_length)){ return '#'; } stop = ((hd->padding+hd->hsp->length) * hd->max_advance) + pad_length; if(pos >= stop){ if(pos < (stop+pad_length)){ return '#'; } pos -= pad_length; } pos -= pad_length; } 
if((pos/advance) & 1) return '='; return '-'; } static void HSP_Display_print_ruler(HSP_Display *hd, gint width, gint pos, gboolean is_query){ register gint i, adv; if(is_query){ if(HSP_target_advance(hd->hsp) == 1) return; /* opposite padding */ adv = HSP_target_advance(hd->hsp); } else { /* Is target */ if(HSP_query_advance(hd->hsp) == 1) return; /* opposite padding */ adv = HSP_query_advance(hd->hsp); } g_print(" ruler:["); for(i = 0; i < width; i++) g_print("%c", HSP_Display_get_ruler_char(hd, pos+i, adv)); g_print("]\n"); return; } static void HSP_Display_print(HSP_Display *hd){ register gint pos, pause, width = 50; g_assert(hd); g_assert(hd->top->len == hd->mid->len); g_assert(hd->mid->len == hd->low->len); for(pos = 0, pause = hd->top->len-width; pos < pause; pos += width){ HSP_Display_print_ruler(hd, width, pos, TRUE); g_print(" query:[%.*s]\n [%.*s]\ntarget:[%.*s]\n", width, hd->top->str+pos, width, hd->mid->str+pos, width, hd->low->str+pos); HSP_Display_print_ruler(hd, width, pos, FALSE); g_print("\n"); } HSP_Display_print_ruler(hd, hd->top->len-pos, pos, TRUE); g_print(" query:[%.*s]\n [%.*s]\ntarget:[%.*s]\n", (gint)(hd->top->len-pos), hd->top->str+pos, (gint)(hd->mid->len-pos), hd->mid->str+pos, (gint)(hd->low->len-pos), hd->low->str+pos); HSP_Display_print_ruler(hd, hd->top->len-pos, pos, FALSE); g_print("\n"); return; } static void HSP_Display_insert(HSP_Display *hd, gint position){ register gint query_pos, target_pos; register gboolean is_padding, query_valid = TRUE, target_valid = TRUE; register Match *match = hd->hsp->hsp_set->param->match; gchar query_str[4] = {0}, target_str[4] = {0}, equiv_str[4] = {0}; g_assert(hd); query_pos = hd->hsp->query_start + (HSP_query_advance(hd->hsp) * position); target_pos = hd->hsp->target_start + (HSP_target_advance(hd->hsp) * position); /* If outside HSP, then is_padding */ is_padding = ((position < 0) || (position >= hd->hsp->length)); /* If outside seqs, then invalid */ query_valid = ( (query_pos >= 0) 
&&((query_pos+HSP_query_advance(hd->hsp)) <= hd->hsp->hsp_set->query->len)); target_valid = ((target_pos >= 0) &&((target_pos+HSP_target_advance(hd->hsp)) <= hd->hsp->hsp_set->target->len)); /* Get equiv string */ if(query_valid && target_valid){ g_assert(HSP_check_positions(hd->hsp->hsp_set, query_pos, target_pos)); HSP_get_display(hd->hsp, query_pos, target_pos, equiv_str); } else { strncpy(equiv_str, "###", hd->max_advance); equiv_str[hd->max_advance] = '\0'; } /* Get query string */ if(query_valid){ Match_Strand_get_raw(match->query, hd->hsp->hsp_set->query, query_pos, query_str); if((match->query->advance == 1) && (hd->max_advance == 3)){ query_str[1] = query_str[0]; query_str[0] = query_str[2] = ' '; query_str[3] = '\0'; } } else { strncpy(query_str, "###", hd->max_advance); query_str[hd->max_advance] = '\0'; } /* Get target string */ if(target_valid){ Match_Strand_get_raw(match->target, hd->hsp->hsp_set->target, target_pos, target_str); if((match->target->advance == 1) && (hd->max_advance == 3)){ target_str[1] = target_str[0]; target_str[0] = target_str[2] = ' '; target_str[3] = '\0'; } } else { strncpy(target_str, "###", hd->max_advance); target_str[hd->max_advance] = '\0'; } /* Make lower case for padding */ if(is_padding){ g_strdown(query_str); g_strdown(target_str); } else { g_strup(query_str); g_strup(target_str); } HSP_Display_add(hd, query_str, equiv_str, target_str); return; } static void HSP_print_alignment(HSP *hsp){ register HSP_Display *hd = HSP_Display_create(hsp, 10); register gint i; for(i = 0; i < hd->padding; i++) /* Pre-padding */ HSP_Display_insert(hd, i-hd->padding); /* Use pad_length == 3 */ HSP_Display_add(hd, " < ", " < ", " < "); /* Start divider */ for(i = 0; i < hsp->length; i++) /* The HSP itself */ HSP_Display_insert(hd, i); HSP_Display_add(hd, " > ", " > ", " > "); /* End divider */ for(i = 0; i < hd->padding; i++) /* Post-padding */ HSP_Display_insert(hd, hsp->length+i); HSP_Display_print(hd); HSP_Display_destroy(hd); return; } 
/*
 *  HSP display style:
 *
 *  =-=-=-   =-=-=-=-=-=-=-=-=-   =-=-=-
 *  nnnnnn < ACGACGCCCACGATCGAT > nnn###
 *     ||| < |||:::|||   |||||| > |||
 *  ### x  <  A  R  N  D  C  Q  >  x ###
 *  ===---   ===---===---===---   ===---
 */

static void HSP_print_sugar(HSP *hsp){
    g_print("sugar: %s %d %d %c %s %d %d %c %d\n",
            hsp->hsp_set->query->id,
            hsp->query_start,
            hsp->length*HSP_query_advance(hsp),
            Sequence_get_strand_as_char(hsp->hsp_set->query),
            hsp->hsp_set->target->id,
            hsp->target_start,
            hsp->length*HSP_target_advance(hsp),
            Sequence_get_strand_as_char(hsp->hsp_set->target),
            hsp->score);
    return;
    }
/* Sugar output format:
 * sugar: <query_id> <query_start> <query_length> <query_strand>
 *        <target_id> <target_start> <target_length> <target_strand>
 *        <score>
 */

void HSP_print(HSP *hsp, gchar *name){
    g_print("draw_hsp(%d, %d, %d, %d, %d, %d, \"%s\")\n",
            hsp->query_start,
            hsp->target_start,
            hsp->length,
            hsp->cobs,
            HSP_query_advance(hsp),
            HSP_target_advance(hsp),
            name);
    HSP_print_info(hsp);
    HSP_print_alignment(hsp);
    HSP_print_sugar(hsp);
    return;
    }

void HSPset_print(HSPset *hsp_set){
    register gint i;
    register gchar *name;
    g_print("HSPset [%p] contains [%d] hsps\n",
            (gpointer)hsp_set, hsp_set->hsp_list->len);
    g_print("Comparison of [%s] and [%s]\n",
            hsp_set->query->id, hsp_set->target->id);
    for(i = 0; i < hsp_set->hsp_list->len; i++){
        name = g_strdup_printf("hsp [%d]", i+1);
        HSP_print(hsp_set->hsp_list->pdata[i], name);
        g_free(name);
        }
    return;
    }

/**/

static void HSP_init(HSP *nh){
    register gint i;
    register gint query_pos, target_pos;
    g_assert(HSP_check(nh));
    /* Initial hsp score */
    query_pos = nh->query_start;
    target_pos = nh->target_start;
    nh->score = 0;
    for(i = 0; i < nh->length; i++){
        g_assert(HSP_check_positions(nh->hsp_set,
                                     query_pos, target_pos));
        nh->score += HSP_get_score(nh, query_pos, target_pos);
        query_pos += HSP_query_advance(nh);
        target_pos += HSP_target_advance(nh);
        }
    if(nh->score < 0){
        HSP_print(nh, "Bad HSP seed");
        g_error("Initial HSP score [%d] less than zero", nh->score);
        }
    g_assert(HSP_check(nh));
    return;
    }

static void HSP_extend(HSP *nh, gboolean forbid_masked){
    register Match_Score score, maxscore;
    register
gint query_pos, target_pos; register gint extend, maxext; g_assert(HSP_check(nh)); /* extend left */ maxscore = score = nh->score; query_pos = nh->query_start-HSP_query_advance(nh); target_pos = nh->target_start-HSP_target_advance(nh); for(extend = 1, maxext = 0; ((query_pos >= 0) && (target_pos >= 0)); extend++){ g_assert(HSP_check_positions(nh->hsp_set, query_pos, target_pos)); if((forbid_masked) && (HSP_query_masked(nh, query_pos) || HSP_target_masked(nh, target_pos))) break; score += HSP_get_score(nh, query_pos, target_pos); if(maxscore <= score){ maxscore = score; maxext = extend; } else { if(score < 0) /* See note below */ break; if((maxscore-score) >= nh->hsp_set->param->dropoff) break; } query_pos -= HSP_query_advance(nh); target_pos -= HSP_target_advance(nh); } query_pos = HSP_query_end(nh); target_pos = HSP_target_end(nh); nh->query_start -= (maxext * HSP_query_advance(nh)); nh->target_start -= (maxext * HSP_target_advance(nh)); nh->length += maxext; score = maxscore; /* extend right */ for(extend = 1, maxext = 0; ( ((query_pos+HSP_query_advance(nh)) <= nh->hsp_set->query->len) && ((target_pos+HSP_target_advance(nh)) <= nh->hsp_set->target->len) ); extend++){ g_assert(HSP_check_positions(nh->hsp_set, query_pos, target_pos)); if((forbid_masked) && (HSP_query_masked(nh, query_pos) || HSP_target_masked(nh, target_pos))) break; score += HSP_get_score(nh, query_pos, target_pos); if(maxscore <= score){ maxscore = score; maxext = extend; } else { if(score < 0) /* See note below */ break; if((maxscore-score) >= nh->hsp_set->param->dropoff) break; } query_pos += HSP_query_advance(nh); target_pos += HSP_target_advance(nh); } nh->score = maxscore; nh->length += maxext; g_assert(HSP_check(nh)); return; } /* The score cannot be allowed to drop below zero in the HSP, * as this can result in overlapping HSPs in some circumstances. 
 */

static HSP *HSP_create(HSP *nh){
    register HSP *hsp;
#ifdef USE_PTHREADS
    pthread_mutex_lock(&nh->hsp_set->param->hsp_recycle_lock);
#endif /* USE_PTHREADS */
    hsp = RecycleBin_alloc(nh->hsp_set->param->hsp_recycle);
#ifdef USE_PTHREADS
    pthread_mutex_unlock(&nh->hsp_set->param->hsp_recycle_lock);
#endif /* USE_PTHREADS */
    hsp->hsp_set = nh->hsp_set;
    hsp->query_start = nh->query_start;
    hsp->target_start = nh->target_start;
    hsp->length = nh->length;
    hsp->score = nh->score;
    hsp->cobs = nh->cobs; /* Value can be set by HSPset_finalise(); */
    return hsp;
    }

void HSP_destroy(HSP *hsp){
    register HSPset *hsp_set = hsp->hsp_set;
#ifdef USE_PTHREADS
    pthread_mutex_lock(&hsp_set->param->hsp_recycle_lock);
#endif /* USE_PTHREADS */
    RecycleBin_recycle(hsp_set->param->hsp_recycle, hsp);
#ifdef USE_PTHREADS
    pthread_mutex_unlock(&hsp_set->param->hsp_recycle_lock);
#endif /* USE_PTHREADS */
    return;
    }

static void HSP_trim_ends(HSP *hsp){
    register gint i;
    register gint query_pos, target_pos;
    /* Trim left to first good match */
    g_assert(HSP_check(hsp));
    for(i = 0; i < hsp->length; i++){
        if(HSP_get_score(hsp, hsp->query_start, hsp->target_start) > 0)
            break;
        hsp->query_start += HSP_query_advance(hsp);
        hsp->target_start += HSP_target_advance(hsp);
        }
    hsp->length -= i;
    /**/
    g_assert(HSP_check(hsp));
    query_pos = HSP_query_end(hsp) - HSP_query_advance(hsp);
    target_pos = HSP_target_end(hsp) - HSP_target_advance(hsp);
    /* Trim right to last good match */
    while(hsp->length > 0){
        g_assert(HSP_check_positions(hsp->hsp_set,
                                     query_pos, target_pos));
        if(HSP_get_score(hsp, query_pos, target_pos) > 0)
            break;
        hsp->length--;
        query_pos -= HSP_query_advance(hsp);
        target_pos -= HSP_target_advance(hsp);
        }
    g_assert(HSP_check(hsp));
    return;
    }
/* This is to remove any unmatching ends from the HSP seed.
*/ static gboolean HSP_PQueue_comp_func(gpointer low, gpointer high, gpointer user_data){ register HSP *hsp_low = (HSP*)low, *hsp_high = (HSP*)high; return hsp_low->score - hsp_high->score; } static void HSP_store(HSP *nascent_hsp){ register HSPset *hsp_set = nascent_hsp->hsp_set; register PQueue *pq; register HSP *hsp = NULL; g_assert(nascent_hsp); if(nascent_hsp->score < hsp_set->param->threshold) return; if(hsp_set->param->has->filter_threshold){ /* If have filter */ /* Get cobs value */ nascent_hsp->cobs = HSP_find_cobs(nascent_hsp); pq = hsp_set->filter[HSP_query_cobs(nascent_hsp)]; if(pq){ /* Put in PQueue if better than worst */ if(PQueue_total(pq) < hsp_set->param->has->filter_threshold){ hsp = HSP_create(nascent_hsp); PQueue_push(pq, hsp); } else { g_assert(PQueue_total(pq)); hsp = PQueue_top(pq); if(hsp->score < nascent_hsp->score){ hsp = PQueue_pop(pq); HSP_destroy(hsp); hsp = HSP_create(nascent_hsp); PQueue_push(pq, hsp); } } } else { pq = PQueue_create(hsp_set->pqueue_set, HSP_PQueue_comp_func, NULL); hsp_set->filter[HSP_query_cobs(nascent_hsp)] = pq; hsp = HSP_create(nascent_hsp); PQueue_push(pq, hsp); hsp_set->is_empty = FALSE; } } else { hsp = HSP_create(nascent_hsp); g_ptr_array_add(hsp_set->hsp_list, hsp); hsp_set->is_empty = FALSE; } return; } /* FIXME: optimisation: could store HSPs as a list up until * filter_threshold, then convert to a PQueue */ void HSPset_seed_hsp(HSPset *hsp_set, guint query_start, guint target_start){ register gint diag_pos = ((target_start * hsp_set->param->match->query->advance) -(query_start * hsp_set->param->match->target->advance)); register gint query_frame = query_start % hsp_set->param->match->query->advance, target_frame = target_start % hsp_set->param->match->target->advance; register gint section_pos = (diag_pos + hsp_set->query->len) % hsp_set->query->len; HSP nascent_hsp; g_assert(!hsp_set->is_finalised); /**/ g_assert(section_pos >= 0); g_assert(section_pos < hsp_set->query->len); /* Clear if diag has 
changed */ if(hsp_set->param->seed_repeat > 1){ if(hsp_set->horizon[2][section_pos][query_frame][target_frame] != (diag_pos + hsp_set->query->len)){ hsp_set->horizon[0][section_pos][query_frame][target_frame] = 0; hsp_set->horizon[1][section_pos][query_frame][target_frame] = 0; hsp_set->horizon[2][section_pos][query_frame][target_frame] = diag_pos + hsp_set->query->len; } } /* Check whether we have seen this HSP already */ if(target_start < hsp_set->horizon[0] [section_pos] [query_frame] [target_frame]) return; if(hsp_set->param->seed_repeat > 1){ if(++hsp_set->horizon[1][section_pos][query_frame][target_frame] < hsp_set->param->seed_repeat) return; hsp_set->horizon[1][section_pos][query_frame][target_frame] = 0; } /* Nascent HSP building: */ nascent_hsp.hsp_set = hsp_set; nascent_hsp.query_start = query_start; nascent_hsp.target_start = target_start; nascent_hsp.length = hsp_set->param->seedlen; nascent_hsp.cobs = 0; g_assert(HSP_check(&nascent_hsp)); HSP_trim_ends(&nascent_hsp); /* Score is irrelevant before HSP_init() */ HSP_init(&nascent_hsp); /* Try to make above threshold HSP using masking */ if(hsp_set->param->match->query->mask_func || hsp_set->param->match->target->mask_func){ HSP_extend(&nascent_hsp, TRUE); if(nascent_hsp.score < hsp_set->param->threshold){ hsp_set->horizon[0][section_pos][query_frame][target_frame] = HSP_target_end(&nascent_hsp); return; } } /* Extend the HSP again ignoring masking */ HSP_extend(&nascent_hsp, FALSE); HSP_store(&nascent_hsp); hsp_set->horizon[0][section_pos][query_frame][target_frame] = HSP_target_end(&nascent_hsp); return; } void HSPset_add_known_hsp(HSPset *hsp_set, guint query_start, guint target_start, guint length){ HSP nascent_hsp; nascent_hsp.hsp_set = hsp_set; nascent_hsp.query_start = query_start; nascent_hsp.target_start = target_start; nascent_hsp.length = length; nascent_hsp.cobs = 0; /* Score is irrelevant before HSP_init() */ HSP_init(&nascent_hsp); HSP_store(&nascent_hsp); return; } static void 
HSPset_seed_hsp_sorted(HSPset *hsp_set,
                       guint query_start, guint target_start,
                       gint ***horizon){
    HSP nascent_hsp;
    register gint diag_pos
        = ((target_start * hsp_set->param->match->query->advance)
          -(query_start * hsp_set->param->match->target->advance));
    register gint query_frame = query_start
                              % hsp_set->param->match->query->advance,
                  target_frame = target_start
                               % hsp_set->param->match->target->advance;
    register gint section_pos = (diag_pos + hsp_set->query->len)
                              % hsp_set->query->len;
    g_assert(!hsp_set->is_finalised);
    g_assert(!hsp_set->horizon);
    g_assert(section_pos >= 0);
    g_assert(section_pos < hsp_set->query->len);
    /**/
    if(horizon[1][query_frame][target_frame] != section_pos){
        horizon[1][query_frame][target_frame] = section_pos;
        horizon[0][query_frame][target_frame] = 0;
        horizon[2][query_frame][target_frame] = 0;
        }
    /* FIXME: seedrepeat overflow here */
    if(++horizon[2][query_frame][target_frame] < hsp_set->param->seed_repeat)
        return;
    horizon[2][query_frame][target_frame] = 0;
    /* Check whether we have seen this HSP already */
    if(target_start < horizon[0][query_frame][target_frame])
        return;
    /**/
    /* Nascent HSP building: */
    nascent_hsp.hsp_set = hsp_set;
    nascent_hsp.query_start = query_start;
    nascent_hsp.target_start = target_start;
    nascent_hsp.length = hsp_set->param->seedlen;
    nascent_hsp.cobs = 0;
    g_assert(HSP_check(&nascent_hsp));
    HSP_trim_ends(&nascent_hsp);
    /* Score is irrelevant before HSP_init() */
    HSP_init(&nascent_hsp);
    /* Try to make above threshold HSP using masking */
    if(hsp_set->param->match->query->mask_func
    || hsp_set->param->match->target->mask_func){
        HSP_extend(&nascent_hsp, TRUE);
        if(nascent_hsp.score < hsp_set->param->threshold){
            horizon[0][query_frame][target_frame]
                = HSP_target_end(&nascent_hsp);
            return;
            }
        }
    /* Extend the HSP again ignoring masking */
    HSP_extend(&nascent_hsp, FALSE);
    HSP_store(&nascent_hsp);
    /**/
    horizon[0][query_frame][target_frame] = HSP_target_end(&nascent_hsp);
    return;
    }
/* horizon[0] = target_end
 * horizon[1] = diag
 *
horizon[2] = repeat_count */ /* Need to use the global to pass q,t advance to qsort compare func */ static HSPset *HSPset_seed_compare_hsp_set = NULL; static int HSPset_seed_compare(const void *a, const void *b){ register guint *seed_a = (guint*)a, *seed_b = (guint*)b; register gint diag_a, diag_b; register HSPset *hsp_set = HSPset_seed_compare_hsp_set; g_assert(hsp_set); diag_a = ((seed_a[1] * hsp_set->param->match->query->advance) - (seed_a[0] * hsp_set->param->match->target->advance)), diag_b = ((seed_b[1] * hsp_set->param->match->query->advance) - (seed_b[0] * hsp_set->param->match->target->advance)); if(diag_a == diag_b) return seed_a[0] - seed_b[0]; return diag_a - diag_b; } void HSPset_seed_all_hsps(HSPset *hsp_set, guint *seed_list, guint seed_list_len){ register gint i; register gint ***horizon; register gint qpos, tpos; if(seed_list_len > 1){ HSPset_seed_compare_hsp_set = hsp_set; qsort(seed_list, seed_list_len, sizeof(guint) << 1, HSPset_seed_compare); HSPset_seed_compare_hsp_set = NULL; } if(seed_list_len){ horizon = (gint***)Matrix3d_create(3, hsp_set->param->match->query->advance, hsp_set->param->match->target->advance, sizeof(gint)); for(i = 0; i < seed_list_len; i++){ HSPset_seed_hsp_sorted(hsp_set, seed_list[(i << 1)], seed_list[(i << 1) + 1], horizon); qpos = seed_list[(i << 1)]; tpos = seed_list[(i << 1) + 1]; } g_free(horizon); } HSPset_finalise(hsp_set); return; } /**/ HSPset *HSPset_finalise(HSPset *hsp_set){ register gint i; register HSP *hsp; register PQueue *pq; g_assert(!hsp_set->is_finalised); hsp_set->is_finalised = TRUE; if(hsp_set->param->has->filter_threshold && (!hsp_set->is_empty)){ /* Get HSPs from each PQueue */ for(i = 0; i < hsp_set->query->len; i++){ pq = hsp_set->filter[i]; if(pq){ while(PQueue_total(pq)){ hsp = PQueue_pop(pq); g_ptr_array_add(hsp_set->hsp_list, hsp); } } } } /* Set cobs for each HSP */ if(!hsp_set->param->has->filter_threshold){ for(i = 0; i < hsp_set->hsp_list->len; i++){ hsp = hsp_set->hsp_list->pdata[i]; 
hsp->cobs = HSP_find_cobs(hsp); } } hsp_set->is_finalised = TRUE; return hsp_set; } /**/ static int HSPset_sort_by_diag_then_query_start(const void *a, const void *b){ register HSP **hsp_a = (HSP**)a, **hsp_b = (HSP**)b; register gint diag_a = HSP_diagonal(*hsp_a), diag_b = HSP_diagonal(*hsp_b); if(diag_a == diag_b) return (*hsp_a)->query_start - (*hsp_b)->query_start; return diag_a - diag_b; } static Match_Score HSP_score_overlap(HSP *left, HSP *right){ register Match_Score score = 0; register gint query_pos, target_pos; g_assert(left->hsp_set == right->hsp_set); g_assert(HSP_diagonal(left) == HSP_diagonal(right)); query_pos = HSP_query_end(left) - HSP_query_advance(left); target_pos = HSP_target_end(left) - HSP_target_advance(left); while(query_pos >= right->query_start){ score += HSP_get_score(left, query_pos, target_pos); query_pos -= HSP_query_advance(left); target_pos -= HSP_target_advance(left); } query_pos = right->query_start; target_pos = right->target_start; while(query_pos < (HSP_query_end(left)- HSP_query_advance(right))){ score += HSP_get_score(right, query_pos, target_pos); query_pos += HSP_query_advance(right); target_pos += HSP_target_advance(right); } return score; } /* Returns score for overlapping region of HSPs on same diagonal */ void HSPset_filter_ungapped(HSPset *hsp_set){ register GPtrArray *new_hsp_list; register HSP *curr_hsp, *prev_hsp; register gboolean del_prev, del_curr; register gint i; register Match_Score score; /* Filter strongly overlapping HSPs on same diagonal * but different frames (happens with 3:3 HSPs only) */ if((hsp_set->hsp_list->len > 1) && (hsp_set->param->match->query->advance == 3) && (hsp_set->param->match->target->advance == 3)){ /* FIXME: should not sort when using all-at-once HSPset */ qsort(hsp_set->hsp_list->pdata, hsp_set->hsp_list->len, sizeof(gpointer), HSPset_sort_by_diag_then_query_start); prev_hsp = hsp_set->hsp_list->pdata[0]; del_prev = FALSE; del_curr = FALSE; new_hsp_list = g_ptr_array_new(); for(i = 
1; i < hsp_set->hsp_list->len; i++){ curr_hsp = hsp_set->hsp_list->pdata[i]; del_curr = FALSE; if((HSP_diagonal(prev_hsp) == HSP_diagonal(curr_hsp)) && (HSP_query_end(prev_hsp) > curr_hsp->query_start)){ score = HSP_score_overlap(prev_hsp, curr_hsp); if((score << 1) > (curr_hsp->score + prev_hsp->score)){ /* FIXME: use codon_usage scores here instead */ if(prev_hsp->score < curr_hsp->score){ del_prev = TRUE; } else { del_curr = TRUE; } } } if(del_prev) HSP_destroy(prev_hsp); else g_ptr_array_add(new_hsp_list, prev_hsp); prev_hsp = curr_hsp; del_prev = del_curr; } if(del_prev) HSP_destroy(prev_hsp); else g_ptr_array_add(new_hsp_list, prev_hsp); g_ptr_array_free(hsp_set->hsp_list, TRUE); hsp_set->hsp_list = new_hsp_list; } return; } /**/ #define HSPset_SList_PAGE_BIT_WIDTH 10 #define HSPset_SList_PAGE_SIZE (1 << HSPset_SList_PAGE_BIT_WIDTH) RecycleBin *HSPset_SList_RecycleBin_create(void){ return RecycleBin_create("HSPset_Slist", sizeof(HSPset_SList_Node), 4096); } HSPset_SList_Node *HSPset_SList_append(RecycleBin *recycle_bin, HSPset_SList_Node *next, gint query_pos, gint target_pos){ register HSPset_SList_Node *node = RecycleBin_alloc(recycle_bin); node->next = next; node->query_pos = query_pos; node->target_pos = target_pos; return node; } #if 0 typedef struct { gint page_alloc; gint page_total; HSPset_SList_Node **diag_page_list; gint ****horizon; gint *page_used; gint page_used_total; } HSPset_SeedData; static HSPset_SeedData *HSPset_SeedData_create(HSP_Param *hsp_param, gint target_len){ register HSPset_SeedData *seed_data = g_new(HSPset_SeedData, 1); seed_data->page_total = (target_len >> HSPset_SList_PAGE_BIT_WIDTH) + 1; seed_data->page_alloc = seed_data->page_total; seed_data->diag_page_list = g_new0(HSPset_SList_Node*, seed_data->page_total); seed_data->page_used = g_new(gint, seed_data->page_total); seed_data->horizon = (gint****)Matrix4d_create( 2 + ((hsp_param->seed_repeat > 1)?1:0), HSPset_SList_PAGE_SIZE, hsp_param->match->query->advance, 
hsp_param->match->target->advance, sizeof(gint)); seed_data->page_used_total = 0; return seed_data; } static void HSPset_SeedData_destroy(HSPset_SeedData *seed_data){ g_free(seed_data->diag_page_list); g_free(seed_data->page_used); g_free(seed_data->horizon); g_free(seed_data); return; } static void HSPset_SeedData_set_target_len(HSPset_SeedData *seed_data, HSPset *hsp_set){ register gint new_page_total = (hsp_set->target->len >> HSPset_SList_PAGE_BIT_WIDTH) + 1; register gint i, a, b, c, d; seed_data->page_total = new_page_total; if(seed_data->page_alloc < new_page_total){ seed_data->page_alloc = seed_data->page_total; g_free(seed_data->diag_page_list); seed_data->diag_page_list = g_new(HSPset_SList_Node*, seed_data->page_total); g_free(seed_data->page_used); seed_data->page_used = g_new(gint, seed_data->page_total); } /* Clear diag_page_list */ for(i = 0; i < seed_data->page_total; i++) seed_data->diag_page_list[i] = 0; /* Clear horizon */ for(a = 2 + ((hsp_set->param->seed_repeat > 1)?1:0); a >= 0; a--) for(b = HSPset_SList_PAGE_SIZE; b >= 0; b--) for(c = hsp_set->param->match->query->advance; c >= 0; c--) for(d = hsp_set->param->match->target->advance; d >= 0; d--) seed_data->horizon[a][b][c][d] = 0; seed_data->page_used_total = 0; return; } #endif /* 0 */ void HSPset_seed_all_qy_sorted(HSPset *hsp_set, HSPset_SList_Node *seed_list){ register gint page_total = (hsp_set->target->len >> HSPset_SList_PAGE_BIT_WIDTH) + 1; register HSPset_SList_Node **diag_page_list = g_new0(HSPset_SList_Node*, page_total); register gint *page_used = g_new(gint, page_total); register gint ****horizon = (gint****)Matrix4d_create( 2 + ((hsp_set->param->seed_repeat > 1)?1:0), HSPset_SList_PAGE_SIZE, hsp_set->param->match->query->advance, hsp_set->param->match->target->advance, sizeof(gint)); /* register HSPset_SeedData *seed_data = HSPset_SeedData_create(hsp_set->param, hsp_set->target->len); */ register gint i, page, diag_pos, query_frame, target_frame, section_pos, page_pos; register 
HSPset_SList_Node *seed; register gint page_used_total = 0; HSP nascent_hsp; /* g_message("[%s] with [%d]", __FUNCTION__, hsp_set->target->len); */ /* HSPset_SeedData_set_target_len(seed_data, hsp_set); */ /* Bin on diagonal into pages */ while(seed_list){ seed = seed_list; seed_list = seed_list->next; /**/ diag_pos = (seed->target_pos * hsp_set->param->match->query->advance) - (seed->query_pos * hsp_set->param->match->target->advance); section_pos = ((diag_pos + hsp_set->target->len) % hsp_set->target->len); page = (section_pos >> HSPset_SList_PAGE_BIT_WIDTH); g_assert(section_pos >= 0); g_assert(section_pos < hsp_set->target->len); g_assert(page >= 0); g_assert(page < page_total); /**/ if(!diag_page_list[page]) page_used[page_used_total++] = page; seed->next = diag_page_list[page]; diag_page_list[page] = seed; } /* Seed each page using page horizon */ for(i = 0; i < page_used_total; i++){ page = page_used[i]; for(seed = diag_page_list[page]; seed; seed = seed->next){ g_assert((!seed->next) || (seed->query_pos <= seed->next->query_pos)); diag_pos = (seed->target_pos * hsp_set->param->match->query->advance) - (seed->query_pos * hsp_set->param->match->target->advance); query_frame = seed->query_pos % hsp_set->param->match->query->advance; target_frame = seed->target_pos % hsp_set->param->match->target->advance; section_pos = ((diag_pos + hsp_set->target->len) % hsp_set->target->len); page_pos = section_pos - (page << HSPset_SList_PAGE_BIT_WIDTH); g_assert(page_pos >= 0); g_assert(page_pos < HSPset_SList_PAGE_SIZE); /* Clear if page has changed */ if(horizon[1][page_pos][query_frame][target_frame] != page){ horizon[0][page_pos][query_frame][target_frame] = 0; horizon[1][page_pos][query_frame][target_frame] = page; if(hsp_set->param->seed_repeat > 1) horizon[2][page_pos][query_frame][target_frame] = 0; } if(seed->query_pos < horizon[0][page_pos][query_frame][target_frame]) continue; if(hsp_set->param->seed_repeat > 1){ 
if(++horizon[2][page_pos][query_frame][target_frame] < hsp_set->param->seed_repeat){ continue; } horizon[2][page_pos][query_frame][target_frame] = 0; } /* Nascent HSP building: */ nascent_hsp.hsp_set = hsp_set; nascent_hsp.query_start = seed->query_pos; nascent_hsp.target_start = seed->target_pos; nascent_hsp.length = hsp_set->param->seedlen; nascent_hsp.cobs = 0; g_assert(HSP_check(&nascent_hsp)); HSP_trim_ends(&nascent_hsp); /* Score is irrelevant before HSP_init() */ HSP_init(&nascent_hsp); /* Try to make above threshold HSP using masking */ if(hsp_set->param->match->query->mask_func || hsp_set->param->match->target->mask_func){ HSP_extend(&nascent_hsp, TRUE); if(nascent_hsp.score < hsp_set->param->threshold){ horizon[0][page_pos][query_frame][target_frame] = HSP_query_end(&nascent_hsp); continue; } } /* Extend the HSP again ignoring masking */ HSP_extend(&nascent_hsp, FALSE); HSP_store(&nascent_hsp); /**/ horizon[0][page_pos][query_frame][target_frame] = HSP_query_end(&nascent_hsp); } } g_free(diag_page_list); g_free(page_used); g_free(horizon); HSPset_finalise(hsp_set); /* HSPset_SeedData_destroy(seed_data); */ return; } /* horizon[horizon][mailbox][seed_repeat] * [page_size][qadv][tadv] */ /**/ exonerate-2.4.0/src/comparison/match.c0000644000175000017500000010725511162714277014653 00000000000000/****************************************************************\ * * * Match : A module for pairwise symbol comparison * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/

#include <ctype.h> /* For toupper() */
#include "match.h"

static gchar *Match_Argument_parse_submat(gchar *arg_string, gpointer data){
    register Submat **submat = (Submat**)data;
    gchar *submat_str;
    register gchar *ret_val = Argument_parse_string(arg_string, &submat_str);
    if(ret_val)
        return ret_val;
    (*submat) = Submat_create(submat_str);
    return NULL;
    }

static void Match_Argument_cleanup(gpointer user_data){
    Match_destroy_all();
    return;
    }

Match_ArgumentSet *Match_ArgumentSet_create(Argument *arg){
    register ArgumentSet *as;
    static Match_ArgumentSet mas = {0};
    if(arg){
        as = ArgumentSet_create("Symbol Comparison Options");
        ArgumentSet_add_option(as, 0, "softmaskquery", NULL,
            "Allow softmasking on the query sequence", "FALSE",
            Argument_parse_boolean, &mas.softmasked_query);
        ArgumentSet_add_option(as, 0, "softmasktarget", NULL,
            "Allow softmasking on the target sequence", "FALSE",
            Argument_parse_boolean, &mas.softmasked_target);
        /**/
        ArgumentSet_add_option(as, 'd', "dnasubmat", "name",
            "DNA substitution matrix", "nucleic",
            Match_Argument_parse_submat, &mas.dna_submat);
        ArgumentSet_add_option(as, 'p', "proteinsubmat", "name",
            "Protein substitution matrix", "blosum62",
            Match_Argument_parse_submat, &mas.protein_submat);
        /**/
        Argument_absorb_ArgumentSet(arg, as);
        Argument_add_cleanup(arg, Match_Argument_cleanup, NULL);
        /**/
    } else {
        if(!mas.translate){
            mas.translate = Translate_create(FALSE);
            /* FIXME this should be freed somewhere */
            }
        }
    return &mas;
    }

/**/

#define Match_Type_pair(qt, tt) ((qt) | ((tt) << 8))

Match_Type Match_Type_find(Alphabet_Type query_type,
                           Alphabet_Type target_type,
                           gboolean translate_both){
    register Match_Type type = 0;
    switch(Match_Type_pair(query_type, target_type)){
        case Match_Type_pair(Alphabet_Type_DNA, Alphabet_Type_DNA):
            type = translate_both?Match_Type_CODON2CODON
                                 :Match_Type_DNA2DNA;
            break;
        case Match_Type_pair(Alphabet_Type_PROTEIN, Alphabet_Type_PROTEIN):
            type =
                Match_Type_PROTEIN2PROTEIN;
            break;
        case Match_Type_pair(Alphabet_Type_DNA, Alphabet_Type_PROTEIN):
            type = Match_Type_DNA2PROTEIN;
            break;
        case Match_Type_pair(Alphabet_Type_PROTEIN, Alphabet_Type_DNA):
            type = Match_Type_PROTEIN2DNA;
            break;
        default:
            g_error("Do not have a match type for [%s] [%s]",
                    Alphabet_Type_get_name(query_type),
                    Alphabet_Type_get_name(target_type));
            break;
        }
    return type;
    }

gchar *Match_Type_get_name(Match_Type type){
    register gchar *name = NULL;
    switch(type){
        case Match_Type_DNA2DNA:
            name = "dna2dna";
            break;
        case Match_Type_PROTEIN2PROTEIN:
            name = "protein2protein";
            break;
        case Match_Type_DNA2PROTEIN:
            name = "dna2protein";
            break;
        case Match_Type_PROTEIN2DNA:
            name = "protein2dna";
            break;
        case Match_Type_CODON2CODON:
            name = "codon";
            break;
        default:
            g_error("Bad Match_Type [%d]", type);
            break;
        }
    return name;
    }

static Match_Type Match_Type_mirror(Match_Type type){
    register Match_Type mirror = 0;
    switch(type){
        case Match_Type_DNA2DNA: /*fallthrough*/
        case Match_Type_PROTEIN2PROTEIN: /*fallthrough*/
        case Match_Type_CODON2CODON: /*fallthrough*/
            mirror = type;
            break;
        case Match_Type_DNA2PROTEIN:
            mirror = Match_Type_PROTEIN2DNA;
            break;
        case Match_Type_PROTEIN2DNA:
            mirror = Match_Type_DNA2PROTEIN;
            break;
        default:
            g_error("Bad Match_Type for mirroring [%d]", type);
        }
    return mirror;
    }

/**/

static gboolean Match_Strand_pos_is_valid(Match_Strand *strand,
                                          gint pos, gint length){
    g_assert(pos >= 0);
    g_assert((pos+strand->advance) <= length);
    return TRUE;
    }

#define Match_seqpos_is_masked(sequence, pos) \
        Alphabet_is_masked((sequence)->alphabet, \
                           Sequence_get_symbol(sequence, pos))

/**/

static Match_Score Match_1_dna_self_func(Match_Strand *strand,
                                         Sequence *seq, guint pos){
    register gchar symbol;
    g_assert(Match_Strand_pos_is_valid(strand, pos, seq->len));
    symbol = Sequence_get_symbol(seq, pos);
    return Submat_lookup(strand->match->mas->dna_submat, symbol, symbol);
    }

static Match_Score Match_1_protein_self_func(Match_Strand *strand,
                                             Sequence *seq, guint pos){
register gchar symbol; g_assert(Match_Strand_pos_is_valid(strand, pos, seq->len)); symbol = Sequence_get_symbol(seq, pos); return Submat_lookup(strand->match->mas->protein_submat, symbol, symbol); } static gboolean Match_1_mask_func(Match_Strand *strand, Sequence *seq, guint pos){ g_assert(Match_Strand_pos_is_valid(strand, pos, seq->len)); return Match_seqpos_is_masked(seq, pos); } /**/ static Match_Score Match_3_translate_self_func(Match_Strand *strand, Sequence *seq, guint pos){ register gchar symbol; g_assert(Match_Strand_pos_is_valid(strand, pos, seq->len)); symbol = Translate_base(strand->match->mas->translate, Sequence_get_symbol(seq, pos), Sequence_get_symbol(seq, pos+1), Sequence_get_symbol(seq, pos+2)); return Submat_lookup(strand->match->mas->protein_submat, symbol, symbol); } #if 0 static Match_Score Match_3_codon_self_func(Match_Strand *strand, Sequence *seq, guint pos){ register gint codon; g_assert(Match_Strand_pos_is_valid(strand, pos, seq->len)); codon = CodonSubmat_get_codon_base(strand->match->mas->codon_submat, Sequence_get_symbol(seq, pos), Sequence_get_symbol(seq, pos+1), Sequence_get_symbol(seq, pos+2)); return CodonSubmat_lookup_codon(strand->match->mas->codon_submat, codon, codon); } #endif /* 0 */ static gboolean Match_3_mask_func(Match_Strand *strand, Sequence *seq, guint pos){ g_assert(Match_Strand_pos_is_valid(strand, pos, seq->len)); if((Match_seqpos_is_masked(seq, pos)) || (Match_seqpos_is_masked(seq, pos+1)) || (Match_seqpos_is_masked(seq, pos+2))) return TRUE; return FALSE; } /**/ static gchar Match_get_display_symbol(Submat *submat, gchar query_symbol, gchar target_symbol){ register gint score; if(toupper(query_symbol) == toupper(target_symbol)) return '|'; score = Submat_lookup(submat, query_symbol, target_symbol); if(score == 0) return '.'; if(score > 0) return ':'; return ' '; } static gboolean Match_pos_pair_is_valid(Match *match, Sequence *query, Sequence *target, guint query_pos, guint target_pos){ 
g_assert(Match_Strand_pos_is_valid(match->query, query_pos, query->len)); g_assert(Match_Strand_pos_is_valid(match->target, target_pos, target->len)); return TRUE; } /**/ static Match_Score Match_invalid_split_score_func(Match *match, Sequence *query, Sequence *target, guint qp1, guint qp2, guint qp3, guint tp1, guint tp2, guint tp3){ g_error("Match [%s] cannot use a split score func", Match_Type_get_name(match->type)); return 0; } static void Match_invalid_split_display_func(Match *match, Sequence *query, Sequence *target, guint qp1, guint qp2, guint qp3, guint tp1, guint tp2, guint tp3, gchar *display_str){ g_error("Match [%s] cannot use a split display func", Match_Type_get_name(match->type)); return; } /**/ static Match_Score Match_1_1_dna_score_func(Match *match, Sequence *query, Sequence *target, guint query_pos, guint target_pos){ g_assert(Match_pos_pair_is_valid(match, query, target, query_pos, target_pos)); if(query->annotation) if(query->alphabet->type == Alphabet_Type_DNA) if((query_pos >= query->annotation->cds_start) && (query_pos < (query->annotation->cds_start +query->annotation->cds_length))) return MATCH_IMPOSSIBLY_LOW_SCORE; return Submat_lookup(match->mas->dna_submat, Sequence_get_symbol(query, query_pos), Sequence_get_symbol(target, target_pos)); } static Match_Score Match_1_1_protein_score_func(Match *match, Sequence *query, Sequence *target, guint query_pos, guint target_pos){ g_assert(Match_pos_pair_is_valid(match, query, target, query_pos, target_pos)); return Submat_lookup(match->mas->protein_submat, Sequence_get_symbol(query, query_pos), Sequence_get_symbol(target, target_pos)); } static gchar *Match_1_1_dna_score_macro(void){ return "((ud->query->annotation)\n" "&& (%QP >= ud->query->annotation->cds_start)\n" "&& (%QP < (ud->query->annotation->cds_start\n" " +ud->query->annotation->cds_length)))\n" "?MATCH_IMPOSSIBLY_LOW_SCORE\n" ":Submat_lookup(ud->mas->dna_submat,\n" " Sequence_get_symbol(ud->query, %QP),\n" " 
Sequence_get_symbol(ud->target, %TP))"; } static gchar *Match_1_1_protein_score_macro(void){ return "Submat_lookup(ud->mas->protein_submat,\n" " Sequence_get_symbol(ud->query, %QP),\n" " Sequence_get_symbol(ud->target, %TP))"; } static void Match_1_1_display_func(Match *match, Sequence *query, Sequence *target, guint query_pos, guint target_pos, gchar *display_str){ register Submat *submat = (match->type == Match_Type_DNA2DNA) ? match->mas->dna_submat : match->mas->protein_submat; g_assert(Match_pos_pair_is_valid(match, query, target, query_pos, target_pos)); display_str[0] = Match_get_display_symbol(submat, Sequence_get_symbol(query, query_pos), Sequence_get_symbol(target, target_pos)); display_str[1] = '\0'; return; } /**/ static Match_Score Match_1_3_split_score_func(Match *match, Sequence *query, Sequence *target, guint qp1, guint qp2, guint qp3, guint tp1, guint tp2, guint tp3){ register gchar query_symbol, target_symbol; /* FIXME: need CDS checks here */ query_symbol = Sequence_get_symbol(query, qp1); target_symbol = Translate_base(match->mas->translate, Sequence_get_symbol(target, tp1), Sequence_get_symbol(target, tp2), Sequence_get_symbol(target, tp3)); return Submat_lookup(match->mas->protein_submat, query_symbol, target_symbol); } static Match_Score Match_1_3_score_func(Match *match, Sequence *query, Sequence *target, guint query_pos, guint target_pos){ g_assert(Match_pos_pair_is_valid(match, query, target, query_pos, target_pos)); return Match_1_3_split_score_func(match, query, target, query_pos, -1, -1, target_pos, target_pos+1, target_pos+2); } static gchar *Match_1_3_score_macro(void){ return "Submat_lookup(ud->mas->protein_submat,\n" " Sequence_get_symbol(ud->query, %QP),\n" " Translate_base(ud->mas->translate,\n" " Sequence_get_symbol(ud->target, %TP),\n" " Sequence_get_symbol(ud->target, %TP+1),\n" " Sequence_get_symbol(ud->target, %TP+2)))\n"; } typedef struct { gchar *display_string; gchar codon_a; gchar codon_b; gchar codon_c; } Match_RT_Data; 
static void Match_1_3_display_reverse_translate(gchar *dna, gint length, gpointer user_data){ register Match_RT_Data *mrtd = user_data; if(dna[0] == mrtd->codon_a) mrtd->display_string[0] = '!'; if(dna[1] == mrtd->codon_b) mrtd->display_string[1] = '!'; if(dna[2] == mrtd->codon_c) mrtd->display_string[2] = '!'; return; } static void Match_1_3_split_display_func(Match *match, Sequence *query, Sequence *target, guint qp1, guint qp2, guint qp3, guint tp1, guint tp2, guint tp3, gchar *display_str){ register gchar query_symbol, target_symbol, display_symbol; Match_RT_Data mrtd; gchar aaseq[2]; query_symbol = Sequence_get_symbol(query, qp1); target_symbol = Translate_base(match->mas->translate, Sequence_get_symbol(target, tp1), Sequence_get_symbol(target, tp2), Sequence_get_symbol(target, tp3)); display_symbol = Match_get_display_symbol(match->mas->protein_submat, query_symbol, target_symbol); display_str[0] = display_symbol; display_str[1] = display_symbol; display_str[2] = display_symbol; display_str[3] = '\0'; if(query_symbol != target_symbol){ /* Label with reverse translation */ mrtd.display_string = display_str; mrtd.codon_a = Sequence_get_symbol(target, tp1); mrtd.codon_b = Sequence_get_symbol(target, tp2); mrtd.codon_c = Sequence_get_symbol(target, tp3); aaseq[0] = query_symbol; aaseq[1] = '\0'; Translate_reverse(match->mas->translate, aaseq, 1, Match_1_3_display_reverse_translate, &mrtd); } return; } static void Match_1_3_display_func(Match *match, Sequence *query, Sequence *target, guint query_pos, guint target_pos, gchar *display_str){ g_assert(Match_pos_pair_is_valid(match, query, target, query_pos, target_pos)); Match_1_3_split_display_func(match, query, target, query_pos, -1, -1, target_pos, target_pos+1, target_pos+2, display_str); return; } /**/ static Match_Score Match_3_1_split_score_func(Match *match, Sequence *query, Sequence *target, guint qp1, guint qp2, guint qp3, guint tp1, guint tp2, guint tp3){ return 
Match_1_3_split_score_func(Match_swap(match), target, query, tp1, tp2, tp3, qp1, qp2, qp3); } static Match_Score Match_3_1_score_func(Match *match, Sequence *query, Sequence *target, guint query_pos, guint target_pos){ return Match_1_3_score_func(Match_swap(match), target, query, target_pos, query_pos); } static gchar *Match_3_1_score_macro(void){ return "Submat_lookup(ud->mas->protein_submat,\n" " Translate_base(ud->mas->translate,\n" " Sequence_get_symbol(ud->query, %QP),\n" " Sequence_get_symbol(ud->query, %QP+1),\n" " Sequence_get_symbol(ud->query, %QP+2)),\n" " Sequence_get_symbol(ud->target, %TP))\n"; } static void Match_3_1_split_display_func(Match *match, Sequence *query, Sequence *target, guint qp1, guint qp2, guint qp3, guint tp1, guint tp2, guint tp3, gchar *display_str){ Match_1_3_split_display_func(Match_swap(match), target, query, tp1, tp2, tp3, qp1, qp2, qp3, display_str); return; } static void Match_3_1_display_func(Match *match, Sequence *query, Sequence *target, guint query_pos, guint target_pos, gchar *display_str){ Match_1_3_display_func(Match_swap(match), target, query, target_pos, query_pos, display_str); return; } /**/ #if 0 static Match_Score Match_3_3_split_score_func(Match *match, Sequence *query, Sequence *target, guint qp1, guint qp2, guint qp3, guint tp1, guint tp2, guint tp3){ if(query->annotation) if(query->alphabet->type == Alphabet_Type_DNA) if((qp1 < query->annotation->cds_start) || (qp1 >= (query->annotation->cds_start +query->annotation->cds_length)) || ((qp1 % 3) != (query->annotation->cds_start % 3))) return MATCH_IMPOSSIBLY_LOW_SCORE; return CodonSubmat_lookup_base(match->mas->codon_submat, Sequence_get_symbol(query, qp1), Sequence_get_symbol(query, qp2), Sequence_get_symbol(query, qp3), Sequence_get_symbol(target, tp1), Sequence_get_symbol(target, tp2), Sequence_get_symbol(target, tp3)); } #endif /* 0 */ /* FIXME: temp */ static Match_Score Match_3_3_split_score_func(Match *match, Sequence *query, Sequence *target, guint qp1, 
guint qp2, guint qp3, guint tp1, guint tp2, guint tp3){ register gchar query_symbol, target_symbol; if(query->annotation) if(query->alphabet->type == Alphabet_Type_DNA) if((qp1 < query->annotation->cds_start) || (qp1 >= (query->annotation->cds_start +query->annotation->cds_length)) || ((qp1 % 3) != (query->annotation->cds_start % 3))) return MATCH_IMPOSSIBLY_LOW_SCORE; query_symbol = Translate_base(match->mas->translate, Sequence_get_symbol(query, qp1), Sequence_get_symbol(query, qp2), Sequence_get_symbol(query, qp3)); target_symbol = Translate_base(match->mas->translate, Sequence_get_symbol(target, tp1), Sequence_get_symbol(target, tp2), Sequence_get_symbol(target, tp3)); return Submat_lookup(match->mas->protein_submat, query_symbol, target_symbol); } static Match_Score Match_3_3_codon_score_func(Match *match, Sequence *query, Sequence *target, guint query_pos, guint target_pos){ g_assert(Match_pos_pair_is_valid(match, query, target, query_pos, target_pos)); return Match_3_3_split_score_func(match, query, target, query_pos, query_pos+1, query_pos+2, target_pos, target_pos+1, target_pos+2); } #if 0 static gchar *Match_3_3_score_macro(void){ return "((ud->query->annotation)\n" " && (ud->query->alphabet->type == Alphabet_Type_DNA)\n" " && ( ((%QP) < ud->query->annotation->cds_start)\n" " || ((%QP) >= (ud->query->annotation->cds_start\n" " +ud->query->annotation->cds_length))\n" " || (((%QP) %% 3) != (ud->query->annotation->cds_start %% 3))))\n" "?MATCH_IMPOSSIBLY_LOW_SCORE\n" ":CodonSubmat_lookup_base(ud->mas->codon_submat,\n" " Sequence_get_symbol(ud->query, %QP),\n" " Sequence_get_symbol(ud->query, %QP+1),\n" " Sequence_get_symbol(ud->query, %QP+2),\n" " Sequence_get_symbol(ud->target, %TP),\n" " Sequence_get_symbol(ud->target, %TP+1),\n" " Sequence_get_symbol(ud->target, %TP+2))\n"; } /* FIXME: optimisations: * remove seq type check * precompute cds_start % 3 */ #endif /* 0 */ /* FIXME: temp */ static gchar *Match_3_3_score_macro(void){ return 
"((ud->query->annotation)\n" " && (ud->query->alphabet->type == Alphabet_Type_DNA)\n" " && ( ((%QP) < ud->query->annotation->cds_start)\n" " || ((%QP) >= (ud->query->annotation->cds_start\n" " +ud->query->annotation->cds_length))\n" " || (((%QP) %% 3) != (ud->query->annotation->cds_start %% 3))))\n" "?MATCH_IMPOSSIBLY_LOW_SCORE\n" ":Submat_lookup(ud->mas->protein_submat,\n" " Translate_base(ud->mas->translate, \n" " Sequence_get_symbol(ud->query, %QP),\n" " Sequence_get_symbol(ud->query, %QP+1),\n" " Sequence_get_symbol(ud->query, %QP+2)),\n" " Translate_base(ud->mas->translate, \n" " Sequence_get_symbol(ud->target, %TP),\n" " Sequence_get_symbol(ud->target, %TP+1),\n" " Sequence_get_symbol(ud->target, %TP+2)))\n"; } /* FIXME: optimisations: * remove seq type check * precompute cds_start % 3 */ static void Match_3_3_split_display_func(Match *match, Sequence *query, Sequence *target, guint qp1, guint qp2, guint qp3, guint tp1, guint tp2, guint tp3, gchar *display_str){ register gchar query_symbol, target_symbol, display_symbol; query_symbol = Translate_base(match->mas->translate, Sequence_get_symbol(query, qp1), Sequence_get_symbol(query, qp2), Sequence_get_symbol(query, qp3)); target_symbol = Translate_base(match->mas->translate, Sequence_get_symbol(target, tp1), Sequence_get_symbol(target, tp2), Sequence_get_symbol(target, tp3)); display_symbol = Match_get_display_symbol( match->mas->protein_submat, query_symbol, target_symbol); /**/ display_str[0] = display_symbol; if(query_symbol == target_symbol){ if(toupper(Sequence_get_symbol(query, qp1)) != toupper(Sequence_get_symbol(target, tp1))) display_str[0] = '+'; /* Mark Silent */ } else { if(toupper(Sequence_get_symbol(query, qp1)) == toupper(Sequence_get_symbol(target, tp1))) display_str[0] = '!'; /* Mark Conserved */ } /**/ display_str[1] = display_symbol; if(query_symbol == target_symbol){ if(toupper(Sequence_get_symbol(query, qp2)) != toupper(Sequence_get_symbol(target, tp2))) display_str[1] = '+'; /* Mark 
Silent */ } else { if(toupper(Sequence_get_symbol(query, qp2)) == toupper(Sequence_get_symbol(target, tp2))) display_str[1] = '!'; /* Mark Conserved */ } /**/ display_str[2] = display_symbol; if(query_symbol == target_symbol){ if(toupper(Sequence_get_symbol(query, qp3)) != toupper(Sequence_get_symbol(target, tp3))) display_str[2] = '+'; /* Mark Silent */ } else { if(toupper(Sequence_get_symbol(query, qp3)) == toupper(Sequence_get_symbol(target, tp3))) display_str[2] = '!'; /* Mark Conserved */ } /**/ display_str[3] = '\0'; return; } static void Match_3_3_display_func(Match *match, Sequence *query, Sequence *target, guint query_pos, guint target_pos, gchar *display_str){ g_assert(Match_pos_pair_is_valid(match, query, target, query_pos, target_pos)); Match_3_3_split_display_func(match, query, target, query_pos, query_pos+1, query_pos+2, target_pos, target_pos+1, target_pos+2, display_str); return; } /**/ static Match_Strand *Match_Strand_create(Alphabet_Type alphabet_type, gboolean is_softmasked, gboolean is_translated, guint advance, Match *match){ register Match_Strand *strand = g_new(Match_Strand, 1); strand->alphabet = Alphabet_create(alphabet_type, is_softmasked); strand->advance = advance; strand->is_translated = is_translated; switch(advance){ case 1: strand->self_func = (alphabet_type == Alphabet_Type_DNA) ? Match_1_dna_self_func : Match_1_protein_self_func; strand->mask_func = Match_1_mask_func; break; case 3: /* strand->self_func = is_translated ? 
Match_3_translate_self_func : Match_3_codon_self_func; */ strand->self_func = Match_3_translate_self_func; /* FIXME: temp */ strand->mask_func = Match_3_mask_func; break; default: g_error("Bad advance [%d]", advance); break; } strand->match = match; return strand; } static void Match_Strand_destroy(Match_Strand *strand){ Alphabet_destroy(strand->alphabet); g_free(strand); return; } void Match_Strand_get_raw(Match_Strand *strand, Sequence *sequence, guint pos, gchar *result){ register gint i; g_assert(result); for(i = 0; i < strand->advance; i++) result[i] = Sequence_get_symbol(sequence, pos+i); result[i] = '\0'; return; } /**/ gchar *Match_Type_get_score_macro(Match_Type type){ register gchar *macro = NULL; switch(type){ case Match_Type_DNA2DNA: macro = Match_1_1_dna_score_macro(); break; case Match_Type_PROTEIN2PROTEIN: macro = Match_1_1_protein_score_macro(); break; case Match_Type_DNA2PROTEIN: macro = Match_3_1_score_macro(); break; case Match_Type_PROTEIN2DNA: macro = Match_1_3_score_macro(); break; case Match_Type_CODON2CODON: macro = Match_3_3_score_macro(); break; default: g_error("Bad Match_Type [%d]", type); break; } g_assert(macro); return macro; } /**/ static Match *Match_create_without_mirror(Match_Type type){ register Match *match = g_new(Match, 1); register gboolean softmask_comparison; match->mas = Match_ArgumentSet_create(NULL); match->type = type; g_assert(match->mas->dna_submat); g_assert(match->mas->protein_submat); softmask_comparison = match->mas->softmasked_query || match->mas->softmasked_target; switch(type){ case Match_Type_DNA2DNA: match->query = Match_Strand_create(Alphabet_Type_DNA, match->mas->softmasked_query, FALSE, 1, match); match->target = Match_Strand_create(Alphabet_Type_DNA, match->mas->softmasked_target, FALSE, 1, match); match->comparison_alphabet = Alphabet_create(Alphabet_Type_DNA, softmask_comparison); match->score_func = Match_1_1_dna_score_func; match->display_func = Match_1_1_display_func; match->split_score_func = 
Match_invalid_split_score_func; match->split_display_func = Match_invalid_split_display_func; break; case Match_Type_CODON2CODON: /* FIXME: should do this for mixed models only ?? */ #if 0 g_assert(match->mas->dna_submat); g_assert(match->mas->codon_submat); CodonSubmat_add_nucleic(match->mas->codon_submat, match->mas->dna_submat); #endif /* 0 */ /**/ match->query = Match_Strand_create(Alphabet_Type_DNA, match->mas->softmasked_query, FALSE, 3, match); match->target = Match_Strand_create(Alphabet_Type_DNA, match->mas->softmasked_target, FALSE, 3, match); match->comparison_alphabet = Alphabet_create(Alphabet_Type_PROTEIN, softmask_comparison); match->score_func = Match_3_3_codon_score_func; match->display_func = Match_3_3_display_func; match->split_score_func = Match_3_3_split_score_func; match->split_display_func = Match_3_3_split_display_func; break; case Match_Type_PROTEIN2PROTEIN: match->query = Match_Strand_create(Alphabet_Type_PROTEIN, match->mas->softmasked_query, FALSE, 1, match); match->target = Match_Strand_create(Alphabet_Type_PROTEIN, match->mas->softmasked_target, FALSE, 1, match); match->comparison_alphabet = Alphabet_create(Alphabet_Type_PROTEIN, softmask_comparison); match->score_func = Match_1_1_protein_score_func; match->display_func = Match_1_1_display_func; match->split_score_func = Match_invalid_split_score_func; match->split_display_func = Match_invalid_split_display_func; break; case Match_Type_DNA2PROTEIN: match->query = Match_Strand_create(Alphabet_Type_DNA, match->mas->softmasked_query, TRUE, 3, match); match->target = Match_Strand_create(Alphabet_Type_PROTEIN, match->mas->softmasked_target, FALSE, 1, match); match->comparison_alphabet = Alphabet_create(Alphabet_Type_PROTEIN, softmask_comparison); match->score_func = Match_3_1_score_func; match->display_func = Match_3_1_display_func; match->split_score_func = Match_3_1_split_score_func; match->split_display_func = Match_3_1_split_display_func; break; case Match_Type_PROTEIN2DNA: match->query 
= Match_Strand_create(Alphabet_Type_PROTEIN, match->mas->softmasked_query, FALSE, 1, match); match->target = Match_Strand_create(Alphabet_Type_DNA, match->mas->softmasked_target, TRUE, 3, match); /* protein_submat is used in target strand * because the comparison is done at the protein level */ match->comparison_alphabet = Alphabet_create(Alphabet_Type_PROTEIN, softmask_comparison); match->score_func = Match_1_3_score_func; match->display_func = Match_1_3_display_func; match->split_score_func = Match_1_3_split_score_func; match->split_display_func = Match_1_3_split_display_func; break; default: g_error("Unknown Match Type [%d]", type); break; } match->mirror = NULL; return match; } static Match *local_match_cache[Match_Type_TOTAL] = {0}; static Match *Match_create(Match_Type type){ register Match *match; register Match_Type mirror_type = Match_Type_mirror(type); match = Match_create_without_mirror(type); match->mirror = Match_create_without_mirror(mirror_type); match->mirror->mirror = match; return match; } Match *Match_find(Match_Type type){ if(!local_match_cache[type]){ local_match_cache[type] = Match_create(type); local_match_cache[Match_Type_mirror(type)] = local_match_cache[type]->mirror; } return local_match_cache[type]; } static void Match_destroy_without_mirror(Match *match){ g_assert(match); match->mirror = NULL; Match_Strand_destroy(match->query); Match_Strand_destroy(match->target); Alphabet_destroy(match->comparison_alphabet); local_match_cache[match->type] = NULL; g_free(match); return; } static void Match_destroy(Match *match){ g_assert(match); Match_destroy_without_mirror(match->mirror); Match_destroy_without_mirror(match); return; } void Match_destroy_all(void){ register gint i; register Match *match; for(i = 0; i < Match_Type_TOTAL; i++){ match = local_match_cache[i]; if(match){ local_match_cache[i] = NULL; local_match_cache[match->mirror->type] = NULL; Match_destroy(match); } } return; } Match *Match_swap(Match *match){ return match->mirror; } 
Match_Score Match_max_score(Match *match){ switch(match->type){ case Match_Type_DNA2DNA: return Submat_max_score(match->mas->dna_submat); case Match_Type_PROTEIN2PROTEIN: case Match_Type_PROTEIN2DNA: case Match_Type_DNA2PROTEIN: case Match_Type_CODON2CODON: return Submat_max_score(match->mas->protein_submat); default: g_error("Unknown match type [%d]", match->type); } return 0; } /**/ exonerate-2.4.0/src/comparison/pcr.test.c0000644000175000017500000000404011162714277015305 00000000000000/****************************************************************\ * * * Library for PCR simulation * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "pcr.h" static gboolean pcr_test_report_func(Sequence *sequence, PCR_Match *match_a, PCR_Match *match_b, gint product_length, gpointer user_data){ g_message("Have match"); return FALSE; } gint Argument_main(Argument *arg){ register PCR *pcr = PCR_create(pcr_test_report_func, NULL, 1, 0); register Alphabet *alphabet = Alphabet_create(Alphabet_Type_DNA, FALSE); register Sequence *sequence = Sequence_create("testseq", NULL, "NNNNACGTNNNAACCNNNNNNNNNNNNNNNNNNNNNNNNNCCGGNNGGTTNN", 0, Sequence_Strand_FORWARD, alphabet); /* PCR_add_experiment(pcr, "test1", "ACGT", "AAKC", 10, 15); */ PCR_add_experiment(pcr, "test1", "ACGT", "AACC", 10, 15); PCR_add_experiment(pcr, "test2", "CCGG", "GNGT", 20, 25); PCR_add_experiment(pcr, "test3", "CCG", "NGT", 20, 25); PCR_add_experiment(pcr, "test3", "CC", "GT", 20, 25); PCR_prepare(pcr); PCR_simulate(pcr, sequence); PCR_destroy(pcr); Sequence_destroy(sequence); Alphabet_destroy(alphabet); return 0; } 
exonerate-2.4.0/src/comparison/pcr.c0000644000175000017500000004257011162714277014341 00000000000000
/****************************************************************\
*                                                                *
*  Library for PCR simulation                                    *
*                                                                *
*  Guy St.C. Slater..   mailto:guy@ebi.ac.uk                     *
*  Copyright (C) 2000-2009.  All Rights Reserved.                *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#include "pcr.h"
#include "submat.h"
#include "sequence.h"

#include <string.h> /* For strlen() */

/**/

static PCR_Match *PCR_Match_allocate(PCR *pcr){
    register PCR_Match *pcr_match;
    if(pcr->match_recycle){
        pcr_match = pcr->match_recycle;
        /* pcr_match->pcr_probe is used as the next pointer */
        pcr->match_recycle = (PCR_Match*)pcr_match->pcr_probe;
    } else {
        pcr_match = g_chunk_new(PCR_Match, pcr->match_mem_chunk);
        }
    return pcr_match;
    }

static PCR_Match *PCR_Match_create(PCR_Probe *pcr_probe, gint position,
                                   gint mismatch){
    register PCR *pcr = pcr_probe->pcr_primer->pcr_experiment->pcr;
    register PCR_Match *pcr_match = PCR_Match_allocate(pcr);
    pcr_match->pcr_probe = pcr_probe;
    pcr_match->position = position;
    pcr_match->mismatch = mismatch;
    return pcr_match;
    }

static void PCR_Match_destroy(PCR_Match *pcr_match){
    register PCR *pcr = pcr_match->pcr_probe->pcr_primer->pcr_experiment->pcr;
    pcr_match->pcr_probe = (PCR_Probe*)pcr->match_recycle;
    pcr->match_recycle = pcr_match;
    return;
    }

/**/

static PCR_Probe *PCR_Probe_create(PCR_Primer *pcr_primer,
                                   Sequence_Strand strand,
                                   gchar *probe, gint mismatch){
    register PCR *pcr = pcr_primer->pcr_experiment->pcr;
    register PCR_Probe *pcr_probe = g_chunk_new(PCR_Probe,
                                                pcr->probe_mem_chunk);
    pcr_probe->pcr_primer = pcr_primer;
    pcr_probe->strand = strand;
    pcr_probe->mismatch = mismatch;
    return pcr_probe;
    }

static void PCR_Probe_destroy(PCR_Probe
*pcr_probe){ register PCR *pcr = pcr_probe->pcr_primer->pcr_experiment->pcr; g_mem_chunk_free(pcr->probe_mem_chunk, pcr_probe); return; } static void PCR_Probe_register_hit(PCR_Probe *pcr_probe, Sequence *sequence, guint seq_pos){ register PCR_Match *pcr_match, *prev_match; register PCR_Primer *pcr_primer = pcr_probe->pcr_primer; register PCR_Experiment *pcr_experiment = pcr_primer->pcr_experiment; register PCR *pcr = pcr_experiment->pcr; register gint i, product_length, mismatch = pcr_probe->mismatch; register SListNode *node; register gint match_start; if(pcr_probe->strand == Sequence_Strand_FORWARD){ match_start = seq_pos - pcr_primer->probe_len + 1; } else { match_start = seq_pos - pcr_primer->length + 1; } if(match_start < 0) return; /* Too close to sequence start */ if((match_start + pcr_primer->length) > sequence->len) return; /* Too close to sequence end */ /* Count number of mismatches in rest of primer */ g_assert(pcr_primer); if(pcr_probe->strand == Sequence_Strand_FORWARD){ for(i = pcr_primer->probe_len; i < pcr_primer->length; i++){ if(pcr_primer->forward[i] != Sequence_get_symbol(sequence, match_start+i)){ mismatch++; if(mismatch > pcr->mismatch_threshold) return; /* Match failed to extend */ } } } else { g_assert(pcr_primer); for(i = 0; i < (pcr_primer->length-pcr_primer->probe_len); i++){ if(pcr_primer->revcomp[i] != Sequence_get_symbol(sequence, match_start+i)){ mismatch++; if(mismatch > pcr->mismatch_threshold) return; /* Match failed to extend */ } } } /* Clear any matches no longer in range */ while(!SList_isempty(pcr_experiment->match_list)){ prev_match = SList_first(pcr_experiment->match_list); product_length = match_start - prev_match->position + pcr_probe->pcr_primer->length; if(product_length <= pcr_experiment->max_product_len) break; prev_match = SList_pop(pcr_experiment->match_list); PCR_Match_destroy(prev_match); } /* Create a match for this probe */ pcr_match = PCR_Match_create(pcr_probe, match_start, mismatch); /* Report hit with any 
previous matches in range */ SList_for_each(pcr_experiment->match_list, node){ prev_match = node->data; product_length = match_start - prev_match->position + pcr_probe->pcr_primer->length; if(product_length < pcr_experiment->min_product_len) break; /* Primers must be on different strands */ if(prev_match->pcr_probe->strand != pcr_match->pcr_probe->strand) /* Primers must be facing each other */ if((prev_match->pcr_probe->strand == Sequence_Strand_FORWARD) && (pcr_match->pcr_probe->strand == Sequence_Strand_REVCOMP)) pcr->report_func(sequence, prev_match, pcr_match, product_length, pcr->user_data); } /* Record this match */ SList_queue(pcr_experiment->match_list, pcr_match); return; } /**/ static PCR_Sensor *PCR_Sensor_create(PCR *pcr){ register PCR_Sensor *pcr_sensor = g_chunk_new(PCR_Sensor, pcr->sensor_mem_chunk); pcr_sensor->owned_probe_list = SList_create(pcr->slist_set); pcr_sensor->borrowed_sensor_list = SList_create(pcr->slist_set); pcr->sensor_count++; return pcr_sensor; } static void PCR_Sensor_destroy(PCR_Sensor *pcr_sensor, PCR *pcr){ register PCR_Probe *pcr_probe; if(pcr_sensor->owned_probe_list){ /* Maybe removed in merge */ while(!SList_isempty(pcr_sensor->owned_probe_list)){ pcr_probe = SList_pop(pcr_sensor->owned_probe_list); PCR_Probe_destroy(pcr_probe); } SList_destroy(pcr_sensor->owned_probe_list); } SList_destroy(pcr_sensor->borrowed_sensor_list); g_mem_chunk_free(pcr->sensor_mem_chunk, pcr_sensor); pcr->sensor_count--; return; } static gpointer PCR_FSM_merge_PCR_Sensor(gpointer a, gpointer b, gpointer user_data){ register PCR *pcr = user_data; register PCR_Sensor *ps_a = a, *ps_b = b; g_assert(ps_a); g_assert(ps_b); g_assert(SList_isempty(ps_a->borrowed_sensor_list)); g_assert(SList_isempty(ps_b->borrowed_sensor_list)); ps_a->owned_probe_list = SList_join(ps_a->owned_probe_list, ps_b->owned_probe_list); SList_destroy(ps_b->owned_probe_list); ps_b->owned_probe_list = NULL; PCR_Sensor_destroy(ps_b, pcr); return ps_a; } /* Merge two PCR_Sensors 
which have the same sequence */ static gpointer PCR_FSM_combine_PCR_Sensor(gpointer a, gpointer b, gpointer user_data){ register PCR_Sensor *ps_a = a, *ps_b = b; g_assert(ps_a); g_assert(ps_b); /**/ SList_queue(ps_a->borrowed_sensor_list, ps_b); return ps_a; } /* Join two PCR_Sensors when one is a subsequence of the other */ /**/ static void PCR_Primer_add_probe(PCR_Primer *pcr_primer, Sequence_Strand strand, gchar *primer, gint mismatch){ register PCR_Probe *pcr_probe = PCR_Probe_create(pcr_primer, strand, primer, mismatch); register PCR_Sensor *pcr_sensor = PCR_Sensor_create( pcr_primer->pcr_experiment->pcr); SList_queue(pcr_sensor->owned_probe_list, pcr_probe); g_ptr_array_add(pcr_primer->probe_list, pcr_probe); FSM_add(pcr_primer->pcr_experiment->pcr->fsm, primer, pcr_primer->probe_len, pcr_sensor); return; } static gboolean PCR_Primer_Wordhood_traverse_func(gchar *word, gint score, gpointer user_data){ register PCR_Primer *pcr_primer = user_data; register gint mismatch = pcr_primer->probe_len-score; PCR_Primer_add_probe(pcr_primer, Sequence_Strand_FORWARD, word, mismatch); Sequence_revcomp_in_place(word, pcr_primer->probe_len); PCR_Primer_add_probe(pcr_primer, Sequence_Strand_REVCOMP, word, mismatch); Sequence_revcomp_in_place(word, pcr_primer->probe_len); return FALSE; } static PCR_Primer *PCR_Primer_create(PCR_Experiment *pcr_experiment, gchar *primer){ register PCR_Primer *pcr_primer = g_new(PCR_Primer, 1); g_assert(pcr_experiment); g_assert(primer); pcr_primer->pcr_experiment = pcr_experiment; pcr_primer->length = strlen(primer); if((pcr_experiment->pcr->seed_length) && (pcr_primer->length > pcr_experiment->pcr->seed_length)) pcr_primer->probe_len = pcr_experiment->pcr->seed_length; else pcr_primer->probe_len = pcr_primer->length; pcr_primer->forward = g_string_chunk_insert( pcr_experiment->pcr->string_chunk, primer); pcr_primer->revcomp = g_string_chunk_insert( pcr_experiment->pcr->string_chunk, primer); Sequence_revcomp_in_place(pcr_primer->revcomp, 
pcr_primer->length); g_strup(pcr_primer->forward); g_strup(pcr_primer->revcomp); pcr_primer->probe_list = g_ptr_array_new(); WordHood_traverse(pcr_experiment->pcr->wordhood, PCR_Primer_Wordhood_traverse_func, pcr_primer->forward, pcr_primer->probe_len, pcr_primer); return pcr_primer; } static void PCR_Primer_destroy(PCR_Primer *pcr_primer){ register gint i; for(i = 0; i < pcr_primer->probe_list->len; i++) PCR_Probe_destroy(pcr_primer->probe_list->pdata[i]); g_ptr_array_free(pcr_primer->probe_list, TRUE); g_free(pcr_primer); return; } static gsize PCR_Primer_memory_usage(PCR_Primer *pcr_primer){ register gsize primer_size = sizeof(gchar) * pcr_primer->length; return sizeof(PCR_Primer) + (primer_size * 2) + (pcr_primer->probe_list->len * (sizeof(PCR_Probe) + sizeof(gpointer))); } /**/ static PCR_Experiment *PCR_Experiment_create(PCR *pcr, gchar *id, gchar *primer_a, gchar *primer_b, gint min_product_len, gint max_product_len){ register PCR_Experiment *pcr_experiment = g_new(PCR_Experiment, 1); /* g_message("Adding [%s][%s][%s][%d][%d]", id, primer_a, primer_b, min_product_len, max_product_len); */ g_assert(id); g_assert(primer_a); g_assert(primer_b); g_assert(min_product_len > 0); g_assert(max_product_len >= min_product_len); pcr_experiment->pcr = pcr; pcr_experiment->id = g_string_chunk_insert(pcr->string_chunk, id); pcr_experiment->primer_a = PCR_Primer_create(pcr_experiment, primer_a); pcr_experiment->primer_b = PCR_Primer_create(pcr_experiment, primer_b); pcr_experiment->min_product_len = min_product_len; pcr_experiment->max_product_len = max_product_len; pcr_experiment->match_list = SList_create(pcr->slist_set); pcr_experiment->product_count = 0; return pcr_experiment; } static void PCR_Experiment_clear(PCR_Experiment *pcr_experiment){ register PCR_Match *pcr_match; while(!SList_isempty(pcr_experiment->match_list)){ pcr_match = SList_pop(pcr_experiment->match_list); PCR_Match_destroy(pcr_match); } return; } static void PCR_Experiment_destroy(PCR_Experiment 
*pcr_experiment){ g_assert(pcr_experiment); PCR_Experiment_clear(pcr_experiment); PCR_Primer_destroy(pcr_experiment->primer_a); PCR_Primer_destroy(pcr_experiment->primer_b); SList_destroy(pcr_experiment->match_list); g_free(pcr_experiment); return; } static gsize PCR_Experiment_memory_usage( PCR_Experiment *pcr_experiment){ return sizeof(PCR_Experiment) + sizeof(gpointer) /* for pcr->experiment_list */ + (sizeof(gchar) * strlen(pcr_experiment->id)) + PCR_Primer_memory_usage(pcr_experiment->primer_a) + PCR_Primer_memory_usage(pcr_experiment->primer_b); } /**/ PCR *PCR_create(PCR_ReportFunc report_func, gpointer user_data, gint mismatch_threshold, gint seed_length){ register PCR *pcr = g_new(PCR, 1); register Submat *submat = Submat_create("iupac-identity"); register WordHood_Alphabet *wha; pcr->slist_set = SListSet_create(); pcr->fsm = FSM_create("ACGT", PCR_FSM_merge_PCR_Sensor, PCR_FSM_combine_PCR_Sensor, pcr); pcr->experiment_list = g_ptr_array_new(); pcr->mismatch_threshold = mismatch_threshold; pcr->seed_length = seed_length; pcr->report_func = report_func; pcr->is_prepared = FALSE; wha = WordHood_Alphabet_create_from_Submat( "GARTKWDCSMVYBHN", "ACGT", submat, FALSE); pcr->wordhood = WordHood_create(wha, pcr->mismatch_threshold, TRUE); WordHood_Alphabet_destroy(wha); pcr->experiment_memory_usage = 0; pcr->sensor_count = 0; pcr->sensor_mem_chunk = g_mem_chunk_new("PCR_Sensor", sizeof(PCR_Sensor), sizeof(PCR_Sensor)*4096, G_ALLOC_AND_FREE); pcr->user_data = user_data; Submat_destroy(submat); pcr->string_chunk = g_string_chunk_new(4096); pcr->probe_mem_chunk = g_mem_chunk_new("PCR_Probe", sizeof(PCR_Probe), sizeof(PCR_Probe)*4096, G_ALLOC_AND_FREE); pcr->match_mem_chunk = g_mem_chunk_new("PCR_Match", sizeof(PCR_Match), sizeof(PCR_Match)*4096, G_ALLOC_ONLY); pcr->match_recycle = NULL; return pcr; } void PCR_destroy(PCR *pcr){ register gint i; for(i = 0; i < pcr->experiment_list->len; i++) PCR_Experiment_destroy(pcr->experiment_list->pdata[i]); 
g_ptr_array_free(pcr->experiment_list, TRUE); FSM_destroy(pcr->fsm); WordHood_destroy(pcr->wordhood); SListSet_destroy(pcr->slist_set); g_mem_chunk_destroy(pcr->sensor_mem_chunk); g_string_chunk_free(pcr->string_chunk); g_mem_chunk_destroy(pcr->probe_mem_chunk); g_mem_chunk_destroy(pcr->match_mem_chunk); g_free(pcr); return; } /**/ static gsize PCR_memory_usage(PCR *pcr){ return sizeof(PCR) + pcr->experiment_memory_usage + SListSet_memory_usage(pcr->slist_set) + FSM_memory_usage(pcr->fsm) + (sizeof(PCR_Sensor) * pcr->sensor_count); } gsize PCR_add_experiment(PCR *pcr, gchar *id, gchar *primer_a, gchar *primer_b, gint min_product_len, gint max_product_len){ register PCR_Experiment *pcr_experiment; g_assert(pcr); g_assert(!pcr->is_prepared); pcr_experiment = PCR_Experiment_create(pcr, id, primer_a, primer_b, min_product_len, max_product_len); g_ptr_array_add(pcr->experiment_list, pcr_experiment); pcr->experiment_memory_usage += PCR_Experiment_memory_usage(pcr_experiment); return PCR_memory_usage(pcr); } void PCR_prepare(PCR *pcr){ g_assert(pcr); g_assert(!pcr->is_prepared); FSM_compile(pcr->fsm); pcr->is_prepared = TRUE; return; } static void PCR_Sensor_register_hit_recur(PCR_Sensor *pcr_sensor, Sequence *sequence, guint seq_pos){ register SListNode *node; register PCR_Probe *pcr_probe; register PCR_Sensor *borrowed_sensor; SList_for_each(pcr_sensor->owned_probe_list, node){ pcr_probe = node->data; PCR_Probe_register_hit(pcr_probe, sequence, seq_pos); } SList_for_each(pcr_sensor->borrowed_sensor_list, node){ borrowed_sensor = node->data; PCR_Sensor_register_hit_recur(borrowed_sensor, sequence, seq_pos); } return; } static void PCR_FSM_traverse_func(guint seq_pos, gpointer node_data, gpointer user_data){ register PCR_Sensor *pcr_sensor = node_data; register Sequence *sequence = user_data; PCR_Sensor_register_hit_recur(pcr_sensor, sequence, seq_pos); return; } void PCR_simulate(PCR *pcr, Sequence *sequence){ register gint i; register gchar *str = 
Sequence_get_str(sequence); g_assert(pcr->is_prepared); FSM_traverse(pcr->fsm, str, PCR_FSM_traverse_func, sequence); g_free(str); for(i = 0; i < pcr->experiment_list->len; i++) PCR_Experiment_clear(pcr->experiment_list->pdata[i]); return; } /**/ /* FIXME: Add wordhood_abandon and match_abandon thresholds. */ exonerate-2.4.0/src/comparison/comparison.test.c0000644000175000017500000000211011162714274016664 00000000000000/****************************************************************\ * * * Comparison : A module for pairwise sequence comparisons * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "comparison.h" #include "argument.h" gint Argument_main(Argument *arg){ g_message("Test [%s] not implemented", __FILE__); return 0; } exonerate-2.4.0/src/comparison/comparison.c0000644000175000017500000002331011222435657015715 00000000000000/****************************************************************\ * * * Comparison : A module for pairwise sequence comparisons * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "comparison.h" /**/ static HSP_Param *Comparison_Param_get_hsp_param(HSP_Param *hsp_param, gboolean swap_param){ if(hsp_param){ if(swap_param) return HSP_Param_swap(hsp_param); else return HSP_Param_share(hsp_param); } return NULL; } static Comparison_Param *Comparison_Param_create_without_mirror( Alphabet_Type query_type, Alphabet_Type target_type, HSP_Param *dna_hsp_param, HSP_Param *protein_hsp_param, HSP_Param *codon_hsp_param, gboolean swap_param){ register Comparison_Param *param = g_new(Comparison_Param, 1); param->thread_ref = NULL; param->query_type = query_type; param->target_type = target_type; param->dna_hsp_param = Comparison_Param_get_hsp_param(dna_hsp_param, swap_param); param->protein_hsp_param = Comparison_Param_get_hsp_param(protein_hsp_param, swap_param); param->codon_hsp_param = Comparison_Param_get_hsp_param(codon_hsp_param, swap_param); param->mirror = NULL; return param; } Comparison_Param *Comparison_Param_create(Alphabet_Type query_type, Alphabet_Type target_type, HSP_Param *dna_hsp_param, HSP_Param *protein_hsp_param, HSP_Param *codon_hsp_param){ register Comparison_Param *param = Comparison_Param_create_without_mirror( query_type, target_type, dna_hsp_param, protein_hsp_param, codon_hsp_param, FALSE); param->mirror = Comparison_Param_create_without_mirror( target_type, query_type, dna_hsp_param, protein_hsp_param, codon_hsp_param, TRUE); param->mirror->mirror = param; param->mirror->thread_ref = param->thread_ref = ThreadRef_create(); return param; } static void Comparison_Param_destroy_without_mirror( Comparison_Param *param){ g_assert(param); param->mirror = NULL; if(param->dna_hsp_param) HSP_Param_destroy(param->dna_hsp_param); if(param->protein_hsp_param) HSP_Param_destroy(param->protein_hsp_param); if(param->codon_hsp_param) HSP_Param_destroy(param->codon_hsp_param); g_free(param); return; } void Comparison_Param_destroy(Comparison_Param *param){ 
g_assert(param); if(ThreadRef_destroy(param->thread_ref)) return; Comparison_Param_destroy_without_mirror(param->mirror); Comparison_Param_destroy_without_mirror(param); return; } Comparison_Param *Comparison_Param_share(Comparison_Param *param){ g_assert(param); ThreadRef_share(param->thread_ref); return param; } Comparison_Param *Comparison_Param_swap(Comparison_Param *param){ register Comparison_Param *mirror = Comparison_Param_share(param->mirror); g_assert(param); g_assert(param->mirror); Comparison_Param_destroy(param); return mirror; } HSPset_ArgumentSet *Comparison_Param_get_HSPSet_Argument_Set( Comparison_Param *param){ if(param->dna_hsp_param) return param->dna_hsp_param->has; if(param->protein_hsp_param) return param->protein_hsp_param->has; if(param->codon_hsp_param) return param->codon_hsp_param->has; g_error("Comparison_Param does not contain any HSP_Param"); return NULL; } /**/ Comparison *Comparison_create(Comparison_Param *param, Sequence *query, Sequence *target){ register Comparison *comparison = g_new(Comparison, 1); g_assert(param); g_assert(query); g_assert(target); comparison->thread_ref = ThreadRef_create(); comparison->param = Comparison_Param_share(param); comparison->query = Sequence_share(query); comparison->target = Sequence_share(target); ThreadRef_lock(comparison->param->thread_ref); comparison->dna_hspset = comparison->param->dna_hsp_param ? HSPset_create(query, target, param->dna_hsp_param) : NULL; comparison->protein_hspset = param->protein_hsp_param ? HSPset_create(query, target, param->protein_hsp_param) : NULL; comparison->codon_hspset = param->codon_hsp_param ? 
HSPset_create(query, target, param->codon_hsp_param) : NULL; ThreadRef_unlock(comparison->param->thread_ref); return comparison; } Comparison *Comparison_share(Comparison *comparison){ ThreadRef_share(comparison->thread_ref); return comparison; } void Comparison_destroy(Comparison *comparison){ g_assert(comparison); if(ThreadRef_destroy(comparison->thread_ref)) return; Comparison_Param_destroy(comparison->param); Sequence_destroy(comparison->query); Sequence_destroy(comparison->target); if(comparison->dna_hspset) HSPset_destroy(comparison->dna_hspset); if(comparison->protein_hspset) HSPset_destroy(comparison->protein_hspset); if(comparison->codon_hspset) HSPset_destroy(comparison->codon_hspset); g_free(comparison); return; } void Comparison_print(Comparison *comparison){ g_message("Comparison qy [%s] (qylen:%d) tg [%s] (tglen:%d)", comparison->query->id, comparison->query->len, comparison->target->id, comparison->target->len); if(comparison->dna_hspset){ g_message("DNA hspset"); HSPset_print(comparison->dna_hspset); } if(comparison->protein_hspset){ g_message("Protein hspset"); HSPset_print(comparison->protein_hspset); } if(comparison->codon_hspset){ g_message("Codon hspset"); HSPset_print(comparison->codon_hspset); } return; } gboolean Comparison_has_hsps(Comparison *comparison){ if(comparison->dna_hspset) if(!HSPset_is_empty(comparison->dna_hspset)) return TRUE; if(comparison->protein_hspset) if(!HSPset_is_empty(comparison->protein_hspset)) return TRUE; if(comparison->codon_hspset) if(!HSPset_is_empty(comparison->codon_hspset)) return TRUE; return FALSE; } void Comparison_finalise(Comparison *comparison){ if(comparison->dna_hspset) comparison->dna_hspset = HSPset_finalise(comparison->dna_hspset); if(comparison->protein_hspset) comparison->protein_hspset = HSPset_finalise(comparison->protein_hspset); if(comparison->codon_hspset) comparison->codon_hspset = HSPset_finalise(comparison->codon_hspset); return; } void Comparison_swap(Comparison *comparison){ register 
Sequence *query; g_assert(ThreadRef_get_count(comparison->thread_ref) == 1); /* Mirror parameters */ comparison->param = Comparison_Param_share(comparison->param->mirror); Comparison_Param_destroy(comparison->param->mirror); /* Swap query and target */ query = comparison->query; comparison->query = comparison->target; comparison->target = query; /* Swap each HSPset */ if(comparison->dna_hspset) HSPset_swap(comparison->dna_hspset, comparison->param->dna_hsp_param); if(comparison->protein_hspset) HSPset_swap(comparison->protein_hspset, comparison->param->protein_hsp_param); if(comparison->codon_hspset) HSPset_swap(comparison->codon_hspset, comparison->param->codon_hsp_param); return; } /* Swaps query and target of comparison in place */ void Comparison_revcomp(Comparison *comparison){ register Sequence *rc_query = Sequence_revcomp(comparison->query), *rc_target = Sequence_revcomp(comparison->target); g_assert(ThreadRef_get_count(comparison->thread_ref) == 1); g_assert(comparison->dna_hspset); g_assert(!comparison->protein_hspset); g_assert(!comparison->codon_hspset); Sequence_destroy(comparison->query); Sequence_destroy(comparison->target); comparison->query = rc_query; comparison->target = rc_target; HSPset_revcomp(comparison->dna_hspset); return; } /* Revcomps the comparison in place */ /**/ exonerate-2.4.0/src/comparison/match.test.c0000644000175000017500000000210311162714277015613 00000000000000/****************************************************************\ * * * Match : A module for pairwise symbol comparison * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "match.h" #include "argument.h" gint Argument_main(Argument *arg){ g_message("Test [%s] not implemented", __FILE__); return 0; } exonerate-2.4.0/src/comparison/seeder.test.c0000644000175000017500000000630611162714277015777 00000000000000/****************************************************************\ * * * Seeder : A module for seeding pairwise alignments * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "seeder.h" #include "argument.h" static void Seeder_Test_report_func(Comparison *comparison, gpointer user_data){ register gint *count = user_data; register gint i; register HSP *hsp; g_message("Have comparison with [%d] DNA hsps", comparison->dna_hspset->hsp_list->len); for(i = 0; i < comparison->dna_hspset->hsp_list->len; i++){ hsp = comparison->dna_hspset->hsp_list->pdata[i]; HSP_print(hsp, "hsp"); } (*count) += comparison->dna_hspset->hsp_list->len; return; } gint Argument_main(Argument *arg){ register Match *match; register HSP_Param *dna_hsp_param; register Comparison_Param *comparison_param; register Seeder *seeder; register Alphabet *dna_alphabet = Alphabet_create(Alphabet_Type_DNA, FALSE); register Sequence *query = Sequence_create("qy", NULL, "ACGTAACCGGTTAGCT", 0, Sequence_Strand_FORWARD, dna_alphabet), *target = Sequence_create("tg", NULL, "NNNNNNNNNNNNNNNNNACGTAACCGGTTAGCTNNNNNNNNNNNNNNNNNNN" "NNACGTAACCGGTTAGCTNNNNNNNNACGTAACCGGTTAGCTNNNNNNNNNN", 0, Sequence_Strand_FORWARD, dna_alphabet); gint count = 0; Match_ArgumentSet_create(arg); HSPset_ArgumentSet_create(arg); Seeder_ArgumentSet_create(arg); 
Argument_process(arg, "seeder.test", NULL, NULL); match = Match_find(Match_Type_DNA2DNA); dna_hsp_param = HSP_Param_create(match, TRUE); comparison_param = Comparison_Param_create(Alphabet_Type_DNA, Alphabet_Type_DNA, dna_hsp_param, NULL, NULL); seeder = Seeder_create(1, comparison_param, 0, Seeder_Test_report_func, &count); g_message("Seeder test"); Seeder_add_query(seeder, query); Seeder_add_target(seeder, target); /**/ g_message("final count [%d]", count); g_assert(count == 3); /**/ Alphabet_destroy(dna_alphabet); Sequence_destroy(query); Sequence_destroy(target); HSP_Param_destroy(dna_hsp_param); Comparison_Param_destroy(comparison_param); Seeder_destroy(seeder); return 0; } exonerate-2.4.0/src/comparison/seeder.c0000644000175000017500000010647011176605007015017 00000000000000/****************************************************************\ * * * Seeder : A module for seeding pairwise alignments * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "seeder.h" #include "alphabet.h" #include <ctype.h> /* For toupper() */ #include <string.h> /* For strlen() */ #include <math.h> /* For pow() */ #include <stddef.h> /* For offsetof() */ #ifndef OFFSET_ITEM #define OFFSET_ITEM(type, offset, instance) \ (*(type*)((char*)(instance) + (offset))) #endif /* OFFSET_ITEM */ #ifndef Swap #define Swap(x,y,temp) ((temp)=(x),(x)=(y),(y)=(temp)) #endif /* Swap */ Seeder_ArgumentSet *Seeder_ArgumentSet_create(Argument *arg){ register ArgumentSet *as; static Seeder_ArgumentSet sas; if(arg){ as = ArgumentSet_create("Alignment Seeding Options"); ArgumentSet_add_option(as, 'M', "fsmmemory", "Mb", "Memory limit for FSM scanning", "256", Argument_parse_int, &sas.fsm_memory_limit); ArgumentSet_add_option(as, '\0', "forcefsm", "fsm type", "Force FSM type ( normal | compact )", "none", Argument_parse_string, &sas.force_fsm); ArgumentSet_add_option(as, 0, "wordjump", NULL, "Jump between query words", "1", Argument_parse_int, &sas.word_jump); ArgumentSet_add_option(as, 0, "wordambiguity", NULL, "Number of ambiguous words to expand", "1", Argument_parse_int, &sas.word_ambiguity); Argument_absorb_ArgumentSet(arg, as); } return &sas; } /**/ static Seeder_WordInfo *Seeder_WordInfo_create(Seeder *seeder){ register Seeder_WordInfo *word_info = RecycleBin_alloc(seeder->recycle_wordinfo); word_info->seed_list = NULL; word_info->neighbour_list = NULL; if(seeder->saturate_threshold){ word_info->match_count = 0; word_info->match_mailbox = -1; } return word_info; } static void Seeder_WordInfo_empty(Seeder *seeder, Seeder_WordInfo *word_info){ register Seeder_Seed *seed; register Seeder_Neighbour *neighbour; while(word_info->seed_list){ seed = word_info->seed_list; word_info->seed_list = seed->next; RecycleBin_recycle(seeder->recycle_seed, seed); } while(word_info->neighbour_list){ neighbour = word_info->neighbour_list; word_info->neighbour_list = neighbour->next; RecycleBin_recycle(seeder->recycle_neighbour, 
neighbour); } return; } static void Seeder_WordInfo_add_Seed(Seeder *seeder, Seeder_WordInfo *word_info, Seeder_Context *context, Match_Score query_expect, gint qpos){ register Seeder_Seed *seed; g_assert(word_info); if(seeder->saturate_threshold){ if(!word_info->match_mailbox) /* Already blocked */ return; /* Apply saturate_threshold to frequent query words */ if(++word_info->match_count > query_expect){ word_info->match_mailbox = 0; /* block */ Seeder_WordInfo_empty(seeder, word_info); return; } } seed = RecycleBin_alloc(seeder->recycle_seed); seed->next = word_info->seed_list; seed->query_pos = qpos; seed->context = context; word_info->seed_list = seed; return; } static void Seeder_WordInfo_add_Neighbour(Seeder *seeder, Seeder_WordInfo *word_info, Seeder_WordInfo *neighbour){ register Seeder_Neighbour *n; g_assert(seeder); g_assert(word_info); g_assert(neighbour); if(seeder->saturate_threshold && (!word_info->match_mailbox)) /* Already blocked */ return; n = RecycleBin_alloc(seeder->recycle_neighbour); n->next = word_info->neighbour_list; n->word_info = neighbour; word_info->neighbour_list = n; return; } /**/ static Seeder_QueryInfo *Seeder_QueryInfo_create(Sequence *query){ register Seeder_QueryInfo *query_info = g_new(Seeder_QueryInfo, 1); query_info->query = Sequence_share(query); query_info->curr_comparison = NULL; return query_info; } static void Seeder_QueryInfo_destroy(Seeder_QueryInfo *query_info){ Sequence_destroy(query_info->query); if(query_info->curr_comparison) Comparison_destroy(query_info->curr_comparison); g_free(query_info); return; } /**/ static gpointer Seeder_FSM_merge_func(gpointer a, gpointer b, gpointer user_data){ register Seeder_WordInfo *word_info_a = a, *word_info_b = b; register Seeder *seeder = user_data; /* Assume word_info_b is always empty during merging */ g_assert(!word_info_b->seed_list); g_assert(!word_info_b->neighbour_list); RecycleBin_recycle(seeder->recycle_wordinfo, word_info_b); return word_info_a; } static gpointer 
Seeder_FSM_combine_func(gpointer a, gpointer b, gpointer user_data){ g_error("Seeder implementation assumes words of same length"); return NULL; } static Seeder_FSM *Seeder_FSM_create(Seeder *seeder){ register Seeder_FSM *seeder_fsm = g_new(Seeder_FSM, 1); register Match *match = seeder->any_hsp_param->match; seeder_fsm->fsm = FSM_create((gchar*)match->comparison_alphabet->member, Seeder_FSM_merge_func, Seeder_FSM_combine_func, seeder); return seeder_fsm; } static void Seeder_FSM_destroy(Seeder_FSM *seeder_fsm){ FSM_destroy(seeder_fsm->fsm); g_free(seeder_fsm); return; } static gsize Seeder_FSM_memory_usage(Seeder_FSM *seeder_fsm){ return sizeof(Seeder_FSM) + FSM_memory_usage(seeder_fsm->fsm); } typedef struct { Seeder *seeder; Seeder_WordInfo *curr_word_info; Seeder_Context *context; } Seeder_TraverseData; static Seeder_WordInfo *Seeder_add_WordInfo(Seeder *seeder, gchar *word, Seeder_Context *context){ register Seeder_WordInfo *word_info; register VFSM_Int state = 0, leaf; if(seeder->seeder_fsm){ /* Add to FSM */ word_info = Seeder_WordInfo_create(seeder); word_info = FSM_add(seeder->seeder_fsm->fsm, word, context->loader->hsp_param->wordlen, word_info); } else { /* Add to VFSM */ g_assert(seeder->seeder_vfsm); state = VFSM_word2state(seeder->seeder_vfsm->vfsm, word); leaf = VFSM_state2leaf(seeder->seeder_vfsm->vfsm, state); word_info = seeder->seeder_vfsm->leaf[leaf]; if(!word_info){ word_info = Seeder_WordInfo_create(seeder); seeder->seeder_vfsm->leaf[leaf] = word_info; } } return word_info; } static gboolean Seeder_word_is_valid(Match *match, Sequence *seq, gint pos, gint len){ if(seq->annotation){ if(match->type == Match_Type_DNA2DNA){ if(((pos+len) > seq->annotation->cds_start) && (pos < (seq->annotation->cds_start +seq->annotation->cds_length))){ return FALSE; } } else { if(match->type == Match_Type_CODON2CODON){ if((pos < seq->annotation->cds_start) || ((pos+len) >= (seq->annotation->cds_start +seq->annotation->cds_length)) || ((pos % 3) != 
(seq->annotation->cds_start % 3))){ return FALSE; } } } } return TRUE; } /**/ static Seeder_VFSM *Seeder_VFSM_create(Seeder *seeder){ register Seeder_VFSM *seeder_vfsm = g_new(Seeder_VFSM, TRUE); seeder_vfsm->vfsm = VFSM_create( (gchar*)seeder->any_hsp_param->match->comparison_alphabet->member, seeder->any_hsp_param->wordlen); if(!seeder_vfsm->vfsm) g_error("Could not create VFSM for alphabet [%s] depth [%d]", seeder->any_hsp_param->match->comparison_alphabet->member, seeder->any_hsp_param->wordlen); seeder_vfsm->leaf = g_new0(Seeder_WordInfo*, seeder_vfsm->vfsm->lrw); return seeder_vfsm; } static void Seeder_VFSM_destroy(Seeder_VFSM *seeder_vfsm){ VFSM_destroy(seeder_vfsm->vfsm); g_free(seeder_vfsm->leaf); g_free(seeder_vfsm); return; } static gsize Seeder_VFSM_memory_usage(Seeder_VFSM *seeder_vfsm){ return sizeof(Seeder_VFSM) + sizeof(VFSM) + (sizeof(Seeder_WordInfo*) * seeder_vfsm->vfsm->lrw); } /**/ static gboolean Seeder_decide_fsm_type(gchar *force_fsm){ if(!g_strcasecmp(force_fsm, "none")) return TRUE; /* FIXME: improve this by checking if VFSM is a better idea * by looking at the query sizes ... 
*/ if(!g_strcasecmp(force_fsm, "normal")) return TRUE; if(!g_strcasecmp(force_fsm, "compact")) return FALSE; g_error("Unknown fsm type [%s]", force_fsm); return TRUE; } static Seeder_Loader *Seeder_Loader_create(HSP_Param *hsp_param, Seeder_ArgumentSet *sas, gsize hspset_offset){ register Seeder_Loader *loader = g_new(Seeder_Loader, 1); register Alphabet *comparison_alphabet = hsp_param->match->comparison_alphabet; register gint alphabet_size = strlen((gchar*)comparison_alphabet->member); loader->hsp_param = HSP_Param_share(hsp_param); if(hsp_param->match->target->is_translated) loader->tpos_modifier = (hsp_param->wordlen * 3) - 3; else loader->tpos_modifier = hsp_param->wordlen - 1; loader->saturate_expectation = 1.0 / pow(alphabet_size, hsp_param->wordlen); loader->hspset_offset = hspset_offset; return loader; } static void Seeder_Loader_destroy(Seeder_Loader *loader){ HSP_Param_destroy(loader->hsp_param); g_free(loader); return; } Seeder *Seeder_create(gint verbosity, Comparison_Param *comparison_param, Match_Score saturate_threshold, Seeder_ReportFunc report_func, gpointer user_data){ register Seeder *seeder = g_new0(Seeder, 1); seeder->sas = Seeder_ArgumentSet_create(NULL); if(seeder->sas->word_ambiguity < 1) g_error("Word ambiguity cannot be less than one."); if(seeder->sas->word_ambiguity > 1){ if(comparison_param->protein_hsp_param) g_error("Protein ambiguity symbols not implemented"); g_warning("setting --wordambiguity in exonerate will be SLOW."); g_warning("... 
use esd2esi --wordambiguity and exonerate-server instead"); } seeder->is_prepared = FALSE; seeder->verbosity = verbosity; seeder->report_func = report_func; seeder->user_data = user_data; seeder->total_query_length = 0; seeder->comparison_count = 0; seeder->saturate_threshold = saturate_threshold; seeder->comparison_param = Comparison_Param_share(comparison_param); seeder->query_info_list = g_ptr_array_new(); if(saturate_threshold) seeder->recycle_wordinfo = RecycleBin_create( "Seeder_WordInfo", sizeof(Seeder_WordInfo), 64); else seeder->recycle_wordinfo = RecycleBin_create( "Seeder_WordInfo", sizeof(Seeder_WordInfo_no_ST), 64); seeder->recycle_seed = RecycleBin_create( "Seeder_Seed", sizeof(Seeder_Seed), 64); seeder->recycle_neighbour = RecycleBin_create( "Seeder_Neighbour", sizeof(Seeder_Neighbour), 64); seeder->recycle_context = RecycleBin_create( "Seeder_Context", sizeof(Seeder_Context), 64); /**/ seeder->any_hsp_param = NULL; if(comparison_param->dna_hsp_param){ seeder->dna_loader = Seeder_Loader_create(comparison_param->dna_hsp_param, seeder->sas, offsetof(Comparison, dna_hspset)); seeder->any_hsp_param = HSP_Param_share(comparison_param->dna_hsp_param); } if(comparison_param->protein_hsp_param){ seeder->protein_loader = Seeder_Loader_create(comparison_param->protein_hsp_param, seeder->sas, offsetof(Comparison, protein_hspset)); seeder->any_hsp_param = HSP_Param_share(comparison_param->protein_hsp_param); } if(comparison_param->codon_hsp_param){ seeder->codon_loader = Seeder_Loader_create(comparison_param->codon_hsp_param, seeder->sas, offsetof(Comparison, codon_hspset)); seeder->any_hsp_param = HSP_Param_share(comparison_param->codon_hsp_param); } g_assert(seeder->any_hsp_param); if(comparison_param->protein_hsp_param && (comparison_param->dna_hsp_param || comparison_param->codon_hsp_param)) g_error("Mixed DNA and protein seeding not implemented"); if(Seeder_decide_fsm_type(seeder->sas->force_fsm)){ /* Use FSM */ seeder->seeder_fsm = 
Seeder_FSM_create(seeder); seeder->seeder_vfsm = NULL; } else { /* Use VFSM */ if((seeder->dna_loader && seeder->codon_loader) && (seeder->dna_loader->hsp_param->wordlen != seeder->codon_loader->hsp_param->wordlen)) g_error("Multi-wordlength VFSMs not implemented"); seeder->seeder_fsm = NULL; seeder->seeder_vfsm = Seeder_VFSM_create(seeder); } seeder->active_queryinfo_list = g_ptr_array_new(); return seeder; } void Seeder_destroy(Seeder *seeder){ register gint i; register Seeder_QueryInfo *query_info; Comparison_Param_destroy(seeder->comparison_param); for(i = 0; i < seeder->query_info_list->len; i++){ query_info = seeder->query_info_list->pdata[i]; Seeder_QueryInfo_destroy(query_info); } g_ptr_array_free(seeder->query_info_list, TRUE); RecycleBin_destroy(seeder->recycle_wordinfo); RecycleBin_destroy(seeder->recycle_seed); RecycleBin_destroy(seeder->recycle_neighbour); RecycleBin_destroy(seeder->recycle_context); if(seeder->seeder_fsm) Seeder_FSM_destroy(seeder->seeder_fsm); if(seeder->seeder_vfsm) Seeder_VFSM_destroy(seeder->seeder_vfsm); if(seeder->dna_loader) Seeder_Loader_destroy(seeder->dna_loader); if(seeder->protein_loader) Seeder_Loader_destroy(seeder->protein_loader); if(seeder->codon_loader) Seeder_Loader_destroy(seeder->codon_loader); HSP_Param_destroy(seeder->any_hsp_param); g_ptr_array_free(seeder->active_queryinfo_list, TRUE); g_free(seeder); return; } gsize Seeder_memory_usage(Seeder *seeder){ register gsize fsm_memory; g_assert(seeder); if(seeder->seeder_fsm) fsm_memory = Seeder_FSM_memory_usage(seeder->seeder_fsm); else fsm_memory = Seeder_VFSM_memory_usage(seeder->seeder_vfsm); return sizeof(Seeder) + fsm_memory + (seeder->query_info_list->len *(sizeof(gpointer)+sizeof(Seeder_QueryInfo))) + RecycleBin_memory_usage(seeder->recycle_wordinfo) + RecycleBin_memory_usage(seeder->recycle_seed) + RecycleBin_memory_usage(seeder->recycle_neighbour) + RecycleBin_memory_usage(seeder->recycle_context); } void Seeder_memory_info(Seeder *seeder){ 
    /* Check the pointer before it is first dereferenced */
    g_assert(seeder);
    g_message("Seeder Total = %dMb",
            (gint)(Seeder_memory_usage(seeder)>>20));
    if(seeder->seeder_fsm)
        g_message(" -> FSM memory = %dMb",
            (gint)(Seeder_FSM_memory_usage(seeder->seeder_fsm)>>20));
    else
        g_message(" -> VFSM memory = %dMb",
            (gint)(Seeder_VFSM_memory_usage(seeder->seeder_vfsm)>>20));
    /* Shift to Mb before narrowing to gint, as in the other messages */
    g_message(" -> QueryInfo memory = %dMb",
            (gint)((seeder->query_info_list->len
                  *(sizeof(gpointer)+sizeof(Seeder_QueryInfo)))>>20));
    g_message(" -> Recycle_Wordinfo Memory = %dMb",
            (gint)(RecycleBin_memory_usage(seeder->recycle_wordinfo)>>20));
    g_message(" -> Recycle_Seed Memory = %dMb",
            (gint)(RecycleBin_memory_usage(seeder->recycle_seed)>>20));
    g_message(" -> Recycle_Neighbour Memory = %dMb",
            (gint)(RecycleBin_memory_usage(seeder->recycle_neighbour)>>20));
    g_message(" -> Recycle_Context Memory = %dMb",
            (gint)(RecycleBin_memory_usage(seeder->recycle_context)>>20));
    return;
}

static gint Seeder_get_expect(Seeder *seeder, Seeder_Loader *loader,
                              gint len){
    return (gint)((loader->saturate_expectation
                 * (len - loader->hsp_param->wordlen+1))
                 + seeder->saturate_threshold);
}

static gboolean Seeder_WordHood_traverse(gchar *word, gint score,
                                         gpointer user_data){
    register Seeder_TraverseData *traverse_data = user_data;
    register Seeder *seeder = traverse_data->seeder;
    register Seeder_WordInfo *word_info;
    g_assert(seeder);
    word_info = Seeder_add_WordInfo(traverse_data->seeder, word,
                                    traverse_data->context);
    g_assert(word_info);
    if(word_info == traverse_data->curr_word_info)
        return FALSE; /* This is not a neighbour */
    /* Make curr seed word a neighbour of this word */
    Seeder_WordInfo_add_Neighbour(seeder, word_info,
                                  traverse_data->curr_word_info);
    return FALSE;
}

static void Seeder_insert_query(Seeder *seeder, Seeder_Context *context,
                                Sequence *query, gint frame){
    register Match_Score query_expect = 0;
    register gint i, ch, pos, orig_pos, valid_count = 0, wj_ctr = 0;
    register Seeder_WordInfo *word_info;
    register Match *match = context->loader->hsp_param->match;
    register VFSM
        *vfsm = seeder->seeder_vfsm?seeder->seeder_vfsm->vfsm:NULL;
    register VFSM_Int state = 0, leaf;
    register Sequence *seq_masked = Sequence_mask(query);
    register gchar *seq = Sequence_get_str(seq_masked);
    Seeder_TraverseData traverse_data;
    if(seeder->saturate_threshold){
        seeder->total_query_length += query->len;
        query_expect = Seeder_get_expect(seeder, context->loader,
                                         seeder->total_query_length);
    }
    traverse_data.seeder = seeder;
    traverse_data.context = context;
    g_assert(context);
    g_assert(query);
    g_assert(query->len >= 0);
    for(i = 0; i < query->len; i++){
        if(seeder->seeder_fsm){
            if(!Alphabet_symbol_is_member(match->comparison_alphabet,
                                          (guchar)seq[i])){
                valid_count = 0;
                continue;
            }
            valid_count++;
            if(valid_count < context->loader->hsp_param->wordlen)
                continue;
        } else {
            ch = toupper(seq[i]); /* FIXME: should filter properly */
            if(!vfsm->index[ch]){
                state = 0;
                continue;
            }
            /* FIXME: use POW2 versions where applicable */
            state = VFSM_change_state_M(vfsm, state, ch);
            if(!VFSM_state_is_leaf(vfsm, state))
                continue;
        }
        if(wj_ctr--)
            continue;
        wj_ctr = seeder->sas->word_jump - 1;
        /* FIXME: apply sat_th here */
        pos = i - context->loader->hsp_param->wordlen + 1;
        if(!Seeder_word_is_valid(match, seq_masked, pos,
                                 context->loader->hsp_param->wordlen))
            continue;
        if(seeder->seeder_fsm){
            /* Seeder_add_WordInfo() returns the WordInfo for this word,
             * so no separate Seeder_WordInfo_create() call is made here
             * (its result would be overwritten immediately).
             */
            word_info = Seeder_add_WordInfo(seeder, seq+pos, context);
        } else {
            leaf = VFSM_state2leaf(vfsm, state);
            word_info = seeder->seeder_vfsm->leaf[leaf];
            if(!word_info){
                word_info = Seeder_WordInfo_create(seeder);
                seeder->seeder_vfsm->leaf[leaf] = word_info;
            }
        }
        g_assert(word_info);
        if(frame)
            orig_pos = (pos * 3) + frame - 1;
        else
            orig_pos = pos;
        Seeder_WordInfo_add_Seed(seeder, word_info, context,
                                 query_expect, orig_pos);
        if(word_info->seed_list /* not blocked */
        && (!word_info->seed_list->next)){ /* 1st seed */
            traverse_data.curr_word_info = word_info;
            if(context->loader->hsp_param->wordhood)
                WordHood_traverse(context->loader->hsp_param->wordhood,
Seeder_WordHood_traverse, seq+pos, context->loader->hsp_param->wordlen, &traverse_data); } } Sequence_destroy(seq_masked); g_free(seq); return; } /* FIXME: optimisation : efficient VFSM wordhood traversal ? */ static Seeder_Context *Seeder_Context_create(Seeder *seeder, Seeder_QueryInfo *query_info, Seeder_Loader *loader){ register Seeder_Context *context = RecycleBin_alloc(seeder->recycle_context); context->loader = loader; context->query_info = query_info; return context; } static void Seeder_load_query(Seeder *seeder, Seeder_QueryInfo *query_info, Seeder_Loader *loader){ register Match *match = loader->hsp_param->match; register gint i; register Sequence *aa_seq; register Seeder_Context *context = Seeder_Context_create(seeder, query_info, loader); g_assert(match); g_assert(query_info->query->alphabet->type == match->query->alphabet->type); if(match->query->is_translated){ g_assert(match->mas->translate); for(i = 0; i < 3; i++){ aa_seq = Sequence_translate(query_info->query, match->mas->translate, i+1); Seeder_insert_query(seeder, context, aa_seq, i+1); Sequence_destroy(aa_seq); } } else { Seeder_insert_query(seeder, context, query_info->query, 0); } return; } gboolean Seeder_add_query(Seeder *seeder, Sequence *query){ register Seeder_QueryInfo *query_info; g_assert(seeder); g_assert(query); g_assert(!seeder->is_prepared); if(seeder->verbosity > 2) g_message("Seeder Loading query [%s]", query->id); query_info = Seeder_QueryInfo_create(query); g_ptr_array_add(seeder->query_info_list, query_info); if(seeder->dna_loader) Seeder_load_query(seeder, query_info, seeder->dna_loader); if(seeder->protein_loader) Seeder_load_query(seeder, query_info, seeder->protein_loader); if(seeder->codon_loader) Seeder_load_query(seeder, query_info, seeder->codon_loader); /* Seeder_memory_info(seeder); */ return Seeder_memory_usage(seeder) > (seeder->sas->fsm_memory_limit << 20); } typedef struct { Seeder *seeder; Sequence *target; gint curr_frame; gint target_expect; } 
Seeder_TargetInfo; static void Seeder_WordInfo_seed(Seeder_QueryInfo *query_info, Seeder_TargetInfo *target_info, gint query_pos, gint target_pos, Seeder_Loader *loader){ register HSPset *hspset; g_assert(query_pos >= 0); g_assert(target_pos >= 0); target_info->seeder->comparison_count++; if(target_info->seeder->saturate_threshold) target_info->target_expect = Seeder_get_expect( target_info->seeder, loader, target_info->target->len); else target_info->target_expect = 0; if(!query_info->curr_comparison){ query_info->curr_comparison = Comparison_create( target_info->seeder->comparison_param, query_info->query, target_info->target); g_ptr_array_add(target_info->seeder->active_queryinfo_list, query_info); } hspset = OFFSET_ITEM(HSPset*, loader->hspset_offset, query_info->curr_comparison); HSPset_seed_hsp(hspset, query_pos, target_pos); return; } static void Seeder_FSM_traverse_func(guint seq_pos, gpointer node_data, gpointer user_data){ register Seeder_TargetInfo *target_info = user_data; register Seeder_WordInfo *word_info = node_data; register gint tpos, target_pos; register Seeder *seeder = target_info->seeder; register Seeder_Seed *seed; register Seeder_Neighbour *neighbour; if(target_info->curr_frame) tpos = (seq_pos * 3) + target_info->curr_frame - 1; else tpos = seq_pos; /* Ignore if over saturate threshold */ if(seeder->saturate_threshold){ if(!word_info->match_mailbox) /* Already blocked */ return; if(word_info->match_mailbox == seeder->comparison_count){ if(++word_info->match_count > target_info->target_expect) return; /* Saturated */ } else { word_info->match_mailbox = seeder->comparison_count; word_info->match_count = 1; } } g_assert(word_info->seed_list || word_info->neighbour_list); for(seed = word_info->seed_list; seed; seed = seed->next){ target_pos = tpos - seed->context->loader->tpos_modifier; g_assert(target_pos >= 0); Seeder_WordInfo_seed(seed->context->query_info, target_info, seed->query_pos, target_pos, seed->context->loader); } for(neighbour = 
word_info->neighbour_list; neighbour; neighbour = neighbour->next){ for(seed = neighbour->word_info->seed_list; seed; seed = seed->next){ target_pos = tpos - seed->context->loader->tpos_modifier; g_assert(target_pos >= 0); Seeder_WordInfo_seed(seed->context->query_info, target_info, seed->query_pos, target_pos, seed->context->loader); } } return; } static void Seeder_VFSM_traverse_single(Seeder *seeder, gchar *seq, Seeder_TargetInfo *target_info){ register gint i, ch; register VFSM_Int state = 0, leaf; register VFSM *vfsm = seeder->seeder_vfsm->vfsm; register Seeder_WordInfo *word_info; g_assert(vfsm); for(i = 0; seq[i]; i++){ ch = toupper(seq[i]); /* FIXME: filter properly */ if(!vfsm->index[ch]){ state = 0; continue; } /* FIXME: use POW2 versions where applicable */ state = VFSM_change_state_M(vfsm, state, ch); if(VFSM_state_is_leaf(vfsm, state)){ leaf = VFSM_state2leaf(vfsm, state); word_info = seeder->seeder_vfsm->leaf[leaf]; if(word_info) Seeder_FSM_traverse_func(i, word_info, target_info); } } return; } static void Seeder_VFSM_traverse_ambig(Seeder *seeder, gchar *seq, Seeder_TargetInfo *target_info){ register gint i, j, k, ch; register VFSM_Int state, next_state, leaf; register VFSM *vfsm = seeder->seeder_vfsm->vfsm; register Seeder_WordInfo *word_info; register gchar *ambig; register VFSM_Int *temp_state_list, *curr_state_list = g_new0(VFSM_Int, seeder->sas->word_ambiguity), *next_state_list = g_new0(VFSM_Int, seeder->sas->word_ambiguity); register gint curr_state_list_len = 1, next_state_list_len = 0; g_assert(vfsm); for(i = 0; seq[i]; i++){ ch = toupper(seq[i]); /* FIXME: filter properly */ if((!(ambig = Alphabet_nt2ambig(ch))) || ((strlen(ambig) * curr_state_list_len) > seeder->sas->word_ambiguity)){ next_state_list_len = 0; curr_state_list_len = 1; curr_state_list[0] = 0; continue; } for(j = 0; j < curr_state_list_len; j++){ state = curr_state_list[j]; for(k = 0; ambig[k]; k++){ g_assert(vfsm->index[(guchar)ambig[k]]); next_state = 
VFSM_change_state_M(vfsm, state, (guchar)ambig[k]); /* FIXME: use POW2 versions where applicable */ if(next_state_list_len && ((next_state == next_state_list[0]) || (next_state == next_state_list[next_state_list_len-1]))) break; next_state_list[next_state_list_len++] = next_state; if(VFSM_state_is_leaf(vfsm, next_state)){ leaf = VFSM_state2leaf(vfsm, next_state); word_info = seeder->seeder_vfsm->leaf[leaf]; if(word_info) Seeder_FSM_traverse_func(i, word_info, target_info); } } } curr_state_list_len = next_state_list_len; next_state_list_len = 0; Swap(next_state_list, curr_state_list, temp_state_list); } g_free(curr_state_list); g_free(next_state_list); return; } static void Seeder_VFSM_traverse(Seeder *seeder, gchar *seq, Seeder_TargetInfo *target_info){ if(seeder->sas->word_ambiguity > 1) Seeder_VFSM_traverse_ambig(seeder, seq, target_info); else Seeder_VFSM_traverse_single(seeder, seq, target_info); return; } static void Seeder_prepare(Seeder *seeder){ g_assert(!seeder->is_prepared); if(seeder->seeder_fsm) FSM_compile(seeder->seeder_fsm->fsm); seeder->is_prepared = TRUE; return; } static void Seeder_FSM_traverse_ambig(Seeder *seeder, gchar *seq, FSM_Traverse_Func ftf, gpointer user_data){ register FSM *f = seeder->seeder_fsm->fsm; register FSM_Node *n, *next_state; register gint c; register gchar *ambig; register gint i, j, k; register FSM_Node **temp_state_list, **curr_state_list = g_new0(FSM_Node*, seeder->sas->word_ambiguity), **next_state_list = g_new0(FSM_Node*, seeder->sas->word_ambiguity); register gint curr_state_list_len = 1, next_state_list_len = 0; g_assert(f->is_compiled); curr_state_list[0] = f->root; for(i = 0; seq[i]; i++){ if((!(ambig = Alphabet_nt2ambig(seq[i]))) || ((strlen(ambig) * curr_state_list_len) > seeder->sas->word_ambiguity)){ next_state_list_len = 0; curr_state_list_len = 1; curr_state_list[0] = f->root; continue; } for(j = 0; j < curr_state_list_len; j++){ n = curr_state_list[j]; for(k = 0; ambig[k]; k++){ c = 
f->traversal_filter[(guchar)ambig[k]]; if(n[c].data) ftf(i, n[c].data, user_data); next_state = n[c].next; if(next_state_list_len && ((next_state == next_state_list[0]) || (next_state == next_state_list[next_state_list_len-1]))){ break; } next_state_list[next_state_list_len++] = next_state; } } curr_state_list_len = next_state_list_len; next_state_list_len = 0; /* g_message("set curr [%d]", curr_state_list_len); */ Swap(next_state_list, curr_state_list, temp_state_list); } g_free(curr_state_list); g_free(next_state_list); return; } /* Slow (ambiguous) version of FSM_traverse */ /* FIXME: optimisation: remove strlen() etc */ static void Seeder_FSM_traverse(Seeder *seeder, gchar *seq, FSM_Traverse_Func ftf, gpointer user_data){ if(seeder->sas->word_ambiguity > 1) Seeder_FSM_traverse_ambig(seeder, seq, ftf, user_data); else FSM_traverse(seeder->seeder_fsm->fsm, seq, ftf, user_data); return; } void Seeder_add_target(Seeder *seeder, Sequence *target){ register gint i; register Sequence *aa_seq; register gchar *seq; register Seeder_QueryInfo *query_info; register Match *match = seeder->any_hsp_param->match; register Sequence *target_masked; Seeder_TargetInfo target_info; g_assert(seeder); g_assert(target); g_assert(target->alphabet->type == match->target->alphabet->type); target_info.seeder = seeder; target_info.target = Sequence_share(target); if(!seeder->is_prepared) Seeder_prepare(seeder); if(seeder->verbosity > 2) g_message("Seeder finding matches with target [%s]", target->id); if(match->target->is_translated){ g_assert(match->mas->translate); for(i = 0; i < 3; i++){ target_info.curr_frame = i+1; aa_seq = Sequence_translate(target, match->mas->translate, i+1); target_masked = Sequence_mask(aa_seq); Sequence_destroy(aa_seq); seq = Sequence_get_str(target_masked); Sequence_destroy(target_masked); if(seeder->seeder_fsm) Seeder_FSM_traverse(seeder, seq, Seeder_FSM_traverse_func, &target_info); else Seeder_VFSM_traverse(seeder, seq, &target_info); g_free(seq); } } else { 
        target_info.curr_frame = 0;
        target_masked = Sequence_mask(target);
        seq = Sequence_get_str(target_masked);
        Sequence_destroy(target_masked);
        if(seeder->seeder_fsm){
            Seeder_FSM_traverse(seeder, seq,
                                Seeder_FSM_traverse_func, &target_info);
        } else {
            Seeder_VFSM_traverse(seeder, seq, &target_info);
        }
        g_free(seq);
    }
    Sequence_destroy(target_info.target);
    if(seeder->verbosity > 2)
        g_message("Seeder done finding matches with target [%s]",
                  target->id);
    /* Report matches */
    for(i = 0; i < seeder->active_queryinfo_list->len; i++){
        query_info = seeder->active_queryinfo_list->pdata[i];
        g_assert(query_info->curr_comparison);
        if(Comparison_has_hsps(query_info->curr_comparison)){
            Comparison_finalise(query_info->curr_comparison);
            seeder->report_func(query_info->curr_comparison,
                                seeder->user_data);
        }
        Comparison_destroy(query_info->curr_comparison);
        query_info->curr_comparison = NULL;
    }
    g_ptr_array_set_size(seeder->active_queryinfo_list, 0);
    return;
}

/**/

/* FIXME: replace FSM/VFSM conditional stuff with funcs */

exonerate-2.4.0/src/comparison/wordhood.h

/****************************************************************\
*                                                                *
*  Library for word-neighbourhood generation                     *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.
*
*                                                                *
\****************************************************************/

#ifndef INCLUDED_WORDHOOD_H
#define INCLUDED_WORDHOOD_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include
#include

#include "submat.h"
#include "codonsubmat.h"

#ifndef ALPHABETSIZE
#define ALPHABETSIZE (1<<(CHAR_BIT))
#endif /* ALPHABETSIZE */

typedef struct WordHood_Alphabet {
    gint ref_count;
    gint advance;
    gpointer user_data;
    gint input_index[ALPHABETSIZE];
    gint output_index[ALPHABETSIZE];
    GPtrArray *member_list;
    Submat *submat;
    CodonSubmat *codon_submat;
    gint (*score_func)(struct WordHood_Alphabet *wha,
                       gchar *seq_a, gchar *seq_b);
    gboolean (*is_valid_func)(struct WordHood_Alphabet *wha, gchar *seq);
    gint (*index_func)(struct WordHood_Alphabet *wha, gchar *seq);
} WordHood_Alphabet;
/* Invalid input gets -1 from the index_func
 * Output is possible for all of member_list
 */

WordHood_Alphabet *WordHood_Alphabet_create_from_Submat(
                   gchar *input_alphabet,
                   gchar *output_alphabet, Submat *submat,
                   gboolean case_sensitive_input);
WordHood_Alphabet *WordHood_Alphabet_create_from_CodonSubmat(
                   CodonSubmat *cs, gboolean case_sensitive_input);
WordHood_Alphabet *WordHood_Alphabet_share(WordHood_Alphabet *wha);
void WordHood_Alphabet_destroy(WordHood_Alphabet *wha);
/* Using different input and output alphabets allows
 * generation of a neighbourhood covering redundant symbols
 *
 */

typedef gboolean (*WordHood_Traverse_Func)(gchar *word, gint score,
                                           gpointer user_data);
/* Return TRUE to stop the traversal */

typedef struct {
    WordHood_Alphabet *wha;
    gint threshold;
    gboolean use_dropoff;
    gchar *orig_word;
    gchar *curr_word;
    gint *depth_threshold;
    gint word_pos;
    gint curr_score;
    gint curr_len;
    gint alloc_len;
} WordHood;

WordHood *WordHood_create(WordHood_Alphabet *wha,
                          gint threshold, gboolean use_dropoff);
/* When use_dropoff is FALSE:
 *     - each word will have a score of at least the threshold
 * When use_dropoff is TRUE:
 *     - each word will have a score within the threshold
 *       of the
 *       score given by comparison to itself
 *
 * Thus threshold=0, use_dropoff=TRUE will yield the minimum wordhood
 * (ie. just valid words from the query).
 */

void WordHood_destroy(WordHood *wh);
void WordHood_info(WordHood *wh);

void WordHood_traverse(WordHood *wh, WordHood_Traverse_Func whtf,
                       gchar *word, gint len, gpointer user_data);

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_WORDHOOD_H */

exonerate-2.4.0/src/comparison/hspset.h

/****************************************************************\
*                                                                *
*  Library for HSP sets (high-scoring segment pairs)             *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

/* Version 0.4 */

#ifndef INCLUDED_HSPSET_H
#define INCLUDED_HSPSET_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include

#include "match.h"
#include "argument.h"
#include "recyclebin.h"
#include "pqueue.h"
#include "matrix.h"
#include "wordhood.h"
#include "recyclebin.h"
#include "threadref.h"

typedef struct {
    gint filter_threshold;
    gboolean use_wordhood_dropoff;
    gint seed_repeat;
    /**/
    gint dna_wordlen;
    gint protein_wordlen;
    gint codon_wordlen;
    /**/
    gint dna_hsp_dropoff;
    gint protein_hsp_dropoff;
    gint codon_hsp_dropoff;
    /**/
    gint dna_hsp_threshold;
    gint protein_hsp_threshold;
    gint codon_hsp_threshold;
    /**/
    gint dna_word_limit;
    gint protein_word_limit;
    gint codon_word_limit;
    /**/
    gint geneseed_threshold;
    gint geneseed_repeat;
    /**/
} HSPset_ArgumentSet;

HSPset_ArgumentSet *HSPset_ArgumentSet_create(Argument *arg);

/**/

typedef struct {
    guint query_start;
    guint target_start;
    guint length; /* Length is number of match
                     state visits */
    Match_Score score;
    guint cobs; /* cobs == Centre Offset By Score */
    struct HSPset *hsp_set; /* Never included in hsp_set->ref_count */
} HSP;
/* FIXME: remove hsp_set from HSP to save space ? */

void HSP_destroy(HSP *hsp);

#define HSP_query_advance(hsp) \
    ((hsp)->hsp_set->param->match->query->advance)

#define HSP_target_advance(hsp) \
    ((hsp)->hsp_set->param->match->target->advance)

#define HSP_query_end(hsp) \
    ((hsp)->query_start \
  + ((hsp)->length*HSP_query_advance(hsp)))

#define HSP_target_end(hsp) \
    ((hsp)->target_start \
  + ((hsp)->length*HSP_target_advance(hsp)))

#define HSP_query_cobs(hsp) \
    ((hsp)->query_start \
   +((hsp)->cobs*HSP_query_advance(hsp)))

#define HSP_target_cobs(hsp) \
    ((hsp)->target_start \
  + ((hsp)->cobs*HSP_target_advance(hsp)))

#define HSP_diagonal(hsp) \
    (((hsp)->target_start*HSP_query_advance(hsp)) \
    -((hsp)->query_start*HSP_target_advance(hsp)))
/* advance_{query,target} are swapped for position on diagonal */

#define HSP_get_score(hsp, query_pos, target_pos) \
    ((hsp)->hsp_set->param->match->score_func( \
     (hsp)->hsp_set->param->match, \
     (hsp)->hsp_set->query, (hsp)->hsp_set->target, \
     (query_pos), (target_pos)))

#define HSP_get_display(hsp, query_pos, target_pos, display_str) \
    ((hsp)->hsp_set->param->match->display_func( \
     (hsp)->hsp_set->param->match, \
     (hsp)->hsp_set->query, (hsp)->hsp_set->target, \
     (query_pos), (target_pos), display_str))

#define HSP_query_masked(hsp, query_pos) \
    ((hsp)->hsp_set->param->match->query->mask_func( \
     (hsp)->hsp_set->param->match->query, \
     (hsp)->hsp_set->query, (query_pos)))

#define HSP_target_masked(hsp, target_pos) \
    ((hsp)->hsp_set->param->match->target->mask_func( \
     (hsp)->hsp_set->param->match->target, \
     (hsp)->hsp_set->target, (target_pos)))

#define HSP_query_self(hsp, query_pos) \
    ((hsp)->hsp_set->param->match->query->self_func( \
     (hsp)->hsp_set->param->match->query, \
     (hsp)->hsp_set->query, (query_pos)))

#define HSP_target_self(hsp, target_pos) \
    ((hsp)->hsp_set->param->match->target->self_func( \
     (hsp)->hsp_set->param->match->target, \
     (hsp)->hsp_set->target, (target_pos)))

#define HSPset_is_empty(hspset) ((hspset)->is_empty)

typedef struct HSP_Param {
    ThreadRef *thread_ref;
    HSPset_ArgumentSet *has;
    Match *match;
    gint wordlen;
    gint seedlen;
    Match_Score dropoff;
    Match_Score threshold;
    Match_Score wordlimit;
    WordHood *wordhood;
    gboolean use_horizon;
    gint seed_repeat;
    RecycleBin *hsp_recycle;
#ifdef USE_PTHREADS
    pthread_mutex_t hsp_recycle_lock;
#endif /* USE_PTHREADS */
} HSP_Param;

HSP_Param *HSP_Param_create(Match *match, gboolean use_horizon);
void HSP_Param_destroy(HSP_Param *hsp_param);
HSP_Param *HSP_Param_share(HSP_Param *hsp_param);
HSP_Param *HSP_Param_swap(HSP_Param *hsp_param);

void HSP_Param_set_wordlen(HSP_Param *hsp_param, gint wordlen);
/**/
void HSP_Param_set_dna_hsp_threshold(HSP_Param *hsp_param,
                                     gint dna_hsp_threshold);
void HSP_Param_set_protein_hsp_threshold(HSP_Param *hsp_param,
                                         gint protein_hsp_threshold);
void HSP_Param_set_codon_hsp_threshold(HSP_Param *hsp_param,
                                       gint codon_hsp_threshold);
/**/
void HSP_Param_set_dna_word_limit(HSP_Param *hsp_param,
                                  gint dna_word_limit);
void HSP_Param_set_protein_word_limit(HSP_Param *hsp_param,
                                      gint protein_word_limit);
void HSP_Param_set_codon_word_limit(HSP_Param *hsp_param,
                                    gint codon_word_limit);
/**/
void HSP_Param_set_dna_hsp_dropoff(HSP_Param *hsp_param,
                                   gint dna_hsp_dropoff);
void HSP_Param_set_protein_hsp_dropoff(HSP_Param *hsp_param,
                                       gint protein_hsp_dropoff);
void HSP_Param_set_codon_hsp_dropoff(HSP_Param *hsp_param,
                                     gint codon_hsp_dropoff);
/**/
void HSP_Param_set_hsp_threshold(HSP_Param *hsp_param,
                                 gint hsp_threshold);
void HSP_Param_set_seed_repeat(HSP_Param *hsp_param,
                               gint seed_repeat);

typedef struct HSPset {
    gint ref_count;
    Sequence *query;
    Sequence *target;
    HSP_Param *param;
    gint ****horizon;
    GPtrArray *hsp_list;
    /**/
    gboolean is_finalised;
    PQueue **filter;
    gboolean is_empty;
    PQueueSet *pqueue_set;
} HSPset;

HSPset
    *HSPset_create(Sequence *query, Sequence *target,
                   HSP_Param *hsp_param);
HSPset *HSPset_share(HSPset *hsp_set);
void HSPset_destroy(HSPset *hsp_set);
void HSPset_swap(HSPset *hsp_set, HSP_Param *hsp_param);
void HSPset_revcomp(HSPset *hsp_set);
void HSPset_seed_hsp(HSPset *hsp_set,
                     guint query_start, guint target_start);
void HSPset_add_known_hsp(HSPset *hsp_set,
                          guint query_start, guint target_start,
                          guint length);
void HSPset_seed_all_hsps(HSPset *hsp_set,
                          guint *seed_list, guint seed_list_len);
/* HSPset_seed_all_hsps() can only be called once on the HSPset.
 * It automatically finalises the HSPset
 * seed_list should contain (qpos,tpos) pairs,
 * and may be sorted in place.
 */

HSPset *HSPset_finalise(HSPset *hsp_set);

void HSP_print(HSP *hsp, gchar *name);
void HSPset_print(HSPset *hsp_set);

void HSPset_filter_ungapped(HSPset *hsp_set);
/* Remove HSPs which overlap by more than 50% of the score.
 * This is used for 3:3 ungapped alignments.
 */

/**/

typedef struct HSPset_SList_Node {
    struct HSPset_SList_Node *next;
    gint query_pos;
    gint target_pos;
} HSPset_SList_Node;

RecycleBin *HSPset_SList_RecycleBin_create(void);
HSPset_SList_Node *HSPset_SList_append(RecycleBin *recycle_bin,
                                       HSPset_SList_Node *next,
                                       gint query_pos, gint target_pos);
void HSPset_seed_all_qy_sorted(HSPset *hsp_set,
                               HSPset_SList_Node *seed_list);

/**/

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_HSPSET_H */

exonerate-2.4.0/src/comparison/pcr.h

/****************************************************************\
*                                                                *
*  Library for PCR simulation                                    *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.
*
*                                                                *
\****************************************************************/

#ifndef INCLUDED_PCR_H
#define INCLUDED_PCR_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include

#include "fsm.h"
#include "sequence.h"
#include "wordhood.h"
#include "slist.h"

typedef struct {
    SList *owned_probe_list;
    SList *borrowed_sensor_list;
} PCR_Sensor;
/* pcr->fsm nodes contain PCR_Sensor objects.
 *
 * This is to allow both cases where:
 * (a) Multiple probes are present with the same sequence
 * (b) One probe is a subsequence of another
 */

typedef struct {
    struct PCR_Primer *pcr_primer;
    Sequence_Strand strand;
    gint mismatch;
} PCR_Probe;
/* A derived primer to facilitate approximate primer matching */

typedef struct {
    PCR_Probe *pcr_probe;
    gint position; /* Position at probe end */
    gint mismatch;
} PCR_Match;
/* A primer annealed to a position on a sequence */

typedef struct PCR_Primer {
    struct PCR_Experiment *pcr_experiment;
    gint length;
    gint probe_len;
    gchar *forward;
    gchar *revcomp;
    GPtrArray *probe_list;
} PCR_Primer;

typedef struct PCR_Experiment {
    struct PCR *pcr;
    gchar *id;
    PCR_Primer *primer_a;
    PCR_Primer *primer_b;
    gint min_product_len;
    gint max_product_len;
    SList *match_list;
    gint product_count;
} PCR_Experiment;
/* A pair of primers to be simulated in a PCR reaction */

typedef gboolean (*PCR_ReportFunc)(Sequence *sequence,
                                   PCR_Match *match_a,
                                   PCR_Match *match_b,
                                   gint product_length,
                                   gpointer user_data);
/* Return TRUE to stop the simulation */

typedef struct PCR {
    SListSet *slist_set; /* Lists of PCR_Probes */
    FSM *fsm;
    GPtrArray *experiment_list;
    gint mismatch_threshold;
    gint seed_length;
    WordHood *wordhood;
    PCR_ReportFunc report_func;
    gboolean is_prepared;
    gsize experiment_memory_usage;
    gint sensor_count;
    GMemChunk *sensor_mem_chunk;
    gpointer user_data;
    GStringChunk *string_chunk;
    GMemChunk *probe_mem_chunk;
    GMemChunk *match_mem_chunk;
    PCR_Match *match_recycle;
} PCR;

PCR *PCR_create(PCR_ReportFunc report_func,
                gpointer user_data,
                gint mismatch_threshold,
                gint seed_length);
void PCR_destroy(PCR *pcr);
gsize PCR_add_experiment(PCR *pcr, gchar *id,
                         gchar *primer_a, gchar *primer_b,
                         gint min_product_len, gint max_product_len);
/* Returns the memory currently being used by PCR */

void PCR_prepare(PCR *pcr);
/* Called after last call to PCR_add_experiment() */

void PCR_simulate(PCR *pcr, Sequence *sequence);

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_PCR_H */

exonerate-2.4.0/src/comparison/comparison.h

/****************************************************************\
*                                                                *
*  Comparison : A module for pairwise sequence comparisons       *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.
*
*                                                                *
\****************************************************************/

#ifndef INCLUDED_COMPARISON_H
#define INCLUDED_COMPARISON_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <glib.h>

#include "argument.h"
#include "alphabet.h"
#include "sequence.h"
#include "translate.h"
#include "submat.h"
#include "hspset.h"
#include "threadref.h"

typedef struct Comparison_Param {
    ThreadRef *thread_ref;
    Alphabet_Type query_type;
    Alphabet_Type target_type;
    HSP_Param *dna_hsp_param;
    HSP_Param *protein_hsp_param;
    HSP_Param *codon_hsp_param;
    struct Comparison_Param *mirror;
} Comparison_Param;

Comparison_Param *Comparison_Param_create(Alphabet_Type query_type,
                                          Alphabet_Type target_type,
                                          HSP_Param *dna_hsp_param,
                                          HSP_Param *protein_hsp_param,
                                          HSP_Param *codon_hsp_param);
void Comparison_Param_destroy(Comparison_Param *param);
Comparison_Param *Comparison_Param_share(Comparison_Param *param);
Comparison_Param *Comparison_Param_swap(Comparison_Param *param);

/**/

typedef struct {
    ThreadRef *thread_ref;
    Comparison_Param *param;
    Sequence *query;
    Sequence *target;
    HSPset *dna_hspset;
    HSPset *protein_hspset;
    HSPset *codon_hspset;
} Comparison;

Comparison *Comparison_create(Comparison_Param *param,
                              Sequence *query, Sequence *target);
Comparison *Comparison_share(Comparison *comparison);
void Comparison_destroy(Comparison *comparison);
void Comparison_print(Comparison *comparison);
HSPset_ArgumentSet *Comparison_Param_get_HSPSet_Argument_Set(
                        Comparison_Param *param);
gboolean Comparison_has_hsps(Comparison *comparison);
void Comparison_finalise(Comparison *comparison);
void Comparison_swap(Comparison *comparison);
void Comparison_revcomp(Comparison *comparison);

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_COMPARISON_H */

exonerate-2.4.0/src/comparison/match.h0000644000175000017500000001102111214471727014637 00000000000000
/****************************************************************\
*                                                                *
*  Match : A module for pairwise symbol comparison               *
*                                                                *
*  Guy
St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#ifndef INCLUDED_MATCH_H
#define INCLUDED_MATCH_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <glib.h>

#include "argument.h"
#include "alphabet.h"
#include "sequence.h"
#include "translate.h"
#include "submat.h"
#include "codonsubmat.h"

typedef struct {
    gboolean softmasked_query;
    gboolean softmasked_target;
    Submat *dna_submat;
    Submat *protein_submat;
    /* CodonSubmat *codon_submat; */
    Translate *translate;
} Match_ArgumentSet;

Match_ArgumentSet *Match_ArgumentSet_create(Argument *arg);

/**/

typedef enum {
    Match_Type_DNA2DNA,
    Match_Type_PROTEIN2PROTEIN,
    Match_Type_DNA2PROTEIN,
    Match_Type_PROTEIN2DNA,
    Match_Type_CODON2CODON,
    Match_Type_TOTAL
} Match_Type;

Match_Type Match_Type_find(Alphabet_Type query_type,
                           Alphabet_Type target_type,
                           gboolean translate_both);
gchar *Match_Type_get_name(Match_Type type);
gchar *Match_Type_get_score_macro(Match_Type type);

/**/

typedef gint Match_Score;
#define MATCH_IMPOSSIBLY_LOW_SCORE -987654321

typedef struct Match_Strand {
    Alphabet *alphabet;
    guint advance;
    gboolean is_translated;
    Match_Score (*self_func)(struct Match_Strand *strand,
                             Sequence *seq, guint pos);
    gboolean (*mask_func)(struct Match_Strand *strand,
                          Sequence *seq, guint pos);
    struct Match *match;
} Match_Strand;
/* get_raw returns the raw sequence from that position
 *
 * self_func returns score of comparison to self at given position.
 *
 * mask_func returns TRUE if position contains any (soft)masked symbols.
 *
 */

void Match_Strand_get_raw(Match_Strand *strand, Sequence *sequence,
                          guint pos, gchar *result);

typedef struct Match {
    Match_ArgumentSet *mas;
    Match_Type type;
    Match_Strand *query;
    Match_Strand *target;
    Alphabet *comparison_alphabet;
    Match_Score (*score_func)(struct Match *match,
                              Sequence *query, Sequence *target,
                              guint query_pos, guint target_pos);
    void (*display_func)(struct Match *match,
                         Sequence *query, Sequence *target,
                         guint query_pos, guint target_pos,
                         gchar *display_str);
    Match_Score (*split_score_func)(struct Match *match,
                                    Sequence *query, Sequence *target,
                                    guint qp1, guint qp2, guint qp3,
                                    guint tp1, guint tp2, guint tp3);
    void (*split_display_func)(struct Match *match,
                               Sequence *query, Sequence *target,
                               guint qp1, guint qp2, guint qp3,
                               guint tp1, guint tp2, guint tp3,
                               gchar *display_str);
    struct Match *mirror;
} Match;
/* score_func returns the score for comparison at a given position
 * display_func returns the equivalence symbol for a given position
 *
 * Match is a singleton object : created by first call to Match_find()
 * and destroyed by call to Match_destroy_all()
 */

Match *Match_find(Match_Type type);
void Match_destroy_all(void);
Match *Match_swap(Match *match);
Match_Score Match_max_score(Match *match);

/**/

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_MATCH_H */

exonerate-2.4.0/src/comparison/seeder.h0000644000175000017500000001353411162714277015027 00000000000000
/****************************************************************\
*                                                                *
*  Seeder : A module for seeding pairwise alignments             *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.
*
*                                                                *
\****************************************************************/

#ifndef INCLUDED_SEEDER_H
#define INCLUDED_SEEDER_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <glib.h>

#include "argument.h"
#include "recyclebin.h"
#include "hspset.h"
#include "fsm.h"
#include "vfsm.h"
#include "comparison.h"

typedef struct {
    gsize fsm_memory_limit;
    gchar *force_fsm;
    gint word_jump;
    gint word_ambiguity;
} Seeder_ArgumentSet;

Seeder_ArgumentSet *Seeder_ArgumentSet_create(Argument *arg);

/**/

typedef struct {
    HSP_Param *hsp_param;
    gint tpos_modifier;
    gdouble saturate_expectation;
    size_t hspset_offset;
} Seeder_Loader;

typedef struct {
    Sequence *query;
    Comparison *curr_comparison;
} Seeder_QueryInfo;

typedef struct {
    Seeder_Loader *loader;
    Seeder_QueryInfo *query_info;
} Seeder_Context;

/**/

#if 0
typedef struct {
    gint query_pos;
    Seeder_Context *context;
} Seeder_Seed;

typedef struct Seeder_Seed_List {
    struct Seeder_Seed_List *next;
    SeederSeed seed;
} Seeder_Seed_List;

typedef struct {
    union {
        Seeder_Seed *seed;
        CompoundFile_Pos *offset;
    } data;
    gint total; /* -ve when extmem */
} Seeder_Seed_Array;

typedef struct Seeder_Neighbour_Array {
    union {
        struct Seeder_WordInfo *word_info;
        CompoundFile_Pos *offset;
    } data;
    gint total; /* -ve when extmem */
} Seeder_Neighbour_Array;

typedef struct Seeder_Neighbour_List {
    struct Seeder_Neighbour_List *next;
    struct Seeder_WordInfo *word_info;
} Seeder_Neighbour_List;

typedef struct Seeder_WordInfo {
    union {
        Seeder_Seed_List *list;
        Seeder_Seed_Array *array;
    } seed;
    union {
        Seeder_Neighbour_List *list;
        Seeder_Neighbour_Array *array;
    } neighbour; /* Absent w/o wordhood */
    gint match_count; /* Absent w/o satn thold */
    gint match_mailbox; /* Absent w/o satn thold */
} Seeder_WordInfo;
#endif /* 0 */

/**/

typedef struct Seeder_Seed { /* For each occurrence of the word in a query */
    struct Seeder_Seed *next;
    gint query_pos;
    Seeder_Context *context;
} Seeder_Seed;

typedef struct Seeder_Neighbour { /* For each neighbour of the FSM
word */
    struct Seeder_Neighbour *next;
    struct Seeder_WordInfo *word_info;
} Seeder_Neighbour;

typedef struct Seeder_WordInfo_no_ST {
    Seeder_Seed *seed_list;
    Seeder_Neighbour *neighbour_list;
} Seeder_WordInfo_no_ST;

typedef struct Seeder_WordInfo { /* For each word in the FSM */
    Seeder_Seed *seed_list;
    Seeder_Neighbour *neighbour_list; /* Absent w/o wordhood */
    gint match_count; /* Absent w/o satn thold */
    gint match_mailbox; /* Absent w/o satn thold */
} Seeder_WordInfo;
/* FIXME: remove neighbour_list and match_{count,mailbox} when unnecessary */
/* add compact list formats for {seed,neighbour}_list
 * Seeder_Seed_Compact {SeederSeed *data, gint length}
 * Seeder_Neighbour_Compact {SeederSeed *data, gint length}
 */

/**/

typedef struct {
    FSM *fsm;
} Seeder_FSM;

typedef struct {
    VFSM *vfsm;
    Seeder_WordInfo **leaf;
} Seeder_VFSM;

/**/

typedef void (*Seeder_ReportFunc)(Comparison *comparison,
                                  gpointer user_data);

typedef struct {
    Seeder_ArgumentSet *sas;
    gboolean is_prepared;
    gint verbosity;
    Seeder_ReportFunc report_func;
    gpointer user_data;
    gint total_query_length;
    gint comparison_count;
    Match_Score saturate_threshold;
    Match_Strand *target_strand;
    Comparison_Param *comparison_param;
    GPtrArray *query_info_list;
    RecycleBin *recycle_wordinfo;
    RecycleBin *recycle_seed;
    RecycleBin *recycle_neighbour;
    RecycleBin *recycle_context;
    Seeder_FSM *seeder_fsm; /* NULL when not used */
    Seeder_VFSM *seeder_vfsm; /* NULL when not used */
    Seeder_Loader *dna_loader;
    Seeder_Loader *protein_loader;
    Seeder_Loader *codon_loader;
    GPtrArray *active_queryinfo_list;
    HSP_Param *any_hsp_param;
} Seeder;

Seeder *Seeder_create(gint verbosity,
                      Comparison_Param *comparison_param,
                      Match_Score saturate_threshold,
                      Seeder_ReportFunc report_func,
                      gpointer user_data);
void Seeder_destroy(Seeder *seeder);
gsize Seeder_memory_usage(Seeder *seeder);
void Seeder_memory_info(Seeder *seeder);
gboolean Seeder_add_query(Seeder *seeder, Sequence *query);
void Seeder_add_target(Seeder *seeder, Sequence *target);
/*
Must add all queries before adding any targets */ /* FIXME: add Seeder_create_external(); * to add a dataset to an empty seeder */ /**/ #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_SEEDER_H */ exonerate-2.4.0/src/database/0000777000175000017500000000000011222703703013044 500000000000000exonerate-2.4.0/src/database/Makefile.am0000644000175000017500000000616511222643305015025 00000000000000 TESTS = fastadb.test fastapipe.test dataset.test index.test # seqfmi.test noinst_PROGRAMS = $(TESTS) INCLUDES = -I$(top_srcdir)/src/sequence \ -I$(top_srcdir)/src/struct \ -I$(top_srcdir)/src/general \ -I$(top_srcdir)/src/comparison \ -DCUSTOM_GUINT64_FORMAT="\"@custom_guint64_format@\"" noinst_HEADERS = fastadb.h fastapipe.h dataset.h index.h # seqfmi.h SEQUENCE_OBJ = $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o \ $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ -lm fastadb_test_SOURCES = fastadb.test.c fastadb.c fastadb_test_LDADD = $(top_srcdir)/src/general/compoundfile.o \ $(SEQUENCE_OBJ) fastapipe_test_SOURCES = fastapipe.test.c fastapipe.c fastapipe_test_LDADD = fastadb.o \ $(top_srcdir)/src/general/compoundfile.o \ $(SEQUENCE_OBJ) dataset_test_SOURCES = dataset.test.c dataset.c dataset_test_LDADD = fastadb.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/struct/bitarray.o \ $(SEQUENCE_OBJ) index_test_SOURCES = index.test.c index.c index_test_LDADD = fastadb.o dataset.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/threadref.o \ $(top_srcdir)/src/struct/bitarray.o \ $(top_srcdir)/src/struct/vfsm.o \ $(top_srcdir)/src/struct/pqueue.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/splaytree.o \ $(top_srcdir)/src/struct/noitree.o \ 
$(top_srcdir)/src/comparison/wordhood.o \ $(top_srcdir)/src/comparison/hspset.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(SEQUENCE_OBJ) # seqfmi_test_SOURCES = seqfmi.test.c seqfmi.c # seqfmi_test_LDADD = fastadb.o \ # $(top_srcdir)/src/general/compoundfile.o \ # $(top_srcdir)/src/struct/fmindex.o \ # $(top_srcdir)/src/struct/bitarray.o \ # $(top_srcdir)/src/struct/pqueue.o \ # $(top_srcdir)/src/struct/recyclebin.o \ # $(top_srcdir)/src/struct/huffman.o \ # $(SEQUENCE_OBJ) # Files to clear away MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/src/database/Makefile.in0000644000175000017500000003305511222703647015042 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../.. 
ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ TESTS = fastadb.test fastapipe.test dataset.test index.test # seqfmi.test noinst_PROGRAMS = $(TESTS) INCLUDES = -I$(top_srcdir)/src/sequence -I$(top_srcdir)/src/struct -I$(top_srcdir)/src/general -I$(top_srcdir)/src/comparison -DCUSTOM_GUINT64_FORMAT="\"@custom_guint64_format@\"" noinst_HEADERS = fastadb.h fastapipe.h dataset.h index.h # seqfmi.h SEQUENCE_OBJ = $(top_srcdir)/src/struct/sparsecache.o $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o $(top_srcdir)/src/sequence/alphabet.o $(top_srcdir)/src/sequence/splice.o $(top_srcdir)/src/sequence/translate.o $(top_srcdir)/src/general/argument.o $(top_srcdir)/src/general/lineparse.o -lm fastadb_test_SOURCES = fastadb.test.c fastadb.c fastadb_test_LDADD = $(top_srcdir)/src/general/compoundfile.o $(SEQUENCE_OBJ) fastapipe_test_SOURCES = fastapipe.test.c fastapipe.c fastapipe_test_LDADD = fastadb.o $(top_srcdir)/src/general/compoundfile.o $(SEQUENCE_OBJ) dataset_test_SOURCES = dataset.test.c dataset.c dataset_test_LDADD = fastadb.o 
$(top_srcdir)/src/general/compoundfile.o $(top_srcdir)/src/struct/bitarray.o $(SEQUENCE_OBJ) index_test_SOURCES = index.test.c index.c index_test_LDADD = fastadb.o dataset.o $(top_srcdir)/src/general/compoundfile.o $(top_srcdir)/src/general/threadref.o $(top_srcdir)/src/struct/bitarray.o $(top_srcdir)/src/struct/vfsm.o $(top_srcdir)/src/struct/pqueue.o $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/rangetree.o $(top_srcdir)/src/struct/splaytree.o $(top_srcdir)/src/struct/noitree.o $(top_srcdir)/src/comparison/wordhood.o $(top_srcdir)/src/comparison/hspset.o $(top_srcdir)/src/comparison/match.o $(top_srcdir)/src/sequence/submat.o $(top_srcdir)/src/sequence/codonsubmat.o $(SEQUENCE_OBJ) # seqfmi_test_SOURCES = seqfmi.test.c seqfmi.c # seqfmi_test_LDADD = fastadb.o \ # $(top_srcdir)/src/general/compoundfile.o \ # $(top_srcdir)/src/struct/fmindex.o \ # $(top_srcdir)/src/struct/bitarray.o \ # $(top_srcdir)/src/struct/pqueue.o \ # $(top_srcdir)/src/struct/recyclebin.o \ # $(top_srcdir)/src/struct/huffman.o \ # $(SEQUENCE_OBJ) # Files to clear away MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = PROGRAMS = $(noinst_PROGRAMS) DEFS = @DEFS@ -I. 
-I$(srcdir) CPPFLAGS = @CPPFLAGS@ LDFLAGS = @LDFLAGS@ LIBS = @LIBS@ fastadb_test_OBJECTS = fastadb.test.o fastadb.o fastadb_test_DEPENDENCIES = $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o fastadb_test_LDFLAGS = fastapipe_test_OBJECTS = fastapipe.test.o fastapipe.o fastapipe_test_DEPENDENCIES = fastadb.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o fastapipe_test_LDFLAGS = dataset_test_OBJECTS = dataset.test.o dataset.o dataset_test_DEPENDENCIES = fastadb.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/struct/bitarray.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o dataset_test_LDFLAGS = index_test_OBJECTS = index.test.o index.o index_test_DEPENDENCIES = fastadb.o dataset.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/threadref.o \ $(top_srcdir)/src/struct/bitarray.o $(top_srcdir)/src/struct/vfsm.o \ $(top_srcdir)/src/struct/pqueue.o $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/splaytree.o $(top_srcdir)/src/struct/noitree.o \ $(top_srcdir)/src/comparison/wordhood.o \ 
$(top_srcdir)/src/comparison/hspset.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o index_test_LDFLAGS = CFLAGS = @CFLAGS@ COMPILE = $(CC) $(DEFS) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) CCLD = $(CC) LINK = $(CCLD) $(AM_CFLAGS) $(CFLAGS) $(LDFLAGS) -o $@ HEADERS = $(noinst_HEADERS) DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best SOURCES = $(fastadb_test_SOURCES) $(fastapipe_test_SOURCES) $(dataset_test_SOURCES) $(index_test_SOURCES) OBJECTS = $(fastadb_test_OBJECTS) $(fastapipe_test_OBJECTS) $(dataset_test_OBJECTS) $(index_test_OBJECTS) all: all-redirect .SUFFIXES: .SUFFIXES: .S .c .o .s $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps src/database/Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status mostlyclean-noinstPROGRAMS: clean-noinstPROGRAMS: -test -z "$(noinst_PROGRAMS)" || rm -f $(noinst_PROGRAMS) distclean-noinstPROGRAMS: maintainer-clean-noinstPROGRAMS: .c.o: $(COMPILE) -c $< .s.o: $(COMPILE) -c $< .S.o: $(COMPILE) -c $< mostlyclean-compile: -rm -f *.o core *.core clean-compile: distclean-compile: -rm -f *.tab.c maintainer-clean-compile: fastadb.test: $(fastadb_test_OBJECTS) $(fastadb_test_DEPENDENCIES) @rm -f fastadb.test $(LINK) $(fastadb_test_LDFLAGS) $(fastadb_test_OBJECTS) $(fastadb_test_LDADD) $(LIBS) fastapipe.test: $(fastapipe_test_OBJECTS) 
$(fastapipe_test_DEPENDENCIES) @rm -f fastapipe.test $(LINK) $(fastapipe_test_LDFLAGS) $(fastapipe_test_OBJECTS) $(fastapipe_test_LDADD) $(LIBS) dataset.test: $(dataset_test_OBJECTS) $(dataset_test_DEPENDENCIES) @rm -f dataset.test $(LINK) $(dataset_test_LDFLAGS) $(dataset_test_OBJECTS) $(dataset_test_LDADD) $(LIBS) index.test: $(index_test_OBJECTS) $(index_test_DEPENDENCIES) @rm -f index.test $(LINK) $(index_test_LDFLAGS) $(index_test_OBJECTS) $(index_test_LDADD) $(LIBS) tags: TAGS ID: $(HEADERS) $(SOURCES) $(LISP) list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ here=`pwd` && cd $(srcdir) \ && mkid -f$$here/ID $$unique $(LISP) TAGS: $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) $(LISP) tags=; \ here=`pwd`; \ list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(ETAGS_ARGS)$$unique$(LISP)$$tags" \ || (cd $(srcdir) && etags -o $$here/TAGS $(ETAGS_ARGS) $$tags $$unique $(LISP)) mostlyclean-tags: clean-tags: distclean-tags: -rm -f TAGS ID maintainer-clean-tags: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = src/database distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done check-TESTS: $(TESTS) @failed=0; all=0; \ srcdir=$(srcdir); export srcdir; \ for tst in $(TESTS); do \ if test -f $$tst; then dir=.; \ else dir="$(srcdir)"; fi; \ if $(TESTS_ENVIRONMENT) $$dir/$$tst; then \ all=`expr $$all + 1`; \ echo "PASS: $$tst"; \ elif test $$? 
-ne 77; then \ all=`expr $$all + 1`; \ failed=`expr $$failed + 1`; \ echo "FAIL: $$tst"; \ fi; \ done; \ if test "$$failed" -eq 0; then \ banner="All $$all tests passed"; \ else \ banner="$$failed of $$all tests failed"; \ fi; \ dashes=`echo "$$banner" | sed s/./=/g`; \ echo "$$dashes"; \ echo "$$banner"; \ echo "$$dashes"; \ test "$$failed" -eq 0 info-am: info: info-am dvi-am: dvi: dvi-am check-am: all-am $(MAKE) $(AM_MAKEFLAGS) check-TESTS check: check-am installcheck-am: installcheck: installcheck-am install-exec-am: install-exec: install-exec-am install-data-am: install-data: install-data-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-am uninstall-am: uninstall: uninstall-am all-am: Makefile $(PROGRAMS) $(HEADERS) all-redirect: all-am install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-noinstPROGRAMS mostlyclean-compile \ mostlyclean-tags mostlyclean-generic mostlyclean: mostlyclean-am clean-am: clean-noinstPROGRAMS clean-compile clean-tags clean-generic \ mostlyclean-am clean: clean-am distclean-am: distclean-noinstPROGRAMS distclean-compile distclean-tags \ distclean-generic clean-am distclean: distclean-am maintainer-clean-am: maintainer-clean-noinstPROGRAMS \ maintainer-clean-compile maintainer-clean-tags \ maintainer-clean-generic distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." 
maintainer-clean: maintainer-clean-am .PHONY: mostlyclean-noinstPROGRAMS distclean-noinstPROGRAMS \ clean-noinstPROGRAMS maintainer-clean-noinstPROGRAMS \ mostlyclean-compile distclean-compile clean-compile \ maintainer-clean-compile tags mostlyclean-tags distclean-tags \ clean-tags maintainer-clean-tags distdir check-TESTS info-am info \ dvi-am dvi check check-am installcheck-am installcheck install-exec-am \ install-exec install-data-am install-data install-am install \ uninstall-am uninstall all-redirect all-am all installdirs \ mostlyclean-generic distclean-generic clean-generic \ maintainer-clean-generic clean mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: exonerate-2.4.0/src/database/fastadb.test.c0000644000175000017500000000323211162714300015504 00000000000000/****************************************************************\ * * * Library for manipulation of FASTA format databases * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
*
*                                                                *
\****************************************************************/

#include "fastadb.h"

gint Argument_main(Argument *arg){
    register FastaDB *fdb;
    register FastaDB_Seq *fdbs;
    register gchar *path;
    register Alphabet *alphabet;
    if(arg->argc == 2){
        path = arg->argv[1];
        alphabet = Alphabet_create(Alphabet_Type_UNKNOWN, FALSE);
        fdb = FastaDB_open(path, alphabet);
        Alphabet_destroy(alphabet);
        g_message("opened file [%s]", path);
        while((fdbs = FastaDB_next(fdb, FastaDB_Mask_ALL))){
            g_message("read [%s]", fdbs->seq->id);
            FastaDB_Seq_print(fdbs, stdout, FastaDB_Mask_ALL);
            FastaDB_Seq_destroy(fdbs);
            }
        FastaDB_close(fdb);
        g_message("closed file [%s]", path);
        }
    return 0;
    }

exonerate-2.4.0/src/database/fastadb.c0000644000175000017500000012322111216740101014525 00000000000000
/****************************************************************\
*                                                                *
*  Library for manipulation of FASTA format databases            *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.
* * * \****************************************************************/ #include /* For BUFSIZ */ #include #include /* For strerror() */ #include /* For isspace(),isprint() */ #include /* For stat() */ #include /* For stat() */ #include /* For stat() */ #include /* For readdir() */ #include "fastadb.h" #ifndef ALPHABET_SIZE #define ALPHABET_SIZE (sizeof(gchar)<d_name)); closedir(dir); return list; } static void FastaDB_expand_path_list_recur(GPtrArray *path_list, gchar *path, gchar *suffix_filter, gboolean is_subdir){ register GPtrArray *dir_content; register gchar *name, *full_path; register gint i; if(FastaDB_file_is_directory(path)){ dir_content = FastaDB_get_dir_contents(path); for(i = 0; i < dir_content->len; i++){ name = dir_content->pdata[i]; full_path = g_strdup_printf("%s%c%s", path, G_DIR_SEPARATOR, name); FastaDB_expand_path_list_recur(path_list, full_path, suffix_filter, TRUE); g_free(full_path); g_free(name); } g_ptr_array_free(dir_content, TRUE); } else { if(is_subdir && suffix_filter){ for(i = strlen(path)-1; i >= 0; i--) if(path[i] == '.') break; if((!i) || strcmp(path+i+1, suffix_filter)) return; } g_ptr_array_add(path_list, g_strdup(path)); } return; } static GPtrArray *FastaDB_expand_path_list(GPtrArray *path_list, gchar *suffix_filter){ register GPtrArray *expanded_path_list = g_ptr_array_new(); register gchar *path; register gint i; if(suffix_filter && (suffix_filter[0] == '.')) suffix_filter++; /* Skip any leading dot on suffix_filter */ for(i = 0; i < path_list->len; i++){ path = path_list->pdata[i]; FastaDB_expand_path_list_recur(expanded_path_list, path, suffix_filter, FALSE); } if(!expanded_path_list->len){ for(i = 0; i < path_list->len; i++) g_message("Empty path: [%s]", (gchar*)path_list->pdata[i]); g_error("No valid file paths found"); } return expanded_path_list; } FastaDB *FastaDB_open_list(GPtrArray *path_list, Alphabet *alphabet){ register FastaDB *fdb = g_new(FastaDB, 1); register FastaDB_ArgumentSet *fas = 
FastaDB_ArgumentSet_create(NULL); register gint i; register gchar *path; register GPtrArray *full_path_list; g_assert(path_list); fdb->ref_count = 1; if(alphabet) fdb->alphabet = Alphabet_share(alphabet); else fdb->alphabet = Alphabet_create(Alphabet_Type_UNKNOWN, FALSE); full_path_list = FastaDB_expand_path_list(path_list, fas->suffix_filter); fdb->cf = CompoundFile_create(full_path_list, TRUE); for(i = 0; i < full_path_list->len; i++){ path = full_path_list->pdata[i]; g_free(path); } g_ptr_array_free(full_path_list, TRUE); FastaDB_rewind(fdb); fdb->out_buffer_alloc = FASTADB_OUT_BUFFER_CHUNK_SIZE; fdb->out_buffer = g_malloc(sizeof(gchar)*fdb->out_buffer_alloc); fdb->line_length = -1; return fdb; } FastaDB *FastaDB_open_list_with_limit(GPtrArray *path_list, Alphabet *alphabet, gint chunk_id, gint chunk_total){ register FastaDB *fdb = FastaDB_open_list(path_list, alphabet); register CompoundFile_Pos start, stop, total_length, chunk_size; register CompoundFile_Location *start_cfl, *stop_cfl; if(chunk_total){ g_assert(chunk_id); if(chunk_total < 1) g_error("Chunk total [%d] is too small", chunk_total); if((chunk_id < 1) || (chunk_id > chunk_total)) g_error("Chunk id should be between 1 and %d", chunk_total); total_length = CompoundFile_get_length(fdb->cf); chunk_size = total_length / chunk_total; start = (chunk_id-1) * chunk_size; start = FastaDB_find_next_start(fdb, start); if(chunk_id == chunk_total){ stop = total_length - 1; } else { stop = chunk_id * chunk_size; stop = FastaDB_find_next_start(fdb, stop); } start_cfl = CompoundFile_Location_from_pos(fdb->cf, start); stop_cfl = CompoundFile_Location_from_pos(fdb->cf, stop); CompoundFile_set_limits(fdb->cf, start_cfl, stop_cfl); CompoundFile_Location_destroy(start_cfl); CompoundFile_Location_destroy(stop_cfl); FastaDB_rewind(fdb); } return fdb; } FastaDB *FastaDB_open(gchar *path, Alphabet *alphabet){ register FastaDB *fdb; register GPtrArray *path_list = g_ptr_array_new(); g_ptr_array_add(path_list, path); fdb = 
FastaDB_open_list(path_list, alphabet); g_ptr_array_free(path_list, TRUE); return fdb; } #define FastaDB_putc(fdb, ch) \ if(fdb->out_buffer_pos == fdb->out_buffer_alloc) \ FastaDB_putc_realloc(fdb); \ fdb->out_buffer[fdb->out_buffer_pos++] = (ch) static void FastaDB_putc_realloc(FastaDB *fdb){ fdb->out_buffer_alloc += FASTADB_OUT_BUFFER_CHUNK_SIZE; fdb->out_buffer = g_realloc(fdb->out_buffer, fdb->out_buffer_alloc); return; } /**/ FastaDB *FastaDB_share(FastaDB *fdb){ g_assert(fdb); fdb->ref_count++; return fdb; } FastaDB *FastaDB_dup(FastaDB *fdb){ register FastaDB *nfdb = g_new(FastaDB, 1); nfdb->ref_count = 1; nfdb->alphabet = Alphabet_create(fdb->alphabet->type, fdb->alphabet->is_soft_masked); nfdb->cf = CompoundFile_dup(fdb->cf); nfdb->out_buffer_alloc = FASTADB_OUT_BUFFER_CHUNK_SIZE; nfdb->out_buffer = g_malloc(sizeof(gchar)*nfdb->out_buffer_alloc); nfdb->line_length = -1; FastaDB_rewind(nfdb); return nfdb; } void FastaDB_close(FastaDB *fdb){ g_assert(fdb); if(--fdb->ref_count) return; CompoundFile_destroy(fdb->cf); Alphabet_destroy(fdb->alphabet); g_free(fdb->out_buffer); g_free(fdb); return; } void FastaDB_rewind(FastaDB *fdb){ register gint ch, prev = '\n'; g_assert(fdb); CompoundFile_rewind(fdb->cf); while((ch = CompoundFile_getc(fdb->cf)) != EOF){ if((ch == '>') && (prev == '\n')) break; prev = ch; } return; } CompoundFile_Pos FastaDB_find_next_start(FastaDB *fdb, CompoundFile_Pos pos){ register gint ch, prev = '\n'; g_assert(fdb->cf->element_list->len == 1); CompoundFile_seek(fdb->cf, pos); while((ch = CompoundFile_getc(fdb->cf)) != EOF){ if((ch == '>') && (prev == '\n')) break; prev = ch; } return CompoundFile_ftell(fdb->cf)-1; } gboolean FastaDB_file_is_fasta(gchar *path){ register FILE *fp = fopen(path, "r"); register gint ch; register gboolean result = FALSE; if(!fp) g_error("Could not open file [%s]", path); while(((ch = getc(fp)) != EOF)){ if(isspace(ch)) continue; else if(ch == '>'){ /* Assume is fasta format */ result = TRUE; break; } else 
            break;
    }
    fclose(fp);
    return result;
}
/* Assumes a file is fasta format
 * if the first non-whitespace character is '>'
 */

gboolean FastaDB_is_finished(FastaDB *fdb){
    g_assert(fdb);
    return CompoundFile_is_finished(fdb->cf);
}

void FastaDB_traverse(FastaDB *fdb, FastaDB_Mask mask,
                      FastaDB_TraverseFunc fdtf, gpointer user_data){
    register FastaDB_Seq *fdbs;
    g_assert(fdb);
    g_assert(fdtf);
    while((fdbs = FastaDB_next(fdb, mask))){
        if(fdtf(fdbs, user_data)){
            FastaDB_Seq_destroy(fdbs);
            break;
        }
        FastaDB_Seq_destroy(fdbs);
    }
    return;
}

gsize FastaDB_memory_usage(FastaDB *fdb){
    return sizeof(FastaDB)
         + sizeof(Alphabet)
         + sizeof(CompoundFile)
         + (sizeof(gchar)*fdb->out_buffer_alloc);
}

/**/

static FastaDB_Seq *FastaDB_Seq_create(FastaDB *fdb, Sequence *seq,
                                       CompoundFile_Location *cfl){
    register FastaDB_Seq *fdbs = g_new0(FastaDB_Seq, 1);
    fdbs->ref_count = 1;
    fdbs->source = FastaDB_share(fdb);
    fdbs->location = CompoundFile_Location_share(cfl);
    fdbs->seq = Sequence_share(seq);
    return fdbs;
}

FastaDB_Seq *FastaDB_Seq_share(FastaDB_Seq *fdbs){
    g_assert(fdbs);
    fdbs->ref_count++;
    return fdbs;
}

void FastaDB_Seq_destroy(FastaDB_Seq *fdbs){
    g_assert(fdbs);
    if(--fdbs->ref_count)
        return;
    CompoundFile_Location_destroy(fdbs->location);
    FastaDB_close(fdbs->source);
    Sequence_destroy(fdbs->seq);
    g_free(fdbs);
    return;
}

FastaDB_Seq *FastaDB_Seq_revcomp(FastaDB_Seq *fdbs){
    register FastaDB_Seq *revcomp_fdbs;
    g_assert(fdbs);
    revcomp_fdbs = FastaDB_Seq_create(fdbs->source, fdbs->seq,
                                      fdbs->location);
    Sequence_destroy(revcomp_fdbs->seq);
    revcomp_fdbs->seq = Sequence_revcomp(fdbs->seq);
    return revcomp_fdbs;
}

/**/

FastaDB_Seq *FastaDB_next(FastaDB *fdb, FastaDB_Mask mask){
    register FastaDB_Seq *fdbs;
    register gint ch;
    register gint id_pos = -1, def_pos = -1, seq_pos = -1, seq_len = -1;
    register gint prev_len = -1, curr_length, line_start_seq_pos;
    register Sequence *seq;
    register CompoundFile_Location *location
           = CompoundFile_Location_tell(fdb->cf);
    fdb->out_buffer_pos = 0; /* Clear output buffer */
    if(mask & FastaDB_Mask_ID){ /* Record ID */
        id_pos = 0;
        while((ch = CompoundFile_getc(fdb->cf)) != EOF){
            /* if((ch == '\n') || (ch == ' ') || (ch == '\t')) */
            if(isspace(ch))
                break;
            FastaDB_putc(fdb, ch);
        }
        FastaDB_putc(fdb, '\0');
    } else { /* Skip ID */
        while((ch = CompoundFile_getc(fdb->cf)) != EOF){
            /* if((ch == '\n') || (ch == ' ') || (ch == '\t')) */
            if(isspace(ch))
                break;
        }
    }
    if(ch == EOF){
        CompoundFile_Location_destroy(location);
        return NULL;
    }
    if(ch != '\n'){ /* If DEF is present */
        if(mask & FastaDB_Mask_DEF){ /* Record DEF */
            def_pos = fdb->out_buffer_pos;
            while((ch = CompoundFile_getc(fdb->cf)) != EOF){
                if(ch == '\n')
                    break;
                FastaDB_putc(fdb, ch);
            }
            FastaDB_putc(fdb, '\0');
        } else { /* Skip DEF */
            while((ch = CompoundFile_getc(fdb->cf)) != EOF){
                if(ch == '\n')
                    break;
            }
        }
    }
    if(ch == EOF){
        CompoundFile_Location_destroy(location);
        return NULL;
    }
    if(mask & FastaDB_Mask_SEQ){ /* Record SEQ and LEN */
        seq_pos = fdb->out_buffer_pos;
        line_start_seq_pos = seq_pos;
        while(((ch = CompoundFile_getc(fdb->cf)) != EOF)
           && (ch != '>')){
            if(Alphabet_symbol_is_valid(fdb->alphabet, ch)){
                FastaDB_putc(fdb, ch);
            } else if(isspace(ch)){
                if(ch == '\n'){
                    curr_length = fdb->out_buffer_pos
                                - line_start_seq_pos;
                    line_start_seq_pos = fdb->out_buffer_pos;
                    if(prev_len != -1){
                        if(fdb->line_length == -1){
                            fdb->line_length = prev_len;
                        } else {
                            if(prev_len != fdb->line_length)
                                fdb->line_length = 0;
                        }
                    }
                    prev_len = curr_length;
                    /**/
                }
            } else {
                g_error("Unrecognised symbol \'%c\' (ascii:%d)"
                        " file:[%s] seq:[%s] pos:[%d]",
                        isprint(ch)?ch:' ', ch,
                        CompoundFile_current_path(fdb->cf),
                        (id_pos == -1)?"unknown"
                                      :&fdb->out_buffer[id_pos],
                        (fdb->out_buffer_pos - seq_pos - 1));
            }
        }
        FastaDB_putc(fdb, '\0');
        seq_len = fdb->out_buffer_pos - seq_pos - 1;
        if(fdb->line_length >= 0)
            if(fdb->line_length < prev_len)
                fdb->line_length = 0;
    } else if(mask & FastaDB_Mask_LEN){ /* Skip SEQ and record LEN */
        seq_len = 0;
        while(((ch = CompoundFile_getc(fdb->cf)) != EOF)
           && (ch != '>')){
            if( ((ch >= 'A') && (ch <=
                'Z'))
             || ((ch >= 'a') && (ch <= 'z'))){
                seq_len++;
            }
        }
    } else { /* Just skip SEQ */
        while((ch = CompoundFile_getc(fdb->cf)) != EOF){
            if(ch == '>')
                break;
        }
    }
    seq = Sequence_create(
            ( id_pos == -1)?NULL:&fdb->out_buffer[id_pos],
            (def_pos == -1)?NULL:&fdb->out_buffer[def_pos],
            (seq_pos == -1)?NULL:&fdb->out_buffer[seq_pos],
            (seq_len == -1)?0:seq_len,
            (fdb->alphabet->type == Alphabet_Type_DNA)
                ? Sequence_Strand_FORWARD
                : Sequence_Strand_UNKNOWN,
            fdb->alphabet);
    fdbs = FastaDB_Seq_create(fdb, seq, location);
    CompoundFile_Location_destroy(location);
    /* Check the length before dropping the local reference to seq */
    if((mask & FastaDB_Mask_LEN) && (!seq->len))
        g_warning("Warning zero length sequence [%s]", seq->id);
    Sequence_destroy(seq);
    return fdbs;
}

/**/

FastaDB_Key *FastaDB_Key_create(FastaDB *source,
                                CompoundFile_Location *location,
                                Sequence_Strand strand,
                                gint seq_offset, gint length){
    register FastaDB_Key *fdbk = g_new(FastaDB_Key, 1);
    g_assert(source);
    fdbk->source = FastaDB_share(source);
    fdbk->location = CompoundFile_Location_share(location);
    fdbk->strand = strand;
    fdbk->seq_offset = seq_offset;
    fdbk->length = length;
    return fdbk;
}

void FastaDB_Key_destroy(FastaDB_Key *fdbk){
    g_assert(fdbk);
    FastaDB_close(fdbk->source);
    CompoundFile_Location_destroy(fdbk->location);
    g_free(fdbk);
    return;
}

FastaDB_Key *FastaDB_Seq_get_key(FastaDB_Seq *fdbs){
    register FastaDB_Key *fdbk;
    register gint seq_offset = strlen(fdbs->seq->id) + 2
                             + (fdbs->seq->def ?
                                (strlen(fdbs->seq->def)+1) : 0);
    g_assert(fdbs);
    fdbk = FastaDB_Key_create(fdbs->source, fdbs->location,
                              fdbs->seq->strand, seq_offset,
                              fdbs->seq->len);
    return fdbk;
}

FastaDB_Seq *FastaDB_fetch(FastaDB *fdb, FastaDB_Mask mask,
                           CompoundFile_Pos pos){
    register FastaDB_Seq *fdbs = NULL;
    register CompoundFile_Pos orig_pos;
    g_assert(fdb); /* Check before fdb is first dereferenced */
    g_assert(fdb->cf->element_list->len == 1);
    orig_pos = CompoundFile_ftell(fdb->cf);
    CompoundFile_seek(fdb->cf, pos);
    fdbs = FastaDB_next(fdb, mask);
    CompoundFile_seek(fdb->cf, orig_pos);
    return fdbs;
}

FastaDB_Seq *FastaDB_Key_get_seq(FastaDB_Key *fdbk, FastaDB_Mask mask){
    register FastaDB_Seq *fdbs;
    register CompoundFile_Location *orig;
    register Sequence *revcomp_sequence;
    g_assert(fdbk);
    orig = CompoundFile_Location_tell(fdbk->source->cf);
    CompoundFile_Location_seek(fdbk->location);
    fdbs = FastaDB_next(fdbk->source, mask);
    g_assert(fdbs);
    CompoundFile_Location_seek(orig);
    CompoundFile_Location_destroy(orig);
    if(fdbk->strand == Sequence_Strand_REVCOMP){
        g_assert(fdbs->seq->strand == Sequence_Strand_FORWARD);
        fdbs->seq->strand = Sequence_Strand_REVCOMP;
        revcomp_sequence = Sequence_revcomp(fdbs->seq);
        Sequence_destroy(fdbs->seq);
        fdbs->seq = revcomp_sequence;
    }
    return fdbs;
}

gchar *FastaDB_Key_get_def(FastaDB_Key *fdbk){
    register gint ch;
    register gchar *def = NULL;
    register GString *s;
#ifdef USE_PTHREADS
    pthread_mutex_lock(&fdbk->source->cf->compoundfile_mutex);
#endif /* USE_PTHREADS */
    CompoundFile_Location_seek(fdbk->location);
    while((ch = CompoundFile_getc(fdbk->source->cf)) != EOF)
        if((ch == '\n') || (ch == ' ') || (ch == '\t'))
            break;
    if(ch == '\n'){
#ifdef USE_PTHREADS
        pthread_mutex_unlock(&fdbk->source->cf->compoundfile_mutex);
#endif /* USE_PTHREADS */
        return NULL;
    }
    s = g_string_sized_new(64);
    while((ch = CompoundFile_getc(fdbk->source->cf)) != EOF){
        if(ch == '\n')
            break;
        s = g_string_append_c(s, ch);
    }
#ifdef USE_PTHREADS
    pthread_mutex_unlock(&fdbk->source->cf->compoundfile_mutex);
#endif /* USE_PTHREADS */
    def = s->str;
    g_string_free(s, FALSE);
    return def;
}

/**/

static gpointer FastaDB_SparseCache_get_func_8_BIT(gint pos,
                    gpointer page_data, gpointer user_data){
    return GINT_TO_POINTER((gint)((gchar*)page_data)[pos]);
}

static gint FastaDB_SparseCache_copy_func_8_BIT(gint start, gint length,
                    gchar *dst, gpointer page_data, gpointer user_data){
    register gint i;
    register gchar *str = page_data;
    for(i = 0; i < length; i++)
        dst[i] = str[start+i];
    return length;
}

static gpointer FastaDB_SparseCache_get_func_0_BIT_LC(gint pos,
                    gpointer page_data, gpointer user_data){
    return GINT_TO_POINTER((gint)'n');
}

static gint FastaDB_SparseCache_copy_func_0_BIT_LC(gint start, gint length,
                    gchar *dst, gpointer page_data, gpointer user_data){
    register gint i;
    for(i = 0; i < length; i++)
        dst[i] = 'n';
    return length;
}

static gpointer FastaDB_SparseCache_get_func_0_BIT_UC(gint pos,
                    gpointer page_data, gpointer user_data){
    return GINT_TO_POINTER((gint)'N');
}

static gint FastaDB_SparseCache_copy_func_0_BIT_UC(gint start, gint length,
                    gchar *dst, gpointer page_data, gpointer user_data){
    register gint i;
    for(i = 0; i < length; i++)
        dst[i] = 'N';
    return length;
}

typedef enum {
    FastaDB_SparseCache_Mask_0_BIT_LC, /* [n] */
    FastaDB_SparseCache_Mask_0_BIT_UC, /* [N] */
    FastaDB_SparseCache_Mask_2_BIT_LC, /* [acgt] */
    FastaDB_SparseCache_Mask_2_BIT_UC, /* [ACGT] */
    FastaDB_SparseCache_Mask_4_BIT,    /* [ACGTNacgtn] */
    FastaDB_SparseCache_Mask_8_BIT
} FastaDB_SparseCache_Mask;

static void FastaDB_SparseCache_fill_mask(
            FastaDB_SparseCache_Mask *mask_index){
    register gint i;
    for(i = 0; i < ALPHABET_SIZE; i++)
        mask_index[i] = (1 << FastaDB_SparseCache_Mask_8_BIT);
    mask_index['N'] = (1 << FastaDB_SparseCache_Mask_0_BIT_UC);
    mask_index['n'] = (1 << FastaDB_SparseCache_Mask_0_BIT_LC);
    mask_index['A'] = mask_index['C']
                    = mask_index['G']
                    = mask_index['T']
                    = (1 << FastaDB_SparseCache_Mask_2_BIT_UC);
    mask_index['a'] = mask_index['c']
                    = mask_index['g']
                    = mask_index['t']
                    = (1 << FastaDB_SparseCache_Mask_2_BIT_LC);
    return;
}
/**/ static void FastaDB_SparseCache_fill_index(gint *index, gchar *alphabet){ register gint i; for(i = 0; i < ALPHABET_SIZE; i++) index[i] = -1; for(i = 0; alphabet[i]; i++) index[(guchar)alphabet[i]] = i; return; } static gpointer FastaDB_SparseCache_get_func_4_BIT(gint pos, gpointer page_data, gpointer user_data){ register gint ch = ((gchar*)page_data)[(pos >> 1)], num = (ch >> ((pos & 1) << 2)) & 15; static gchar *alphabet = "ACGTNacgtn------"; return GINT_TO_POINTER((gint)alphabet[num]); } static gint FastaDB_SparseCache_copy_func_4_BIT(gint start, gint length, gchar *dst, gpointer page_data, gpointer user_data){ #ifndef G_DISABLE_ASSERT register gint i; #endif /* G_DISABLE_ASSERT */ register gint pos = 0, end = length; register gchar *str = page_data, *word; static gchar *table[256] = { "AA", "CA", "GA", "TA", "NA", "aA", "cA", "gA", "tA", "nA", "-A", "-A", "-A", "-A", "-A", "-A", "AC", "CC", "GC", "TC", "NC", "aC", "cC", "gC", "tC", "nC", "-C", "-C", "-C", "-C", "-C", "-C", "AG", "CG", "GG", "TG", "NG", "aG", "cG", "gG", "tG", "nG", "-G", "-G", "-G", "-G", "-G", "-G", "AT", "CT", "GT", "TT", "NT", "aT", "cT", "gT", "tT", "nT", "-T", "-T", "-T", "-T", "-T", "-T", "AN", "CN", "GN", "TN", "NN", "aN", "cN", "gN", "tN", "nN", "-N", "-N", "-N", "-N", "-N", "-N", "Aa", "Ca", "Ga", "Ta", "Na", "aa", "ca", "ga", "ta", "na", "-a", "-a", "-a", "-a", "-a", "-a", "Ac", "Cc", "Gc", "Tc", "Nc", "ac", "cc", "gc", "tc", "nc", "-c", "-c", "-c", "-c", "-c", "-c", "Ag", "Cg", "Gg", "Tg", "Ng", "ag", "cg", "gg", "tg", "ng", "-g", "-g", "-g", "-g", "-g", "-g", "At", "Ct", "Gt", "Tt", "Nt", "at", "ct", "gt", "tt", "nt", "-t", "-t", "-t", "-t", "-t", "-t", "An", "Cn", "Gn", "Tn", "Nn", "an", "cn", "gn", "tn", "nn", "-n", "-n", "-n", "-n", "-n", "-n", "A-", "C-", "G-", "T-", "N-", "a-", "c-", "g-", "t-", "n-", "--", "--", "--", "--", "--", "--", "A-", "C-", "G-", "T-", "N-", "a-", "c-", "g-", "t-", "n-", "--", "--", "--", "--", "--", "--", "A-", "C-", "G-", "T-", "N-", "a-", "c-", 
"g-", "t-", "n-", "--", "--", "--", "--", "--", "--", "A-", "C-", "G-", "T-", "N-", "a-", "c-", "g-", "t-", "n-", "--", "--", "--", "--", "--", "--", "A-", "C-", "G-", "T-", "N-", "a-", "c-", "g-", "t-", "n-", "--", "--", "--", "--", "--", "--", "A-", "C-", "G-", "T-", "N-", "a-", "c-", "g-", "t-", "n-", "--", "--", "--", "--", "--", "--" }; g_assert(length); if(start&1){ /* If odd start, get first base */ dst[0] = GPOINTER_TO_INT(FastaDB_SparseCache_get_func_4_BIT(start, page_data, user_data)); pos = 1; } if((length > 2) && ((start+length)&1)){ /* If odd end, get last base */ dst[length-1] = GPOINTER_TO_INT(FastaDB_SparseCache_get_func_4_BIT( start+length-1, page_data, user_data)); end--; } while(pos < end){ word = table[(guchar)str[(start+pos)>>1]]; g_assert(word[0] != '-'); g_assert(word[1] != '-'); dst[pos++] = word[0]; dst[pos++] = word[1]; } #ifndef G_DISABLE_ASSERT for(i = 0; i < length; i++){ g_assert(dst[i] == GPOINTER_TO_INT(FastaDB_SparseCache_get_func_4_BIT( start+i, page_data, user_data))); } #endif /* G_DISABLE_ASSERT */ return length; } static void FastaDB_SparseCache_compress_4bit(SparseCache_Page *page, gint len){ register gint i; register gchar *seq = page->data; static gint index[ALPHABET_SIZE]; static gboolean index_is_filled = FALSE; if(!index_is_filled){ FastaDB_SparseCache_fill_index(index, "ACGTNacgtn"); index_is_filled = TRUE; } page->data = g_new0(gchar, (len >> 1)+1); page->data_size = sizeof(gchar)*((len >> 1)+1); for(i = 0; i < len; i++) ((gchar*)page->data)[(i >> 1)] |= (index[(guchar)seq[i]] << ((i&1) << 2)); g_free(seq); page->get_func = FastaDB_SparseCache_get_func_4_BIT; page->copy_func = FastaDB_SparseCache_copy_func_4_BIT; return; } /**/ static gpointer FastaDB_SparseCache_get_func_2_BIT_LC(gint pos, gpointer page_data, gpointer user_data){ register gint ch = ((gchar*)page_data)[(pos >> 2)], num = (ch >> ((pos & 3) << 1)) & 3; static gchar *alphabet = "acgt"; return GINT_TO_POINTER((gint)alphabet[num]); } static gint 
FastaDB_SparseCache_copy_func_2_BIT_LC(gint start, gint length, gchar *dst, gpointer page_data, gpointer user_data){ #ifndef G_DISABLE_ASSERT register gint i; #endif /* G_DISABLE_ASSERT */ register gint pos = 0, end = length; register gchar *str = page_data, *word; gchar *table[256] = { "aaaa", "caaa", "gaaa", "taaa", "acaa", "ccaa", "gcaa", "tcaa", "agaa", "cgaa", "ggaa", "tgaa", "ataa", "ctaa", "gtaa", "ttaa", "aaca", "caca", "gaca", "taca", "acca", "ccca", "gcca", "tcca", "agca", "cgca", "ggca", "tgca", "atca", "ctca", "gtca", "ttca", "aaga", "caga", "gaga", "taga", "acga", "ccga", "gcga", "tcga", "agga", "cgga", "ggga", "tgga", "atga", "ctga", "gtga", "ttga", "aata", "cata", "gata", "tata", "acta", "ccta", "gcta", "tcta", "agta", "cgta", "ggta", "tgta", "atta", "ctta", "gtta", "ttta", "aaac", "caac", "gaac", "taac", "acac", "ccac", "gcac", "tcac", "agac", "cgac", "ggac", "tgac", "atac", "ctac", "gtac", "ttac", "aacc", "cacc", "gacc", "tacc", "accc", "cccc", "gccc", "tccc", "agcc", "cgcc", "ggcc", "tgcc", "atcc", "ctcc", "gtcc", "ttcc", "aagc", "cagc", "gagc", "tagc", "acgc", "ccgc", "gcgc", "tcgc", "aggc", "cggc", "gggc", "tggc", "atgc", "ctgc", "gtgc", "ttgc", "aatc", "catc", "gatc", "tatc", "actc", "cctc", "gctc", "tctc", "agtc", "cgtc", "ggtc", "tgtc", "attc", "cttc", "gttc", "tttc", "aaag", "caag", "gaag", "taag", "acag", "ccag", "gcag", "tcag", "agag", "cgag", "ggag", "tgag", "atag", "ctag", "gtag", "ttag", "aacg", "cacg", "gacg", "tacg", "accg", "cccg", "gccg", "tccg", "agcg", "cgcg", "ggcg", "tgcg", "atcg", "ctcg", "gtcg", "ttcg", "aagg", "cagg", "gagg", "tagg", "acgg", "ccgg", "gcgg", "tcgg", "aggg", "cggg", "gggg", "tggg", "atgg", "ctgg", "gtgg", "ttgg", "aatg", "catg", "gatg", "tatg", "actg", "cctg", "gctg", "tctg", "agtg", "cgtg", "ggtg", "tgtg", "attg", "cttg", "gttg", "tttg", "aaat", "caat", "gaat", "taat", "acat", "ccat", "gcat", "tcat", "agat", "cgat", "ggat", "tgat", "atat", "ctat", "gtat", "ttat", "aact", "cact", "gact", "tact", "acct", "ccct", 
"gcct", "tcct", "agct", "cgct", "ggct", "tgct", "atct", "ctct", "gtct", "ttct", "aagt", "cagt", "gagt", "tagt", "acgt", "ccgt", "gcgt", "tcgt", "aggt", "cggt", "gggt", "tggt", "atgt", "ctgt", "gtgt", "ttgt", "aatt", "catt", "gatt", "tatt", "actt", "cctt", "gctt", "tctt", "agtt", "cgtt", "ggtt", "tgtt", "attt", "cttt", "gttt", "tttt", }; while((start+pos)&3){ dst[pos] = GPOINTER_TO_INT(FastaDB_SparseCache_get_func_2_BIT_LC( start+pos, page_data, user_data)); pos++; } if(length > 4){ while(end&3){ dst[end-start-1] = GPOINTER_TO_INT(FastaDB_SparseCache_get_func_2_BIT_LC( end-1, page_data, user_data)); end--; } } while(pos < end){ word = table[(guchar)str[(start+pos)>>2]]; dst[pos++] = word[0]; dst[pos++] = word[1]; dst[pos++] = word[2]; dst[pos++] = word[3]; } #ifndef G_DISABLE_ASSERT for(i = 0; i < length; i++){ g_assert(dst[i] == GPOINTER_TO_INT(FastaDB_SparseCache_get_func_2_BIT_LC( start+i, page_data, user_data))); } #endif /* G_DISABLE_ASSERT */ return length; } static void FastaDB_SparseCache_compress_2bit_lc(SparseCache_Page *page, gint len){ register gint i; register gchar *seq = page->data; static gint index[ALPHABET_SIZE]; static gboolean index_is_filled = FALSE; if(!index_is_filled){ FastaDB_SparseCache_fill_index(index, "acgt"); index_is_filled = TRUE; } page->data = g_new0(gchar, (len >> 2)+1); page->data_size = sizeof(gchar)*((len >> 2)+1); for(i = 0; i < len; i++) ((gchar*)page->data)[(i >> 2)] |= (index[(guchar)seq[i]] << ((i&3) << 1)); g_free(seq); page->get_func = FastaDB_SparseCache_get_func_2_BIT_LC; page->copy_func = FastaDB_SparseCache_copy_func_2_BIT_LC; return; } static gpointer FastaDB_SparseCache_get_func_2_BIT_UC(gint pos, gpointer page_data, gpointer user_data){ register gint ch = ((gchar*)page_data)[(pos >> 2)], num = (ch >> ((pos & 3) << 1)) & 3; static gchar *alphabet = "ACGT"; return GINT_TO_POINTER((gint)alphabet[num]); } static gint FastaDB_SparseCache_copy_func_2_BIT_UC(gint start, gint length, gchar *dst, gpointer page_data, 
gpointer user_data){ #ifndef G_DISABLE_ASSERT register gint i; #endif /* G_DISABLE_ASSERT */ register gint pos = 0, end = length; register gchar *str = page_data, *word; gchar *table[256] = { "AAAA", "CAAA", "GAAA", "TAAA", "ACAA", "CCAA", "GCAA", "TCAA", "AGAA", "CGAA", "GGAA", "TGAA", "ATAA", "CTAA", "GTAA", "TTAA", "AACA", "CACA", "GACA", "TACA", "ACCA", "CCCA", "GCCA", "TCCA", "AGCA", "CGCA", "GGCA", "TGCA", "ATCA", "CTCA", "GTCA", "TTCA", "AAGA", "CAGA", "GAGA", "TAGA", "ACGA", "CCGA", "GCGA", "TCGA", "AGGA", "CGGA", "GGGA", "TGGA", "ATGA", "CTGA", "GTGA", "TTGA", "AATA", "CATA", "GATA", "TATA", "ACTA", "CCTA", "GCTA", "TCTA", "AGTA", "CGTA", "GGTA", "TGTA", "ATTA", "CTTA", "GTTA", "TTTA", "AAAC", "CAAC", "GAAC", "TAAC", "ACAC", "CCAC", "GCAC", "TCAC", "AGAC", "CGAC", "GGAC", "TGAC", "ATAC", "CTAC", "GTAC", "TTAC", "AACC", "CACC", "GACC", "TACC", "ACCC", "CCCC", "GCCC", "TCCC", "AGCC", "CGCC", "GGCC", "TGCC", "ATCC", "CTCC", "GTCC", "TTCC", "AAGC", "CAGC", "GAGC", "TAGC", "ACGC", "CCGC", "GCGC", "TCGC", "AGGC", "CGGC", "GGGC", "TGGC", "ATGC", "CTGC", "GTGC", "TTGC", "AATC", "CATC", "GATC", "TATC", "ACTC", "CCTC", "GCTC", "TCTC", "AGTC", "CGTC", "GGTC", "TGTC", "ATTC", "CTTC", "GTTC", "TTTC", "AAAG", "CAAG", "GAAG", "TAAG", "ACAG", "CCAG", "GCAG", "TCAG", "AGAG", "CGAG", "GGAG", "TGAG", "ATAG", "CTAG", "GTAG", "TTAG", "AACG", "CACG", "GACG", "TACG", "ACCG", "CCCG", "GCCG", "TCCG", "AGCG", "CGCG", "GGCG", "TGCG", "ATCG", "CTCG", "GTCG", "TTCG", "AAGG", "CAGG", "GAGG", "TAGG", "ACGG", "CCGG", "GCGG", "TCGG", "AGGG", "CGGG", "GGGG", "TGGG", "ATGG", "CTGG", "GTGG", "TTGG", "AATG", "CATG", "GATG", "TATG", "ACTG", "CCTG", "GCTG", "TCTG", "AGTG", "CGTG", "GGTG", "TGTG", "ATTG", "CTTG", "GTTG", "TTTG", "AAAT", "CAAT", "GAAT", "TAAT", "ACAT", "CCAT", "GCAT", "TCAT", "AGAT", "CGAT", "GGAT", "TGAT", "ATAT", "CTAT", "GTAT", "TTAT", "AACT", "CACT", "GACT", "TACT", "ACCT", "CCCT", "GCCT", "TCCT", "AGCT", "CGCT", "GGCT", "TGCT", "ATCT", "CTCT", "GTCT", "TTCT", "AAGT", "CAGT", 
"GAGT", "TAGT", "ACGT", "CCGT", "GCGT", "TCGT", "AGGT", "CGGT", "GGGT", "TGGT", "ATGT", "CTGT", "GTGT", "TTGT", "AATT", "CATT", "GATT", "TATT", "ACTT", "CCTT", "GCTT", "TCTT", "AGTT", "CGTT", "GGTT", "TGTT", "ATTT", "CTTT", "GTTT", "TTTT", }; while((start+pos)&3){ dst[pos] = GPOINTER_TO_INT(FastaDB_SparseCache_get_func_2_BIT_UC( start+pos, page_data, user_data)); pos++; } if(length > 4){ while(end&3){ dst[end-start-1] = GPOINTER_TO_INT(FastaDB_SparseCache_get_func_2_BIT_UC(end-1, page_data, user_data)); end--; } } while(pos < end){ word = table[(guchar)str[(start+pos)>>2]]; dst[pos++] = word[0]; dst[pos++] = word[1]; dst[pos++] = word[2]; dst[pos++] = word[3]; } #ifndef G_DISABLE_ASSERT for(i = 0; i < length; i++){ g_assert(dst[i] == GPOINTER_TO_INT(FastaDB_SparseCache_get_func_2_BIT_UC( start+i, page_data, user_data))); } #endif /* G_DISABLE_ASSERT */ return length; } static void FastaDB_SparseCache_compress_2bit_uc(SparseCache_Page *page, gint len){ register gint i; register gchar *seq = page->data; static gint index[ALPHABET_SIZE]; static gboolean index_is_filled = FALSE; if(!index_is_filled){ FastaDB_SparseCache_fill_index(index, "ACGT"); index_is_filled = TRUE; } page->data = g_new0(gchar, (len >> 2)+1); page->data_size = sizeof(gchar)*((len >> 2)+1); for(i = 0; i < len; i++) ((gchar*)page->data)[(i >> 2)] |= (index[(guchar)seq[i]] << ((i&3) << 1)); g_free(seq); page->get_func = FastaDB_SparseCache_get_func_2_BIT_UC; page->copy_func = FastaDB_SparseCache_copy_func_2_BIT_UC; return; } /**/ void FastaDB_SparseCache_compress(SparseCache_Page *page, gint len){ register gint i, mask = 0; register gchar *seq = page->data; static FastaDB_SparseCache_Mask mask_index[ALPHABET_SIZE]; static gboolean mask_is_filled = FALSE; if(!mask_is_filled){ FastaDB_SparseCache_fill_mask(mask_index); mask_is_filled = TRUE; } for(i = 0; i < len; i++) mask |= mask_index[(guchar)seq[i]]; if(mask & (1 << FastaDB_SparseCache_Mask_8_BIT)){ page->copy_func = 
            FastaDB_SparseCache_copy_func_8_BIT;
        return; /* Keep current 8 bit encoding */
    }
    if(mask & (mask-1)){ /* more than one mask bit is set */
        FastaDB_SparseCache_compress_4bit(page, len);
        return;
    }
    if(mask & (1 << FastaDB_SparseCache_Mask_2_BIT_LC)){
        FastaDB_SparseCache_compress_2bit_lc(page, len);
        return;
    }
    if(mask & (1 << FastaDB_SparseCache_Mask_2_BIT_UC)){
        FastaDB_SparseCache_compress_2bit_uc(page, len);
        return;
    }
    if(mask & (1 << FastaDB_SparseCache_Mask_0_BIT_LC)){
        g_free(page->data);
        page->data = NULL;
        page->data_size = 0;
        page->get_func = FastaDB_SparseCache_get_func_0_BIT_LC;
        page->copy_func = FastaDB_SparseCache_copy_func_0_BIT_LC;
        return;
    }
    if(mask & (1 << FastaDB_SparseCache_Mask_0_BIT_UC)){
        g_free(page->data);
        page->data = NULL;
        page->data_size = 0;
        page->get_func = FastaDB_SparseCache_get_func_0_BIT_UC;
        page->copy_func = FastaDB_SparseCache_copy_func_0_BIT_UC;
        return;
    }
    g_error("Bad FastaDB_SparseCache compression mask");
    return;
}

static SparseCache_Page *FastaDB_SparseCache_fill_func(gint start,
                                                gpointer user_data){
    register SparseCache_Page *page = g_new(SparseCache_Page, 1);
    register FastaDB_Key *fdbk = user_data;
    register gint ch, pos = 0, len;
    register gint db_start, db_end;
    if(fdbk->source->line_length < 1)
        g_error("Unknown or irregular fasta line length");
    db_start = fdbk->seq_offset
             + start
             + (start / (fdbk->source->line_length));
    len = MIN(SparseCache_PAGE_SIZE, fdbk->length-start);
    db_end = db_start
           + len
           + (len / (fdbk->source->line_length));
#ifdef USE_PTHREADS
    pthread_mutex_lock(&fdbk->source->cf->compoundfile_mutex);
#endif /* USE_PTHREADS */
    fdbk->location->pos += db_start;
    CompoundFile_Location_seek(fdbk->location);
    fdbk->location->pos -= db_start;
    page->data = g_new(gchar, len);
    page->get_func = FastaDB_SparseCache_get_func_8_BIT;
    page->copy_func = NULL; /* FIXME: temp */
    page->data_size = sizeof(gchar)*len;
    do {
        ch = CompoundFile_getc(fdbk->source->cf);
        g_assert(ch != EOF);
        g_assert(pos < len);
        if(!isspace(ch))
            ((gchar*)page->data)[pos++] = ch;
    } while(pos < len);
#ifdef USE_PTHREADS
    pthread_mutex_unlock(&fdbk->source->cf->compoundfile_mutex);
#endif /* USE_PTHREADS */
    FastaDB_SparseCache_compress(page, len);
    return page;
}
/* FIXME: this is really inefficient: should use unbuffered read/write
 *        do this by adding CompoundFile_read()
 */

/* FIXME add support for compressed pages etc */
SparseCache *FastaDB_Key_get_SparseCache(FastaDB_Key *fdbk){
    register SparseCache *cache = SparseCache_create(fdbk->length,
                                  FastaDB_SparseCache_fill_func,
                                  NULL, NULL, fdbk);
    return cache;
}

/**/

FastaDB_Seq **FastaDB_all(gchar *path, Alphabet *alphabet,
                          FastaDB_Mask mask, guint *total){
    register FastaDB *fdb = FastaDB_open(path, alphabet);
    register GPtrArray *ptra = g_ptr_array_new();
    register FastaDB_Seq *fdbs, **fdbsa;
    register guint numseqs = 0;
    while((fdbs = FastaDB_next(fdb, mask))){
        g_ptr_array_add(ptra, fdbs);
        numseqs++;
    }
    g_ptr_array_add(ptra, NULL); /* NULL delimit */
    if(total)
        *total = numseqs; /* Record number of sequences */
    fdbsa = (FastaDB_Seq**)ptra->pdata;
    g_ptr_array_free(ptra, FALSE); /* Leave data intact */
    FastaDB_close(fdb);
    return fdbsa;
}

void FastaDB_Seq_all_destroy(FastaDB_Seq **fdbs){
    register gint i;
    g_assert(fdbs);
    for(i = 0; fdbs[i]; i++)
        FastaDB_Seq_destroy(fdbs[i]);
    g_free(fdbs);
    return;
}

gint FastaDB_Seq_print(FastaDB_Seq *fdbs, FILE *fp, FastaDB_Mask mask){
    register gint written = 0;
    if(mask & (FastaDB_Mask_ID|FastaDB_Mask_DEF|FastaDB_Mask_LEN)){
        written += fprintf(fp, ">%s",
                (mask & FastaDB_Mask_ID)?fdbs->seq->id:"[unknown]");
        if(fdbs->seq->def && (mask & FastaDB_Mask_DEF))
            written += fprintf(fp, " %s", fdbs->seq->def);
        if(mask & FastaDB_Mask_LEN)
            written += fprintf(fp, " [len:%d]", fdbs->seq->len);
        written += fprintf(fp, "\n");
    }
    if(mask & FastaDB_Mask_SEQ) /* to print seq, length must be set */
        written += Sequence_print_fasta_block(fdbs->seq, fp);
    return written;
}

gint FastaDB_Seq_all_print(FastaDB_Seq **fdbs, FILE *fp,
                           FastaDB_Mask mask){
    register int
                 i, written = 0;
    for(i = 0; fdbs[i]; i++)
        written += FastaDB_Seq_print(fdbs[i], fp, mask);
    return written;
}

FastaDB_Seq *FastaDB_get_single(gchar *path, Alphabet *alphabet){
    register FastaDB *fdb = FastaDB_open(path, alphabet);
    register FastaDB_Seq *fdbs = FastaDB_next(fdb, FastaDB_Mask_ALL);
    if(!fdbs)
        g_error("No sequences found in [%s]\n", path);
    FastaDB_close(fdb);
    return fdbs;
}

Alphabet_Type FastaDB_guess_type(gchar *path){
    register Alphabet *alphabet = Alphabet_create(Alphabet_Type_UNKNOWN,
                                                  FALSE);
    register FastaDB_Seq *fdbs = FastaDB_get_single(path, alphabet);
    register gchar *str = Sequence_get_str(fdbs->seq);
    register Alphabet_Type type = Alphabet_Type_guess(str);
    g_free(str);
    FastaDB_Seq_destroy(fdbs);
    Alphabet_destroy(alphabet);
    return type;
}

exonerate-2.4.0/src/database/fastapipe.test.c

/****************************************************************\
*                                                                *
*  Library for All-vs-All Fasta Database Comparisons             *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.
* * * \****************************************************************/

#include "fastapipe.h"

static void test_init_func(gpointer user_data){
    register gint *count = user_data;
    g_message("Initiate pipeline");
    (*count) = 0;
    return;
}

static void test_prep_func(gpointer user_data){
    g_message("Prepare pipeline");
    return;
}

static void test_term_func(gpointer user_data){
    g_message("Terminate pipeline");
    return;
}

static gboolean test_query_func(FastaDB_Seq *fdbs, gpointer user_data){
    register gint *count = user_data;
    g_message("Have query [%s]", fdbs->seq->id);
    if((++(*count)) < 2) /* Try to use 2 queries per pipeline */
        return FALSE;
    return TRUE;
}

static gboolean test_target_func(FastaDB_Seq *fdbs, gpointer user_data){
    g_message("Have target [%s]", fdbs->seq->id);
    return TRUE;
}

gint Argument_main(Argument *arg){
    register FastaPipe *fasta_pipe;
    register gchar *query_path, *target_path;
    register GPtrArray *query_path_list = g_ptr_array_new(),
                       *target_path_list = g_ptr_array_new();
    register Alphabet *alphabet;
    register FastaDB *query_fdb, *target_fdb;
    gint count;
    if(arg->argc == 3){
        query_path = arg->argv[1];
        target_path = arg->argv[2];
        g_ptr_array_add(query_path_list, query_path);
        g_ptr_array_add(target_path_list, target_path);
        g_message("Processing test FastaPipe with:\nQ:[%s]\nT:[%s]",
                  query_path, target_path);
        alphabet = Alphabet_create(Alphabet_Type_UNKNOWN, FALSE);
        query_fdb = FastaDB_open_list(query_path_list, alphabet);
        target_fdb = FastaDB_open_list(target_path_list, alphabet);
        fasta_pipe = FastaPipe_create(query_fdb, target_fdb,
                                      test_init_func, test_prep_func,
                                      test_term_func, test_query_func,
                                      test_target_func,
                                      FastaDB_Mask_ALL, FALSE, TRUE);
        Alphabet_destroy(alphabet);
        while(FastaPipe_process(fasta_pipe, &count)){
            g_message("Processing pipeline");
        }
        FastaPipe_destroy(fasta_pipe);
        FastaDB_close(query_fdb);
        FastaDB_close(target_fdb);
    } else {
        g_warning("Test [%s] does nothing without arguments", __FILE__);
    }
    g_ptr_array_free(query_path_list, TRUE);
    g_ptr_array_free(target_path_list, TRUE);
    return 0;
}

exonerate-2.4.0/src/database/fastapipe.c

/****************************************************************\
*                                                                *
*  Library for All-vs-All Fasta Database Comparisons             *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#include "fastapipe.h"

FastaPipe *FastaPipe_create(FastaDB *query_fdb, FastaDB *target_fdb,
                            FastaPipe_Boundary_Func init_func,
                            FastaPipe_Boundary_Func prep_func,
                            FastaPipe_Boundary_Func term_func,
                            FastaPipe_NextSeq_Func query_func,
                            FastaPipe_NextSeq_Func target_func,
                            FastaDB_Mask mask, gboolean translate_both,
                            gboolean use_revcomp){
    register FastaPipe *fasta_pipe = g_new(FastaPipe, 1);
    g_assert(init_func);
    g_assert(prep_func);
    g_assert(term_func);
    g_assert(query_func);
    g_assert(target_func);
    fasta_pipe->query_db = FastaDB_share(query_fdb);
    fasta_pipe->target_db = FastaDB_share(target_fdb);
    fasta_pipe->init_func = init_func;
    fasta_pipe->prep_func = prep_func;
    fasta_pipe->term_func = term_func;
    fasta_pipe->query_func = query_func;
    fasta_pipe->target_func = target_func;
    fasta_pipe->mask = mask;
    fasta_pipe->is_full = FALSE;
    if(use_revcomp){
        fasta_pipe->revcomp_query
            = (query_fdb->alphabet->type == Alphabet_Type_DNA);
        fasta_pipe->revcomp_target
            = (((query_fdb->alphabet->type == Alphabet_Type_PROTEIN)
             && (target_fdb->alphabet->type == Alphabet_Type_DNA))
             || translate_both);
    } else {
        fasta_pipe->revcomp_query = FALSE;
        fasta_pipe->revcomp_target = FALSE;
    }
    fasta_pipe->prev_query = NULL;
    fasta_pipe->prev_target = NULL;
    return fasta_pipe;
}

void FastaPipe_destroy(FastaPipe *fasta_pipe){
    FastaDB_close(fasta_pipe->query_db);
    FastaDB_close(fasta_pipe->target_db);
    g_free(fasta_pipe);
    return;
}

static FastaDB_Seq *FastaPipe_next_seq(FastaDB *fdb, FastaDB_Mask mask,
                                       gboolean use_revcomp,
                                       FastaDB_Seq **prev_seq){
    register FastaDB_Seq *fdbs, *revcomp_fdbs;
    if(*prev_seq){
        fdbs = FastaDB_Seq_revcomp(*prev_seq);
        FastaDB_Seq_destroy(*prev_seq);
        (*prev_seq) = NULL;
    } else {
        fdbs = FastaDB_next(fdb, mask);
        if(use_revcomp && fdbs){
            if((fdbs->seq->annotation)
            && (fdbs->seq->annotation->strand
                == Sequence_Strand_REVCOMP)){
                revcomp_fdbs = FastaDB_Seq_revcomp(fdbs);
                FastaDB_Seq_destroy(fdbs);
                return revcomp_fdbs;
            }
            /* Queue the reverse complement only when the strand
             * is not fixed by an annotation
             */
            if((!fdbs->seq->annotation)
            || (fdbs->seq->annotation->strand
                == Sequence_Strand_UNKNOWN))
                (*prev_seq) = FastaDB_Seq_share(fdbs);
        }
    }
    return fdbs;
}

static gboolean FastaPipe_process_database(FastaDB *fdb,
                    FastaPipe_NextSeq_Func next_seq_func,
                    FastaDB_Mask mask, gboolean use_revcomp,
                    FastaDB_Seq **prev_seq, gpointer user_data){
    register FastaDB_Seq *fdbs;
    register gboolean stop_requested = FALSE;
    while((fdbs = FastaPipe_next_seq(fdb, mask, use_revcomp, prev_seq))){
        stop_requested = next_seq_func(fdbs, user_data);
        FastaDB_Seq_destroy(fdbs);
        if(stop_requested)
            break;
    }
    return stop_requested;
}
/* Will return FALSE at end of database file */

gboolean FastaPipe_process(FastaPipe *fasta_pipe, gpointer user_data){
    register gboolean target_stop = FALSE;
    do {
        if(!fasta_pipe->is_full){
            if(FastaDB_is_finished(fasta_pipe->query_db)
            && (!fasta_pipe->prev_query))
                break;
            fasta_pipe->init_func(user_data);
            FastaPipe_process_database(fasta_pipe->query_db,
                    fasta_pipe->query_func, fasta_pipe->mask,
                    fasta_pipe->revcomp_query,
                    &fasta_pipe->prev_query, user_data);
            fasta_pipe->is_full = TRUE;
            fasta_pipe->prep_func(user_data);
        }
        target_stop = FastaPipe_process_database(
                fasta_pipe->target_db, fasta_pipe->target_func,
                fasta_pipe->mask, fasta_pipe->revcomp_target,
                &fasta_pipe->prev_target, user_data);
        if(!target_stop){
            FastaDB_rewind(fasta_pipe->target_db);
            fasta_pipe->is_full =
FALSE; fasta_pipe->term_func(user_data); } } while(!target_stop); return target_stop; } exonerate-2.4.0/src/database/dataset.test.c0000644000175000017500000000323211162714277015542 00000000000000/****************************************************************\ * * * Library for manipulation of FASTA format databases * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "fastadb.h" gint Argument_main(Argument *arg){ register FastaDB *fdb; register FastaDB_Seq *fdbs; register gchar *path; register Alphabet *alphabet; if(arg->argc == 2){ path = arg->argv[1]; alphabet = Alphabet_create(Alphabet_Type_UNKNOWN, FALSE); fdb = FastaDB_open(path, alphabet); Alphabet_destroy(alphabet); g_message("opened file [%s]", path); while((fdbs = FastaDB_next(fdb, FastaDB_Mask_ALL))){ g_message("read [%s]", fdbs->seq->id); FastaDB_Seq_print(fdbs, stdout, FastaDB_Mask_ALL); FastaDB_Seq_destroy(fdbs); } FastaDB_close(fdb); g_message("closed file [%s]", path); } return 0; } exonerate-2.4.0/src/database/dataset.c0000644000175000017500000005350511220137603014557 00000000000000/****************************************************************\ * * * Library for manipulation of exonerate dataset files * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * *
\****************************************************************/

#include <stdlib.h> /* For qsort() */
#include <string.h> /* For strcmp() */

#include "dataset.h"
#include "fastadb.h"
#include "bitarray.h"

/**/

#define DATASET_HEADER_MAGIC (('e' << 16)|('s' << 8)|('d'))
#define DATASET_HEADER_VERSION 3

static Dataset_Header *Dataset_Header_create(Alphabet_Type alphabet_type,
                                             gboolean softmask_input){
    register Dataset_Header *header = g_new0(Dataset_Header, 1);
    header->magic = DATASET_HEADER_MAGIC;
    header->version = DATASET_HEADER_VERSION;
    /* bit 0: DNA alphabet, bit 1: softmasked input */
    header->type = ((alphabet_type == Alphabet_Type_DNA)?1:0)
                 | ((softmask_input?1:0) << 1);
    /* other fields are filled during seq parsing or Dataset_finalise */
    return header;
    }

static void Dataset_Header_destroy(Dataset_Header *header){
    g_free(header);
    return;
    }

#if 0
static void Dataset_Header_info(Dataset_Header *header){
    g_message("Dataset_Header:\n"
              " magic [%lld]\n"
              " version [%lld]\n"
              " type [%lld]\n"
              " line_length [%lld]\n"
              /**/
              " number_of_dbs [%lld]\n"
              " max_db_len [%lld]\n"
              " total_db_len [%lld]\n"
              /**/
              " number_of_seqs [%lld]\n"
              " max_seq_len [%lld]\n"
              " total_seq_len [%lld]\n"
              /**/
              " path_data_offset [%lld]\n"
              " seq_data_offset [%lld]\n"
              " seq_info_offset [%lld]\n"
              " total_file_length [%lld]\n",
              header->magic,
              header->version,
              header->type,
              header->line_length,
              /**/
              header->number_of_dbs,
              header->max_db_len,
              header->total_db_len,
              /**/
              header->number_of_seqs,
              header->max_seq_len,
              header->total_seq_len,
              /**/
              header->path_data_offset,
              header->seq_data_offset,
              header->seq_info_offset,
              header->total_file_length);
    return;
    }
#endif /* 0 */

static void Dataset_Header_write(Dataset_Header *header, FILE *fp){
    /* Dataset_Header_info(header); */
    /**/
    BitArray_write_int(header->magic, fp);
    BitArray_write_int(header->version, fp);
    BitArray_write_int(header->type, fp);
    BitArray_write_int(header->line_length, fp);
    /**/
    BitArray_write_int(header->number_of_dbs, fp);
    BitArray_write_int(header->max_db_len, fp);
    BitArray_write_int(header->total_db_len,
                       fp);
    /**/
    BitArray_write_int(header->number_of_seqs, fp);
    BitArray_write_int(header->max_seq_len, fp);
    BitArray_write_int(header->total_seq_len, fp);
    /**/
    BitArray_write_int(header->path_data_offset, fp);
    BitArray_write_int(header->seq_data_offset, fp);
    BitArray_write_int(header->seq_info_offset, fp);
    BitArray_write_int(header->total_file_length, fp);
    /**/
    return;
    }

static Dataset_Header *Dataset_Header_read(FILE *fp){
    register Dataset_Header *header = g_new(Dataset_Header, 1);
    /**/
    header->magic = BitArray_read_int(fp);
    if(header->magic != DATASET_HEADER_MAGIC)
        g_error("Bad magic number in dataset file");
    header->version = BitArray_read_int(fp);
    if(header->version != DATASET_HEADER_VERSION)
        g_error("Incompatible dataset file version");
    header->type = BitArray_read_int(fp);
    header->line_length = BitArray_read_int(fp);
    /**/
    header->number_of_dbs = BitArray_read_int(fp);
    header->max_db_len = BitArray_read_int(fp);
    header->total_db_len = BitArray_read_int(fp);
    /**/
    header->number_of_seqs = BitArray_read_int(fp);
    header->max_seq_len = BitArray_read_int(fp);
    header->total_seq_len = BitArray_read_int(fp);
    /**/
    header->path_data_offset = BitArray_read_int(fp);
    header->seq_data_offset = BitArray_read_int(fp);
    header->seq_info_offset = BitArray_read_int(fp);
    header->total_file_length = BitArray_read_int(fp);
    /**/
    /* Dataset_Header_info(header); */
    return header;
    }

/**/

static Dataset_Sequence *Dataset_Sequence_create(FastaDB_Seq *fdbs){
    register Dataset_Sequence *seq = g_new0(Dataset_Sequence, 1);
    seq->key = FastaDB_Seq_get_key(fdbs);
    seq->gcg_checksum = Sequence_checksum(fdbs->seq);
    seq->id = g_strdup(fdbs->seq->id);
    seq->def = fdbs->seq->def?g_strdup(fdbs->seq->def):NULL;
    seq->cache_seq = NULL;
    return seq;
    }

static void Dataset_Sequence_destroy(Dataset_Sequence *seq){
    if(seq->cache_seq)
        Sequence_destroy(seq->cache_seq);
    FastaDB_Key_destroy(seq->key);
    g_free(seq->id);
    if(seq->def)
        g_free(seq->def);
    g_free(seq);
    return;
    }

static gsize
Dataset_Sequence_memory_usage(Dataset_Sequence *ds){
    return sizeof(Dataset_Sequence)
         + sizeof(FastaDB_Key)
         /* FIXME: count FastaDB_Key memory properly */
         + (sizeof(gchar)*strlen(ds->id))
         /* count the definition line itself, not the id a second time */
         + (sizeof(gchar)*(ds->def?strlen(ds->def):0))
         + (ds->cache_seq?Sequence_memory_usage(ds->cache_seq):0);
    }

static int Dataset_Sequence_compare_by_id_uniq(const void *a,
                                               const void *b){
    register Dataset_Sequence **seq_a = (Dataset_Sequence**)a,
                              **seq_b = (Dataset_Sequence**)b;
    register gint retval = strcmp((*seq_a)->id, (*seq_b)->id);
    if(!retval)
        g_error("Dataset has duplicate sequence id: [%s]", (*seq_a)->id);
    return retval;
    }

static int Dataset_Sequence_compare_by_id(const void *a,
                                          const void *b){
    register Dataset_Sequence **seq_a = (Dataset_Sequence**)a,
                              **seq_b = (Dataset_Sequence**)b;
    register gint retval = strcmp((*seq_a)->id, (*seq_b)->id);
    return retval;
    }

/**/

static Dataset_Width *Dataset_Width_create(Dataset_Header *header){
    register Dataset_Width *width = g_new0(Dataset_Width, 1);
    register guint64 n;
    /* Count the bits needed to store each maximum value */
    for(n = header->number_of_dbs; n; n >>= 1)
        width->num_db_width++;
    for(n = header->max_db_len; n; n >>= 1)
        width->max_db_len_width++;
    for(n = header->max_seq_len; n; n >>= 1)
        width->max_seq_len_width++;
    n = width->num_db_width
      + width->max_db_len_width
      + width->max_seq_len_width
      + 14; /* for gcg checksum */
    /* Round the record size up to a whole number of bytes */
    width->seq_data_item_size = (n >> 3) + ((n % 8)?1:0);
    return width;
    }

static void Dataset_Width_destroy(Dataset_Width *width){
    g_free(width);
    return;
    }

/**/

Dataset *Dataset_create(GPtrArray *path_list,
                        Alphabet_Type alphabet_type,
                        gboolean softmask_input){
    register Dataset *dataset = g_new(Dataset, 1);
    register FastaDB_Seq *fdbs;
    register Dataset_Sequence *ds;
    register gint i;
    register gsize path_data_size, seq_info_size, seq_data_size;
    register gchar *path;
    dataset->ref_count = 1;
    if(alphabet_type == Alphabet_Type_UNKNOWN){
        alphabet_type = FastaDB_guess_type((gchar*)path_list->pdata[0]);
        g_message("Guessed alphabet type as [%s]",
                  Alphabet_Type_get_name(alphabet_type));
        }
dataset->alphabet = Alphabet_create(alphabet_type, softmask_input); dataset->header = Dataset_Header_create(alphabet_type, softmask_input); dataset->fdb = FastaDB_open_list(path_list, dataset->alphabet); /**/ dataset->seq_list = g_ptr_array_new(); while((fdbs = FastaDB_next(dataset->fdb, FastaDB_Mask_ALL))){ ds = Dataset_Sequence_create(fdbs); g_ptr_array_add(dataset->seq_list, ds); FastaDB_Seq_destroy(fdbs); dataset->header->number_of_seqs++; if(dataset->header->max_seq_len < ds->key->length) dataset->header->max_seq_len = ds->key->length; dataset->header->total_seq_len += ds->key->length; } dataset->header->line_length = dataset->fdb->line_length; if(dataset->header->line_length < 1) g_error("Input is not a regular FASTA file, use fastareformat"); dataset->header->number_of_dbs = path_list->len; dataset->header->max_db_len = CompoundFile_get_max_element_length(dataset->fdb->cf); dataset->header->total_db_len = CompoundFile_get_length(dataset->fdb->cf); qsort(dataset->seq_list->pdata, dataset->seq_list->len, sizeof(gpointer), Dataset_Sequence_compare_by_id_uniq); dataset->width = Dataset_Width_create(dataset->header); /**/ path_data_size = 0; for(i = 0; i < path_list->len; i++){ path = path_list->pdata[i]; path_data_size += (strlen(path) + 1); } seq_data_size = 0; for(i = 0; i < dataset->seq_list->len; i++){ ds = dataset->seq_list->pdata[i]; ds->pos = i; seq_data_size += (strlen(ds->id) + 1); if(ds->def) seq_data_size += (strlen(ds->def) + 1); } seq_info_size = dataset->seq_list->len * dataset->width->seq_data_item_size; /**/ dataset->header->path_data_offset = sizeof(Dataset_Header); dataset->header->seq_data_offset = dataset->header->path_data_offset + path_data_size; dataset->header->seq_info_offset = dataset->header->seq_data_offset + seq_data_size; dataset->header->total_file_length = dataset->header->seq_info_offset + seq_info_size; #ifdef USE_PTHREADS pthread_mutex_init(&dataset->dataset_mutex, NULL); #endif /* USE_PTHREADS */ return dataset; } Dataset 
*Dataset_share(Dataset *dataset){ dataset->ref_count++; return dataset; } gsize Dataset_memory_usage(Dataset *dataset){ register gint i; register gsize dataset_sequence_memory = 0; register Dataset_Sequence *ds; for(i = 0; i < dataset->seq_list->len; i++){ ds = dataset->seq_list->pdata[i]; dataset_sequence_memory += Dataset_Sequence_memory_usage(ds); } return sizeof(Dataset) + sizeof(Alphabet) + sizeof(Dataset_Header) + sizeof(Dataset_Width) + (sizeof(gpointer)*dataset->seq_list->len) + FastaDB_memory_usage(dataset->fdb) + dataset_sequence_memory; } void Dataset_info(Dataset *dataset){ g_print("Sequence Dataset:\n" "----------------\n" " version: %d\n" " type: %s, %s\n" " total size: %d Mb\n" " number of sequences: %lld\n" " longest sequence: %lld\n\n", (gint)dataset->header->version, (dataset->header->type & 1)?"DNA":"PROTEIN", (dataset->header->type & 2)?"masked":"unmasked", (gint)(dataset->header->total_db_len >> 20), dataset->header->number_of_seqs, dataset->header->max_seq_len); return; } void Dataset_destroy(Dataset *dataset){ register gint i; register Dataset_Sequence *seq; if(--dataset->ref_count) return; for(i = 0; i < dataset->seq_list->len; i++){ seq = dataset->seq_list->pdata[i]; Dataset_Sequence_destroy(seq); } FastaDB_close(dataset->fdb); g_ptr_array_free(dataset->seq_list, TRUE); Dataset_Header_destroy(dataset->header); if(dataset->width) Dataset_Width_destroy(dataset->width); Alphabet_destroy(dataset->alphabet); #ifdef USE_PTHREADS pthread_mutex_destroy(&dataset->dataset_mutex); #endif /* USE_PTHREADS */ g_free(dataset); return; } gboolean Dataset_check_filetype(gchar *path){ register FILE *fp = fopen(path, "r"); register guint64 magic; if(!fp) g_error("Could not open file [%s]", path); magic = BitArray_read_int(fp); fclose(fp); return (magic == DATASET_HEADER_MAGIC); } static void Dataset_write_path_data(Dataset *dataset, FILE *fp){ register gint i; register CompoundFile_Element *cfe; for(i = 0; i < dataset->fdb->cf->element_list->len; i++){ cfe = 
dataset->fdb->cf->element_list->pdata[i]; fprintf(fp, "%s\n", cfe->path); } return; } static void Dataset_read_path_data(Dataset *dataset, FILE *fp){ register gint i; gchar buf[1024]; register GPtrArray *path_list = g_ptr_array_new(); register gchar *path; for(i = 0; i < dataset->header->number_of_dbs; i++){ if(!fgets(buf, 1024, fp)) g_error("Problem parsing file data"); path = g_strndup(buf, strlen(buf)-1); g_ptr_array_add(path_list, path); } dataset->fdb = FastaDB_open_list(path_list, dataset->alphabet); dataset->fdb->line_length = dataset->header->line_length; for(i = 0; i < path_list->len; i++){ path = path_list->pdata[i]; g_free(path); } g_ptr_array_free(path_list, TRUE); return; } static void Dataset_write_seq_data(Dataset *dataset, FILE *fp){ register gint i; register Dataset_Sequence *ds; for(i = 0; i < dataset->seq_list->len; i++){ ds = dataset->seq_list->pdata[i]; fprintf(fp, "%s%s%s\n", ds->id, ds->def?" ":"", ds->def?ds->def:""); } return; } static void Dataset_read_seq_data(Dataset *dataset, FILE *fp){ register gint64 i, j = 0, ipos, dpos; register Dataset_Sequence *ds; register guint64 len = dataset->header->seq_info_offset - dataset->header->seq_data_offset; register gchar *buf = g_new(gchar, len+1); if(!fread(buf, sizeof(gchar), len, fp)) g_error("Problem reading seq data"); for(i = 0; i < dataset->header->number_of_seqs; i++){ ds = g_new0(Dataset_Sequence, 1); g_ptr_array_add(dataset->seq_list, ds); ipos = j; dpos = -1; while(buf[j]){ if((buf[j] == ' ') && (dpos == -1)){ buf[j] = '\0'; dpos = j+1; } if(buf[j] == '\n'){ buf[j] = '\0'; ds->id = g_strdup(buf+ipos); if(dpos != -1) ds->def = g_strdup(buf+dpos); j++; break; } j++; } g_assert(ds->id); } g_free(buf); return; } static void Dataset_write_seq_info(Dataset *dataset, FILE *fp){ register gint i; register Dataset_Sequence *sequence; register BitArray *ba = BitArray_create(); for(i = 0; i < dataset->seq_list->len; i++){ sequence = dataset->seq_list->pdata[i]; /**/ BitArray_append(ba, 
sequence->key->location->element_id, dataset->width->num_db_width); BitArray_append(ba, sequence->key->location->pos, dataset->width->max_db_len_width); BitArray_append(ba, sequence->key->length, dataset->width->max_seq_len_width); BitArray_append(ba, sequence->gcg_checksum, 14); BitArray_write(ba, fp); BitArray_empty(ba); } BitArray_destroy(ba); return; } static void Dataset_read_seq_info(Dataset *dataset, FILE *fp){ register gint i, start, element_id, length, seq_offset; register Dataset_Sequence *ds; register BitArray *ba; register CompoundFile_Pos offset; register Sequence_Strand strand = (dataset->alphabet->type == Alphabet_Type_DNA) ? Sequence_Strand_FORWARD : Sequence_Strand_UNKNOWN; register CompoundFile_Location *location; for(i = 0; i < dataset->seq_list->len; i++){ ds = dataset->seq_list->pdata[i]; ba = BitArray_read(fp, dataset->width->seq_data_item_size); start = 0; element_id = BitArray_get(ba, start, dataset->width->num_db_width); start += dataset->width->num_db_width; offset = BitArray_get(ba, start, dataset->width->max_db_len_width); start += dataset->width->max_db_len_width; length = BitArray_get(ba, start, dataset->width->max_seq_len_width); start += dataset->width->max_seq_len_width; location = CompoundFile_Location_create(dataset->fdb->cf, offset, element_id); seq_offset = 1 + strlen(ds->id) + (ds->def?strlen(ds->def)+1:0); ds->key = FastaDB_Key_create(dataset->fdb, location, strand, seq_offset, length); CompoundFile_Location_destroy(location); ds->gcg_checksum = BitArray_get(ba, start, 14); ds->pos = i; BitArray_destroy(ba); } return; } void Dataset_write(Dataset *dataset, gchar *path){ register FILE *fp = fopen(path, "r"); if(fp) g_error("Output file [%s] already exists", path); fp = fopen(path, "w"); Dataset_Header_write(dataset->header, fp); Dataset_write_path_data(dataset, fp); Dataset_write_seq_data(dataset, fp); Dataset_write_seq_info(dataset, fp); fclose(fp); return; } Dataset *Dataset_read(gchar *path){ register Dataset *dataset = 
g_new(Dataset, 1); register FILE *fp = fopen(path, "r"); if(!fp) g_error("Could not open esd file [%s]", path); dataset->ref_count = 1; dataset->header = Dataset_Header_read(fp); dataset->alphabet = Alphabet_create( ((dataset->header->type&1) ?Alphabet_Type_DNA :Alphabet_Type_PROTEIN), (dataset->header->type&2)); dataset->width = Dataset_Width_create(dataset->header); dataset->seq_list = g_ptr_array_new(); Dataset_read_path_data(dataset, fp); Dataset_read_seq_data(dataset, fp); Dataset_read_seq_info(dataset, fp); #ifdef USE_PTHREADS pthread_mutex_init(&dataset->dataset_mutex, NULL); #endif /* USE_PTHREADS */ fclose(fp); return dataset; } gint Dataset_lookup_id(Dataset *dataset, gchar *id){ register Dataset_Sequence *result, **result_ptr; Dataset_Sequence key_seq, *key_ptr; key_seq.id = id; key_ptr = &key_seq; result_ptr = bsearch(&key_ptr, dataset->seq_list->pdata, dataset->seq_list->len, sizeof(gpointer), Dataset_Sequence_compare_by_id); if(!result_ptr) return -1; result = *result_ptr; return result->pos; } Sequence *Dataset_get_sequence(Dataset *dataset, gint dataset_pos){ register Sequence *seq = NULL; register Dataset_Sequence *ds; register gchar *def; register SparseCache *cache; register Sequence_Strand strand; g_assert(dataset_pos >= 0); g_assert(dataset_pos < dataset->seq_list->len); ds = dataset->seq_list->pdata[dataset_pos]; #ifdef USE_PTHREADS pthread_mutex_lock(&dataset->dataset_mutex); #endif /* USE_PTHREADS */ if(ds->cache_seq) seq = Sequence_share(ds->cache_seq); #ifdef USE_PTHREADS pthread_mutex_unlock(&dataset->dataset_mutex); #endif /* USE_PTHREADS */ if(seq) return seq; def = FastaDB_Key_get_def(ds->key); strand = (dataset->alphabet->type == Alphabet_Type_DNA) ? 
Sequence_Strand_FORWARD : Sequence_Strand_UNKNOWN; cache = FastaDB_Key_get_SparseCache(ds->key); seq = Sequence_create_extmem(ds->id, def, ds->key->length, strand, dataset->alphabet, cache); g_free(def); SparseCache_destroy(cache); g_assert(Sequence_checksum(seq) == ds->gcg_checksum); /* FIXME: slow */ return seq; } /* FIXME: should keep a single compound file for whole dataset, * and use that for making keys, to prevent more than * one file being opened simultaneously. * ?? will not work with cf sorting ... cf needs refactoring */ static int Dataset_Sequence_preload_compare(const void *a, const void *b){ register Dataset_Sequence **ds_a = (Dataset_Sequence**)a, **ds_b = (Dataset_Sequence**)b; g_assert(*ds_a); g_assert(*ds_b); g_assert((*ds_a)->key); g_assert((*ds_b)->key); if((*ds_a)->key->location->element_id == (*ds_b)->key->location->element_id) return (*ds_a)->key->location->pos - (*ds_b)->key->location->pos; return (*ds_a)->key->location->element_id - (*ds_b)->key->location->element_id; } void Dataset_preload_seqs(Dataset *dataset){ register gint i; register Dataset_Sequence *ds; register Dataset_Sequence **ds_list = g_new(Dataset_Sequence*, dataset->header->number_of_seqs); for(i = 0; i < dataset->header->number_of_seqs; i++){ ds = dataset->seq_list->pdata[i]; ds->cache_seq = Dataset_get_sequence(dataset, i); ds_list[i] = ds; } qsort(ds_list, dataset->header->number_of_seqs, sizeof(Dataset_Sequence*), Dataset_Sequence_preload_compare); /* Preload the sequence in the order they appear on disk */ for(i = 0; i < dataset->header->number_of_seqs; i++){ ds = ds_list[i]; Sequence_preload_extmem(ds->cache_seq); } g_free(ds_list); return; } /**/ exonerate-2.4.0/src/database/index.test.c0000644000175000017500000000675511162714300015224 00000000000000/****************************************************************\ * * * Library for manipulation of FASTA format databases * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. 
* * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include #include "index.h" gint Argument_main(Argument *arg){ register Index *index; register Alphabet *alphabet = Alphabet_create(Alphabet_Type_DNA, FALSE); register Sequence *query = Sequence_create("query", NULL, /* "ATGCAGCCCCGG", */ /* "ACGTAGAGGCCGAGGATGCCTCCCATTCTCAGCT", */ /* "ATGCAGCCCCGGGTACTCCTTGTTGTTGCCCTCCTGGCGCTCCTGGCCTCTGCCC", */ "ATGCANCCCCNGGTACTCCTTGTTNTTNCCCTCNTGGCGNTCCTGGNCTCTNCCC", /* "CCGGNAAGCTCANCTTGGACCACCGACTCTCGANTGNNTCGCCGCGGGAGCCGGNTGGAN" "AACCTGAGCGGGACTGGNAGAAGGAGCAGAGGGAGGCAGCACCCGGCGTGACGGNAGTGT" "GTGGGGCACTCAGGCCTTCCGCAGTGTCATCTGCCACACGGAAGGCACGGCCACGGGCAG" "GGGGGTCTATGATCTTCTGCATGCCCAGCTGGCATGGCCCCACGTAGAGTGGNNTGGCGT" "CTCGGTGCTGGTCAGCGACACGTTGTCCTGGCTGGGCAGGTCCAGCTCCCGGAGGACCTG" "GGGCTTCAGCTTCCCGTAGCGCTGGCTGCAGTGACGGATGCTCTTGCGCTGCCATTTCTG" "GGTGCTGTCACTGTCCTTGCTCACTCCAAACCAGTTCGGCGGTCCCCCTGCGGATGGTCT" "GTGTTGATGGACGTTTGGGCTTTGCAGCACCGGCCGCCGAGTTCATGGTNGGGTNAAGAG" "ATTTGGGTTTTTTCN", */ 0, Sequence_Strand_FORWARD, alphabet); register Match *match; register HSP_Param *hsp_param; register ArgumentSet *as = ArgumentSet_create("Input options"); register GPtrArray *index_hsp_set_list; register Index_HSPset *index_hsp_set; register gint i; gchar *path; ArgumentSet_add_option(as, 'i', "index", "path", "exonerate sequence index file (.esi)", "none", Argument_parse_string, &path); Argument_absorb_ArgumentSet(arg, as); HSPset_ArgumentSet_create(arg); Match_ArgumentSet_create(arg); Argument_process(arg, "index.test", NULL, NULL); if(!strcmp(path, "none")){ /* To ensure 'make check' does not fail */ g_warning("No path set for test index file"); return 0; } match = Match_find(Match_Type_DNA2DNA); hsp_param = HSP_Param_create(match, 
FALSE); /**/ index = Index_open(path); index_hsp_set_list = Index_get_HSPsets(index, hsp_param, query, FALSE); if(index_hsp_set_list){ for(i = 0; i < index_hsp_set_list->len; i++){ index_hsp_set = index_hsp_set_list->pdata[i]; HSPset_print(index_hsp_set->hsp_set); Index_HSPset_destroy(index_hsp_set); } g_ptr_array_free(index_hsp_set_list, TRUE); } Index_destroy(index); /**/ HSP_Param_destroy(hsp_param); Alphabet_destroy(alphabet); Sequence_destroy(query); return 0; } exonerate-2.4.0/src/database/index.c0000644000175000017500000024025411220123251014232 00000000000000/****************************************************************\ * * * Library for manipulation of exonerate index files * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include #include /* For pow() */ #include /* For strlen() */ #include /* For qsort() */ #include "index.h" #include "submat.h" #include "wordhood.h" #include "pqueue.h" #include "rangetree.h" #include "noitree.h" #define INDEX_HEADER_MAGIC (('e' << 16)|('s' << 8)|('i')) #define INDEX_HEADER_VERSION 3 static off_t Index_ftell(FILE *fp){ return ftello(fp); } /* FIXME: need alternatives in absence of ftello() * lseek(fileno(fp), 0, SEEK_CUR); ?? */ static off_t Index_fseek(FILE *fp, off_t offset, int whence){ return fseeko(fp, offset, whence); } /* FIXME: need alternatives in absence of fseeko() * lseek(fileno(fp), offset, whence); ?? 
 */

static guint64 Index_Strand_memory_usage(Index_Strand *strand,
                                         VFSM *vfsm){
    g_assert(strand);
    return sizeof(Index_Strand)
         + (sizeof(gint)*vfsm->lrw) /* for word_table */
         + (sizeof(Index_Word)*strand->header.word_list_length) /* word_list */
         + (strand->index_cache?BitArray_memory_usage(strand->index_cache):0);
    }

guint64 Index_memory_usage(Index *index){
    g_assert(index);
    return sizeof(Index)
         + sizeof(Index_Header)
         + (sizeof(gchar)*index->header->dataset_path_len)
         + Dataset_memory_usage(index->dataset)
         + sizeof(Index_Width)
         + sizeof(VFSM)
         + (index->forward?Index_Strand_memory_usage(index->forward,
                                                     index->vfsm):0)
         /* measure the revcomp strand itself, not forward a second time */
         + (index->revcomp?Index_Strand_memory_usage(index->revcomp,
                                                     index->vfsm):0);
    }

static gsize Index_Word_get_address_list_space(Index_Word *index_word,
                                               Index *index){
    g_assert(index_word);
    if(index_word->freq_count)
        return BitArray_get_size(index_word->freq_count
                   *(index->width->number_of_seqs_width
                    +index->dataset->width->max_seq_len_width));
    return 0;
    }

/**/

static Index_Header *Index_Header_create(gint dataset_path_len,
                                         gboolean is_translated,
                                         gint word_length,
                                         gint word_jump,
                                         gint word_ambiguity,
                                         gint saturate_threshold){
    register Index_Header *index_header = g_new0(Index_Header, 1);
    index_header->magic = INDEX_HEADER_MAGIC;
    index_header->version = INDEX_HEADER_VERSION;
    index_header->type = is_translated?1:0;
    index_header->dataset_path_len = dataset_path_len;
    /**/
    index_header->word_length = word_length;
    index_header->word_jump = word_jump;
    index_header->word_ambiguity = word_ambiguity;
    index_header->saturate_threshold = saturate_threshold;
    return index_header;
    }

static void Index_Header_destroy(Index_Header *index_header){
    g_free(index_header);
    return;
    }

#if 0
static void Index_Header_info(Index_Header *index_header){
    g_message("Index_Header:\n"
              " magic [%lld]\n"
              " version [%lld]\n"
              " type [%lld]\n"
              " dataset_path_len [%lld]\n"
              "\n"
              " word_length [%lld]\n"
              " word_jump [%lld]\n"
              " word_ambiguity [%lld]\n"
              " saturate_threshold [%lld]\n",
              index_header->magic,
              index_header->version,
              index_header->type,
              index_header->dataset_path_len,
              index_header->word_length,
              index_header->word_jump,
              index_header->word_ambiguity,
              index_header->saturate_threshold);
    return;
    }
#endif /* 0 */

static void Index_Header_write(Index_Header *index_header, FILE *fp){
    /* Index_Header_info(index_header); */
    BitArray_write_int(index_header->magic, fp);
    BitArray_write_int(index_header->version, fp);
    BitArray_write_int(index_header->type, fp);
    BitArray_write_int(index_header->dataset_path_len, fp);
    /**/
    BitArray_write_int(index_header->word_length, fp);
    BitArray_write_int(index_header->word_jump, fp);
    BitArray_write_int(index_header->word_ambiguity, fp);
    BitArray_write_int(index_header->saturate_threshold, fp);
    /**/
    return;
    }

static Index_Header *Index_Header_read(FILE *fp){
    register Index_Header *index_header = g_new(Index_Header, 1);
    index_header->magic = BitArray_read_int(fp);
    if(index_header->magic != INDEX_HEADER_MAGIC)
        g_error("Bad magic number in index file");
    index_header->version = BitArray_read_int(fp);
    if(index_header->version != INDEX_HEADER_VERSION)
        g_error("Incompatible index file version");
    index_header->type = BitArray_read_int(fp);
    index_header->dataset_path_len = BitArray_read_int(fp);
    /**/
    index_header->word_length = BitArray_read_int(fp);
    index_header->word_jump = BitArray_read_int(fp);
    index_header->word_ambiguity = BitArray_read_int(fp);
    index_header->saturate_threshold = BitArray_read_int(fp);
    /* Index_Header_info(index_header); */
    return index_header;
    }

/**/

static Index_Width *Index_Width_create(VFSM *vfsm, Dataset *dataset){
    register Index_Width *index_width = g_new0(Index_Width, 1);
    register guint64 n;
    for(n = vfsm->lrw; n; n >>= 1)
        index_width->max_word_width++;
    for(n = dataset->header->number_of_seqs; n; n >>= 1)
        index_width->number_of_seqs_width++;
    return index_width;
    }

#if 0
static void Index_Width_info(Index_Width *index_width){
    g_message("Index_Width:\n"
              " max_word_width [%d]\n"
              "
number_of_seqs_width [%d]\n", index_width->max_word_width, index_width->number_of_seqs_width); return; } #endif /* 0 */ static void Index_Width_destroy(Index_Width *index_width){ g_free(index_width); return; } /**/ typedef void (*Index_WordVisit_Func)(Index *index, Index_Strand *strand, gint seq_id, gint seq_pos, gint leaf_id, gpointer user_data); static void Index_visit_seq_words_single(Index *index, Index_Strand *index_strand, gint seq_id, Sequence *seq, Index_WordVisit_Func iwvf, gint frame, gpointer user_data){ register gint i, jump_ctr = 0, pos; register gchar *str = Sequence_get_str(seq); register VFSM_Int state = 0, leaf; for(i = 0; str[i]; i++){ if(!index->vfsm->index[(guchar)str[i]]){ state = 0; continue; } state = VFSM_change_state_M(index->vfsm, state, (guchar)str[i]); if(jump_ctr--) continue; jump_ctr = index->header->word_jump - 1; if(VFSM_state_is_leaf(index->vfsm, state)){ leaf = VFSM_state2leaf(index->vfsm, state); pos = i-(index->vfsm->depth-1); if(frame) pos = (pos * 3) + frame - 1; iwvf(index, index_strand, seq_id, pos, leaf, user_data); } } g_free(str); return; } /* FIXME: needs to work with softmasked sequences */ #ifndef Swap #define Swap(x,y,temp) ((temp)=(x),(x)=(y),(y)=(temp)) #endif /* Swap */ static void Index_visit_seq_words_ambig(Index *index, Index_Strand *index_strand, gint seq_id, Sequence *seq, Index_WordVisit_Func iwvf, gint frame, gpointer user_data){ register gint i, j, k, jump_ctr = 0, pos; register gchar *ambig, *str = Sequence_get_str(seq); register VFSM_Int state, next_state, leaf; register VFSM_Int *temp_state_list, *curr_state_list = g_new0(VFSM_Int, index->header->word_ambiguity), *next_state_list = g_new0(VFSM_Int, index->header->word_ambiguity); register gint curr_state_list_len = 1, next_state_list_len = 0; for(i = 0; str[i]; i++){ if(jump_ctr--) continue; jump_ctr = index->header->word_jump - 1; if((!(ambig = Alphabet_nt2ambig(str[i]))) || ((strlen(ambig) * curr_state_list_len) > index->header->word_ambiguity)){ 
next_state_list_len = 0; /* Clear next state list */ curr_state_list_len = 1; /* Set single null state */ curr_state_list[0] = 0; continue; } for(j = 0; j < curr_state_list_len; j++){ state = curr_state_list[j]; for(k = 0; ambig[k]; k++){ g_assert(index->vfsm->index[(guchar)ambig[k]]); next_state = VFSM_change_state_M(index->vfsm, state, (guchar)ambig[k]); if(next_state_list_len && ((next_state == next_state_list[0]) || (next_state == next_state_list[next_state_list_len-1]))) break; next_state_list[next_state_list_len++] = next_state; if(VFSM_state_is_leaf(index->vfsm, next_state)){ leaf = VFSM_state2leaf(index->vfsm, next_state); pos = i-(index->vfsm->depth-1); if(frame) pos = (pos * 3) + frame - 1; iwvf(index, index_strand, seq_id, pos, leaf, user_data); } } } curr_state_list_len = next_state_list_len; next_state_list_len = 0; Swap(next_state_list, curr_state_list, temp_state_list); } g_free(str); g_free(curr_state_list); g_free(next_state_list); return; } /* FIXME: needs to work with softmasked sequences */ /* FIXME: optimisation: remove strlen() call */ static void Index_visit_seq_words(Index *index, Index_Strand *index_strand, gint seq_id, Sequence *seq, Index_WordVisit_Func iwvf, gint frame, gpointer user_data){ if(index->header->word_ambiguity > 1) Index_visit_seq_words_ambig(index, index_strand, seq_id, seq, iwvf, frame, user_data); else Index_visit_seq_words_single(index, index_strand, seq_id, seq, iwvf, frame, user_data); return; } /* FIXME: needs to work with softmasked sequences */ static void Index_visit_words(Index *index, Index_Strand *index_strand, Index_WordVisit_Func iwvf, gboolean is_forward, gpointer user_data){ register gint i, j; register Dataset_Sequence *dataset_seq; register Sequence *ds_seq, *seq, *rc_seq, *aa_seq; register Translate *translate = Translate_create(FALSE); /* FIXME: move Translate_create() outside of this function */ for(i = 0; i < index->dataset->seq_list->len; i++){ dataset_seq = index->dataset->seq_list->pdata[i]; ds_seq 
= Dataset_get_sequence(index->dataset, i); seq = Sequence_mask(ds_seq); Sequence_destroy(ds_seq); if(!is_forward){ /* revcomp */ rc_seq = Sequence_revcomp(seq); Sequence_destroy(seq); seq = rc_seq; } if(index->header->type & 1){ /* is_translated */ for(j = 0; j < 3; j++){ aa_seq = Sequence_translate(seq, translate, j+1); Index_visit_seq_words(index, index_strand, i, aa_seq, iwvf, j+1, user_data); Sequence_destroy(aa_seq); } } else { Index_visit_seq_words(index, index_strand, i, seq, iwvf, 0, user_data); } Sequence_destroy(seq); } Translate_destroy(translate); return; } /**/ static void Index_freq_count_visit(Index *index, Index_Strand *index_strand, gint seq_id, gint seq_pos, gint leaf_id, gpointer user_data){ g_assert(leaf_id < index->vfsm->lrw); index_strand->word_table[leaf_id]++; return; } static void Index_freq_count(Index *index, Index_Strand *index_strand, gboolean is_forward){ Index_visit_words(index, index_strand, Index_freq_count_visit, is_forward, NULL); return; } /**/ static gint Index_get_expect(Index *index, Index_Strand *index_strand){ register gdouble expect = 1.0 / pow(index->vfsm->alphabet_size, index->header->word_length); register gint i; register gint64 observed_words = 0; for(i = 0; i < index->vfsm->lrw; i++) observed_words += index_strand->word_table[i]; return (expect * observed_words) + index->header->saturate_threshold; } static void Index_desaturate(Index *index, Index_Strand *index_strand){ register VFSM_Int i; register gint64 removed_words = 0, removed_instances = 0; register gint target_expect = Index_get_expect(index, index_strand); if(!index->header->saturate_threshold) return; g_message("Desaturating (expecting [%d] target words)", target_expect); for(i = 0; i < index->vfsm->lrw; i++) if(index_strand->word_table[i] >= target_expect){ removed_words++; removed_instances += index_strand->word_table[i]; index_strand->word_table[i] = -1; } g_message("Removed [%" CUSTOM_GUINT64_FORMAT "] words, [%" CUSTOM_GUINT64_FORMAT "] instances", 
removed_words, removed_instances); return; } /* state at function exit: * count for present words * zero for absent words * -1 for desaturated words */ static void Index_survey_word_list(Index *index, Index_Strand *index_strand){ register gint i, pos = 0; register Index_Word *word; for(i = 0; i < index->vfsm->lrw; i++){ if(index_strand->word_table[i] > 0) index_strand->header.word_list_length++; } index_strand->word_list = g_new(Index_Word, index_strand->header.word_list_length); for(i = 0; i < index->vfsm->lrw; i++){ if(index_strand->word_table[i] > 0){ word = &index_strand->word_list[pos]; word->freq_count = index_strand->word_table[i]; word->index_offset = 0; if(index_strand->header.max_index_length < word->freq_count) index_strand->header.max_index_length = word->freq_count; index_strand->word_table[i] = pos++; } else { if(index_strand->word_table[i] == 0) index_strand->word_table[i] = -2; } } return; } /* state at function exit: * word_list offset for present * -1 for desaturated words * -2 for absent words */ gboolean Index_check_filetype(gchar *path){ register FILE *fp = fopen(path, "r"); register guint64 magic; if(!fp) g_error("Could not open file [%s]", path); magic = BitArray_read_int(fp); fclose(fp); return (magic == INDEX_HEADER_MAGIC); } /**/ typedef struct { Index_Address *address_list; gint found; } Index_AddressList; static Index_AddressList *Index_AddressList_create(gint freq_count){ register Index_AddressList *address_list = g_new(Index_AddressList, 1); address_list->found = 0; address_list->address_list = g_new(Index_Address, freq_count); return address_list; } static void Index_AddressList_destroy(Index_AddressList *address_list){ g_free(address_list->address_list); g_free(address_list); return; } static void Index_AddressList_append(Index_AddressList *address_list, gint seq_id, gint position){ register Index_Address *address = &address_list->address_list[address_list->found]; address->sequence_id = seq_id; address->position = position; 
address_list->found++; return; } static void Index_AddressList_write(Index_AddressList *address_list, Index *index){ register gint i; register BitArray *ba = BitArray_create(); register Index_Address *address; for(i = 0; i < address_list->found; i++){ address = &address_list->address_list[i]; BitArray_append(ba, address->sequence_id, index->width->number_of_seqs_width); BitArray_append(ba, address->position, index->dataset->width->max_seq_len_width); } BitArray_write(ba, index->fp); BitArray_destroy(ba); return; } /* FIXME: optimisation: clear and reuse BitArray */ static int Index_Address_compare(const void *a, const void *b){ register Index_Address *addr_a = (Index_Address*)a, *addr_b = (Index_Address*)b; if(addr_a->sequence_id == addr_b->sequence_id) return addr_a->position - addr_b->position; return addr_a->sequence_id - addr_b->sequence_id; } static void Index_AddressList_sort(Index_AddressList *address_list){ qsort(address_list->address_list, address_list->found, sizeof(Index_Address), Index_Address_compare); return; } /* FIXME: optimisation: avoid this sort, by traversing all frames together, * or using more efficient 3-way merge */ /**/ static GArray *Index_find_pass_boundaries(Index *index, Index_Strand *index_strand, gsize available_space){ register GArray *pass_boundary_list = g_array_new(FALSE, FALSE, sizeof(gint)); register gsize usage = 0; gint i; register Index_Word *index_word; for(i = 0; i < index_strand->header.word_list_length; i++){ index_word = &index_strand->word_list[i]; usage += sizeof(gpointer) + sizeof(Index_AddressList) + (sizeof(Index_Address)*index_word->freq_count); if(usage >= available_space){ g_array_append_val(pass_boundary_list, i); usage = 0; } } g_array_append_val(pass_boundary_list, index_strand->header.word_list_length); return pass_boundary_list; } /* * 0123-4567-89 * 1234 5678 90 */ /**/ typedef struct { gint first_word; gint interval_len; Index_AddressList **address_list_array; } Index_AddressData; static Index_AddressData 
*Index_AddressData_create(Index *index, Index_Strand *index_strand, gint first_word, gint interval_len){ register Index_AddressData *address_data = g_new(Index_AddressData, 1); register gint i; register Index_Word *index_word; address_data->first_word = first_word; address_data->interval_len = interval_len; address_data->address_list_array = g_new(Index_AddressList*, interval_len); for(i = 0; i < interval_len; i++){ index_word = &index_strand->word_list[first_word+i]; address_data->address_list_array[i] = Index_AddressList_create(index_word->freq_count); } return address_data; } /* FIXME: optimisation: use single alloc for all address_lists */ static void Index_AddressData_destroy(Index_AddressData *address_data){ register gint i; for(i = 0; i < address_data->interval_len; i++) Index_AddressList_destroy(address_data->address_list_array[i]); g_free(address_data->address_list_array); g_free(address_data); return; } static void Index_AddressData_write(Index_AddressData *address_data, Index *index, Index_Strand *index_strand){ register gint i, word_id; register Index_Word *index_word; register Index_AddressList *address_list; for(i = 0; i < address_data->interval_len; i++){ word_id = address_data->first_word+i; index_word = &index_strand->word_list[word_id]; address_list = address_data->address_list_array[i]; g_assert(index_word->freq_count == address_list->found); if(index->header->type & 1) /* sort address list when is_translated */ Index_AddressList_sort(address_list); Index_AddressList_write(address_list, index); } return; } /**/ static void Index_report_words_visit(Index *index, Index_Strand *index_strand, gint seq_id, gint seq_pos, gint leaf_id, gpointer user_data){ register gint word_id = index_strand->word_table[leaf_id]; register Index_Word *index_word; register Index_AddressData *address_data = user_data; register Index_AddressList *address_list; /* Skip words which are outside the interval for this pass (desaturated and absent words have a negative word_id, so they are skipped too) */ if((word_id < address_data->first_word) || (word_id
>= (address_data->first_word+address_data->interval_len))) return; index_word = &index_strand->word_list[word_id]; address_list = address_data->address_list_array[word_id-address_data->first_word]; Index_AddressList_append(address_list, seq_id, seq_pos); return; } /* FIXME: optimisation : replace allocation with RecycleBins */ static void Index_report_word_list(Index *index, Index_Strand *index_strand, gboolean is_forward, gint memory_limit){ register Index_AddressList **address_list_array = g_new0(Index_AddressList*, index_strand->header.word_list_length); register guint64 available_memory = ((guint64)memory_limit << 20); /* Mb */ register guint64 used_memory = Index_memory_usage(index) + Index_Strand_memory_usage(index_strand, index->vfsm); register GArray *pass_boundary_list; register gint i, curr, prev = 0; register Index_AddressData *address_data; /**/ if(used_memory > available_memory) g_error("Memory limit (%d Mb) already exceeded usage:[%d Mb]", memory_limit, (gint)(used_memory >> 20)); g_message("Currently using [%d Mb] of [%d Mb]", (gint)(used_memory >> 20), memory_limit); /* Record strand offset */ index_strand->strand_offset = Index_ftell(index->fp); /**/ pass_boundary_list = Index_find_pass_boundaries(index, index_strand, (available_memory-used_memory)); g_message("Using [%d] %s pass%s", pass_boundary_list->len, is_forward?"forward":"revcomp", (pass_boundary_list->len == 1)?"":"es"); for(i = 0; i < pass_boundary_list->len; i++){ curr = g_array_index(pass_boundary_list, gint, i); g_message("Sequence pass [%d] for word interval [%d] to [%d]", i+1, prev, curr); address_data = Index_AddressData_create(index, index_strand, prev, curr-prev); Index_visit_words(index, index_strand, Index_report_words_visit, is_forward, address_data); Index_AddressData_write(address_data, index, index_strand); Index_AddressData_destroy(address_data); prev = curr; } g_array_free(pass_boundary_list, TRUE); g_free(address_list_array); return; } /* Revised algorithm: * Calculate 
how much space can be used * Calculate how many passes are required * for each pass * collect words within word interval * write out words */ static gsize Index_get_word_data_space(Index *index, Index_Strand *index_strand){ register gint word_data_bits = index->width->max_word_width + index_strand->width.max_index_len_width + index_strand->width.total_index_len_width; return BitArray_get_size(word_data_bits * index_strand->header.word_list_length); } static void Index_find_offsets(Index *index, Index_Strand *index_strand){ register gint i; register gint64 offset = 0; register Index_Word *index_word; for(i = 0; i < index_strand->header.word_list_length; i++){ index_word = &index_strand->word_list[i]; index_word->index_offset = offset; offset += Index_Word_get_address_list_space(index_word, index); } index_strand->header.total_index_length = offset; return; } static void Index_Strand_Header_fill(Index_Strand_Header *header, FILE *fp){ header->max_index_length = BitArray_read_int(fp); header->word_list_length = BitArray_read_int(fp); header->total_index_length = BitArray_read_int(fp); return; } static void Index_Strand_Header_print(Index_Strand_Header *header, FILE *fp){ BitArray_write_int(header->max_index_length, fp); BitArray_write_int(header->word_list_length, fp); BitArray_write_int(header->total_index_length, fp); return; } static void Index_Strand_write(Index *index, Index_Strand *index_strand){ register BitArray *word_ba = BitArray_create(); register gint i, word_id, word_count = 0; register Index_Word *index_word; /* Create a bitarray from word_list and write it out */ Index_Strand_Header_print(&index_strand->header, index->fp); for(i = 0; i < index->vfsm->lrw; i++){ word_id = index_strand->word_table[i]; if(word_id >= 0){ BitArray_append(word_ba, i, index->width->max_word_width); index_word = &index_strand->word_list[word_id]; BitArray_append(word_ba, index_word->freq_count, index_strand->width.max_index_len_width); BitArray_append(word_ba, 
index_word->index_offset, index_strand->width.total_index_len_width); word_count++; } } g_assert(word_count == index_strand->header.word_list_length); BitArray_write(word_ba, index->fp); BitArray_destroy(word_ba); return; } static void Index_Strand_Width_fill(Index_Strand *index_strand){ register guint64 n; for(n = index_strand->header.max_index_length; n; n >>= 1) index_strand->width.max_index_len_width++; for(n = index_strand->header.total_index_length; n; n >>= 1) index_strand->width.total_index_len_width++; return; } static Index_Strand *Index_Strand_create(Index *index, gboolean is_forward, gint memory_limit){ register Index_Strand *index_strand = g_new0(Index_Strand, 1); g_assert(index); index_strand->word_table = g_new0(gint, index->vfsm->lrw); g_assert(index_strand->word_table); /* 1st sequence pass */ Index_freq_count(index, index_strand, is_forward); /* Remove words occurring too frequently */ Index_desaturate(index, index_strand); Index_survey_word_list(index, index_strand); Index_find_offsets(index, index_strand); Index_Strand_Width_fill(index_strand); Index_Strand_write(index, index_strand); /* 2nd seq pass */ Index_report_word_list(index, index_strand, is_forward, memory_limit); index_strand->index_cache = NULL; return index_strand; } static void Index_Strand_destroy(Index_Strand *index_strand){ g_assert(index_strand); g_free(index_strand->word_table); g_free(index_strand->word_list); if(index_strand->index_cache) BitArray_destroy(index_strand->index_cache); g_free(index_strand); return; } Index *Index_create(Dataset *dataset, gboolean is_translated, gint word_length, gint word_jump, gint word_ambiguity, gint saturate_threshold, gchar *index_path, gchar *dataset_path, gint memory_limit){ register Index *index = g_new0(Index, 1); register gchar *member; register Alphabet *alphabet; index->ref_count = 1; /* Try opening index path for reading: an existing index file must not be overwritten */ index->fp = fopen(index_path, "r"); if(index->fp){ fclose(index->fp); g_error("Index file [%s] already exists",
index_path); } /* Open index path for writing */ index->fp = fopen(index_path, "w"); if(!index->fp) g_error("Could not open [%s] to write index file", index_path); index->dataset_path = g_strdup(dataset_path); index->dataset = Dataset_share(dataset); index->header = Index_Header_create(strlen(dataset_path), is_translated, word_length, word_jump, word_ambiguity, saturate_threshold); if(is_translated){ alphabet = Alphabet_create(Alphabet_Type_PROTEIN, FALSE); member = (gchar*)alphabet->member; Alphabet_destroy(alphabet); } else { member = (gchar*)dataset->alphabet->member; } index->vfsm = VFSM_create(member, word_length); index->width = Index_Width_create(index->vfsm, dataset); /* Index_Width_info(index->width); */ /* Write out header and dataset path */ Index_Header_write(index->header, index->fp); g_assert(Index_ftell(index->fp) == sizeof(Index_Header)); fwrite(index->dataset_path, sizeof(gchar), index->header->dataset_path_len, index->fp); g_assert(Index_ftell(index->fp) == (sizeof(Index_Header) + index->header->dataset_path_len)); /* Create forward and revcomp Index_Strand */ index->forward = Index_Strand_create(index, TRUE, memory_limit); if(is_translated) index->revcomp = Index_Strand_create(index, FALSE, memory_limit); /* Close index->fp and reopen read-only */ fclose(index->fp); index->fp = fopen(index_path, "r"); if(!index->fp) g_error("Could not reopen [%s] for reading", index_path);
#ifdef USE_PTHREADS
pthread_mutex_init(&index->index_mutex, NULL);
#endif /* USE_PTHREADS */
return index; } /* Indexing algorithm: * * word_table: +ve for word_list, -ve for absent_word_list * * data:[absent_list|neighbour_list|address_list,address_count] * * - count word frequencies in word_table (SEQ PASS 1) * - desaturate word_table * - convert word_table to pos in word_list and fill freq_count * - calculate neighbours (WORDHOOD TRAVERSE) * fill neighbour_count and store list in build_data * when freq_count == 0 * add AbsentWord to build_data * - expand word_list with
absent words * (update word_list ordering in word_table) * (the data array is now empty) * - write out the data for each word * - write out and free neighbour lists and padding for index addresses * * - parse seqs again, on each word: (SEQ PASS 2) * fill Index_Data (in data) * when (index_len == freq_count) * write the Index_Data and free */ /* Algorithm overview: * * +count word freqs * +desaturate words * +convert word freqs to pos in word_list and fill index_word->freq_count * +survey word neighbourhood * +expand word_list with absent words * +Index_Width_create() * +write out zero-padded file * +calc and write out neighbourhoods * +calc and write out word addresses */ Index *Index_share(Index *index){ index->ref_count++; return index; } void Index_destroy(Index *index){ if(--index->ref_count) return; if(index->fp) fclose(index->fp); g_free(index->dataset_path); Dataset_destroy(index->dataset); Index_Header_destroy(index->header); VFSM_destroy(index->vfsm); Index_Width_destroy(index->width); if(index->forward) Index_Strand_destroy(index->forward); if(index->revcomp) Index_Strand_destroy(index->revcomp);
#ifdef USE_PTHREADS
pthread_mutex_destroy(&index->index_mutex);
#endif /* USE_PTHREADS */
g_free(index); return; } void Index_info(Index *index){ g_print("Sequence Index:\n" "--------------\n" " version: %d\n" " type: %s\n" " word length: %d\n" " word jump: %d\n" " word ambiguity: %d\n" " saturate threshold: %d\n\n", (gint)index->header->version, (index->header->type & 1) ?
"translated" : "normal", (gint)index->header->word_length, (gint)index->header->word_jump, (gint)index->header->word_ambiguity, (gint)index->header->saturate_threshold); return; } static Index_Word *Index_Strand_read_word_list(Index *index, Index_Strand *index_strand){ register Index_Word *word_list = g_new(Index_Word, index_strand->header.word_list_length); register BitArray *word_list_ba = BitArray_read(index->fp, Index_get_word_data_space(index, index_strand)); register gint i, start = 0; register guint64 leaf; register Index_Word *index_word; for(i = 0; i < index->vfsm->lrw; i++) index_strand->word_table[i] = -1; for(i = 0; i < index_strand->header.word_list_length; i++){ leaf = BitArray_get(word_list_ba, start, index->width->max_word_width); start += index->width->max_word_width; index_strand->word_table[leaf] = i; index_word = &word_list[i]; g_assert(index_word); /**/ index_word->freq_count = BitArray_get(word_list_ba, start, index_strand->width.max_index_len_width); start += index_strand->width.max_index_len_width; /**/ index_word->index_offset = BitArray_get(word_list_ba, start, index_strand->width.total_index_len_width); start += index_strand->width.total_index_len_width; } BitArray_destroy(word_list_ba); return word_list; } static Index_Strand *Index_Strand_read(Index *index, gboolean is_forward){ register Index_Strand *index_strand = g_new0(Index_Strand, 1); Index_Strand_Header_fill(&index_strand->header, index->fp); Index_Strand_Width_fill(index_strand); /**/ index_strand->word_table = g_new(gint, index->vfsm->lrw); index_strand->word_list = Index_Strand_read_word_list(index, index_strand); index_strand->index_cache = NULL; /**/ index_strand->strand_offset = Index_ftell(index->fp); /* Seek to end of strand index */ Index_fseek(index->fp, index_strand->header.total_index_length, SEEK_CUR); return index_strand; } Index *Index_open(gchar *index_path){ register Index *index = g_new(Index, 1); register gchar *member; register Alphabet *alphabet; 
index->ref_count = 1; index->fp = fopen(index_path, "r"); if(!index->fp) g_error("Could not open index [%s]", index_path); index->header = Index_Header_read(index->fp); index->dataset_path = g_new(gchar, index->header->dataset_path_len+1); index->dataset_path[index->header->dataset_path_len] = '\0'; fread(index->dataset_path, sizeof(gchar), index->header->dataset_path_len, index->fp); index->dataset = Dataset_read(index->dataset_path); /**/ if(index->header->type & 1){ /* is_translated */ alphabet = Alphabet_create(Alphabet_Type_PROTEIN, FALSE); member = (gchar*)alphabet->member; Alphabet_destroy(alphabet); } else { member = (gchar*)index->dataset->alphabet->member; } index->vfsm = VFSM_create(member, index->header->word_length); index->width = Index_Width_create(index->vfsm, index->dataset); /* Index_Width_info(index->width); */ /**/ index->forward = Index_Strand_read(index, TRUE); if(index->header->type & 1) /* is_translated: same bit test as used elsewhere in this file */ index->revcomp = Index_Strand_read(index, FALSE); else index->revcomp = NULL;
#ifdef USE_PTHREADS
pthread_mutex_init(&index->index_mutex, NULL);
#endif /* USE_PTHREADS */
return index; } typedef struct { VFSM_Int leaf; gint query_pos; } Index_WordSeed; static void Index_get_query_word_list(Index *index, Index_Strand *index_strand, Sequence *query, GArray *word_seed_list, gint frame, HSP_Param *hsp_param){ register gint i; register gchar *str = Sequence_get_str(query); register VFSM_Int state = 0; Index_WordSeed seed; for(i = 0; str[i]; i++){ if(!index->vfsm->index[(guchar)str[i]]){ state = 0; continue; } state = VFSM_change_state_M(index->vfsm, state, (guchar)str[i]); if(VFSM_state_is_leaf(index->vfsm, state)){ seed.leaf = VFSM_state2leaf(index->vfsm, state); if((index_strand->word_table[seed.leaf] >= 0) || hsp_param->wordhood){ seed.query_pos = i - (index->header->word_length-1); if(frame) seed.query_pos = (seed.query_pos * 3) + frame - 1; g_array_append_val(word_seed_list, seed); } } } g_free(str); return; } /* FIXME: needs to work with
softmasked sequences */ typedef struct { gint start; gint length; gint target_id; } Index_Interval; static gboolean Index_Address_lt_Interval_start(Index_Address *address, Index_Interval *interval){ if(address->sequence_id == interval->target_id) return address->position < interval->start; return address->sequence_id < interval->target_id; } static gboolean Index_Address_lt_Interval_end(Index_Address *address, Index_Interval *interval){ if(address->sequence_id == interval->target_id) return address->position < (interval->start + interval->length); return address->sequence_id < interval->target_id; } static gboolean Index_Address_gt_Interval_end(Index_Address *address, Index_Interval *interval){ if(address->sequence_id == interval->target_id) return address->position > (interval->start + interval->length); return address->sequence_id > interval->target_id; } static gint Index_Address_list_trim_start(Index_Address *address_list, gint address_list_start, gint address_list_len, Index_Interval *limit){ register gint left = address_list_start, right = address_list_start+address_list_len-1, mid; while(left < right){ mid = (left+right) >> 1; if(Index_Address_lt_Interval_start(&address_list[mid], limit)){ if((mid == (address_list_start+address_list_len)) || (!Index_Address_lt_Interval_start(&address_list[mid+1], limit))) return mid+1-address_list_start; left = mid + 1; } else { right = mid; } } return address_list_len; } static gint Index_Address_list_trim_end(Index_Address *address_list, gint address_list_start, gint address_list_len, Index_Interval *limit){ register gint left = address_list_start, right = address_list_start+address_list_len-1, mid; while(left < right){ mid = (left+right) >> 1; if(Index_Address_lt_Interval_end(&address_list[mid], limit)){ left = mid + 1; } else { if((mid == address_list_start) || (Index_Address_lt_Interval_end(&address_list[mid-1], limit))) return address_list_start+address_list_len-mid; right = mid; } } return address_list_len; } static 
void Index_Address_list_refine_recur( Index_Address *address_list, gint address_list_start, gint address_list_len, Index_Interval *interval_list, gint interval_list_start, gint interval_list_len, Index_Address *refined_address_list, gint *refined_address_list_len){ register gint i, move, interval_mid; if(Index_Address_lt_Interval_start(&address_list[address_list_start], &interval_list[interval_list_start])){ move = Index_Address_list_trim_start(address_list, address_list_start, address_list_len, &interval_list[interval_list_start]); address_list_start += move; address_list_len -= move; } if(Index_Address_gt_Interval_end( &address_list[address_list_start+address_list_len-1], &interval_list[interval_list_start+interval_list_len-1])){ move = Index_Address_list_trim_end(address_list, address_list_start, address_list_len, &interval_list[interval_list_start+interval_list_len-1]); address_list_len -= move; } if(!address_list_len) return; if(interval_list_len == 1){ for(i = 0; i < address_list_len; i++){ refined_address_list[(*refined_address_list_len)].sequence_id = address_list[address_list_start+i].sequence_id; refined_address_list[(*refined_address_list_len)].position = address_list[address_list_start+i].position; (*refined_address_list_len)++; } } else { interval_mid = interval_list_len >> 1; if(interval_mid) Index_Address_list_refine_recur(address_list, address_list_start, address_list_len, interval_list, interval_list_start, interval_mid, refined_address_list, refined_address_list_len); if(interval_list_len-interval_mid) Index_Address_list_refine_recur(address_list, address_list_start, address_list_len, interval_list, interval_list_start+interval_mid, interval_list_len-interval_mid, refined_address_list, refined_address_list_len); } return; } /* Algorithm: search address_list * * input: * address_list (ordered positions) * interval_list (ordered intervals) * * output: * list of regions in address_list * * algorithm: * get_addresses(address_list, region_list) *
if(address_list starts before region_list) * trim address_list start to region list start * if(address_list ends after region_list) * trim address_list end to region list end * if(address_list is empty) * return * if(region_list->len == 1){ * add address_list to output_list * } else { * region_mid = region_list_len / 2 * get_addresses(address_list, region_start, region_mid); * get_addresses(address_list, region_mid, region_end); * } * */ /**/ static gint Index_Address_list_refine(Index_Address *address_list, gint address_list_len, GArray *interval_list){ register Index_Address *refined_address_list = g_new(Index_Address, address_list_len); register gint i; gint refined_address_list_len = 0; Index_Address_list_refine_recur(address_list, 0, address_list_len, (Index_Interval*)interval_list->data, 0, interval_list->len, refined_address_list, &refined_address_list_len); g_assert(refined_address_list_len <= address_list_len); /* Copy refined_address_list to address_list */ /* FIXME: optimisation: replace this */ for(i = 0; i < refined_address_list_len; i++){ address_list[i].sequence_id = refined_address_list[i].sequence_id; address_list[i].position = refined_address_list[i].position; } g_free(refined_address_list); return refined_address_list_len; } /* Index_Address_list_refine shortens address_list in place, * returning the new address_list_len, which may be zero * if there is no intersection between the address_list and interval_list * * input: interval_list {target_id, start, length} * address_list {sequence_id, position} */ static gint Index_Word_get_address_list(Index *index, Index_Strand *index_strand, Index_Word *index_word, Index_Address *address_list, GArray *interval_list){ register BitArray *ba; register gint i, pos = 0; register guint64 start = 0; register gint address_list_len = index_word->freq_count; g_assert(index_word->freq_count); if(index_strand->index_cache){ ba = index_strand->index_cache; start = index_word->index_offset * CHAR_BIT; } else {
#ifdef USE_PTHREADS
pthread_mutex_lock(&index->index_mutex);
#endif /* USE_PTHREADS */
Index_fseek(index->fp, index_strand->strand_offset+index_word->index_offset, SEEK_SET); ba = BitArray_read(index->fp, Index_Word_get_address_list_space(index_word, index));
#ifdef USE_PTHREADS
pthread_mutex_unlock(&index->index_mutex);
#endif /* USE_PTHREADS */
} for(i = 0; i < index_word->freq_count; i++){ address_list[pos].sequence_id = BitArray_get(ba, start, index->width->number_of_seqs_width); start += index->width->number_of_seqs_width; address_list[pos++].position = BitArray_get(ba, start, index->dataset->width->max_seq_len_width); start += index->dataset->width->max_seq_len_width; } if(!index_strand->index_cache) BitArray_destroy(ba); if(interval_list) address_list_len = Index_Address_list_refine( address_list, index_word->freq_count, interval_list); return address_list_len; } /* FIXME: optimisation: use BitArray in pq_seed, to avoid duplication */ typedef struct { Index *index; Index_Strand *index_strand; Index_WordSeed seed; GArray *word_seed_list; } Index_Word_Collect_Traverse_Data; static gboolean Index_WordHood_collect_traverse_func(gchar *word, gint score, gpointer user_data){ register VFSM_Int state; VFSM_Int leaf; register Index_Word_Collect_Traverse_Data *iwctd = user_data; state = VFSM_word2state(iwctd->index->vfsm, word); leaf = VFSM_state2leaf(iwctd->index->vfsm, state); iwctd->seed.leaf = leaf; if(iwctd->index_strand->word_table[leaf] >= 0) g_array_append_val(iwctd->word_seed_list, iwctd->seed); return FALSE; } /* FIXME: optimisation: change to VFSM-based wordhoods */ static void Index_expand_word_seed_list(Index *index, Index_Strand *index_strand, WordHood *wh, GArray *word_seed_list){ register gint i; register VFSM_Int state; /* FIXME: set threshold, use_dropoff elsewhere */ /* FIXME: set word limit for dna/protein/codon and from default or client */ register gchar *word = g_new(gchar, index->header->word_length+1); register gint init_seed_list_len =
word_seed_list->len; register Index_WordSeed *init_seed_list = g_new(Index_WordSeed, init_seed_list_len); register Index_WordSeed *orig_seed; for(i = 0; i < word_seed_list->len; i++){ orig_seed = &g_array_index(word_seed_list, Index_WordSeed, i); init_seed_list[i].leaf = orig_seed->leaf; init_seed_list[i].query_pos = orig_seed->query_pos; } g_array_set_size(word_seed_list, 0); Index_Word_Collect_Traverse_Data iwctd; iwctd.index = index; iwctd.index_strand = index_strand; iwctd.word_seed_list = word_seed_list; /**/ for(i = 0; i < init_seed_list_len; i++){ state = VFSM_leaf2state(index->vfsm, init_seed_list[i].leaf); VFSM_state2word(index->vfsm, state, word); iwctd.seed.query_pos = init_seed_list[i].query_pos; WordHood_traverse(wh, Index_WordHood_collect_traverse_func, word, index->header->word_length, &iwctd); } g_free(init_seed_list); g_free(word); return; } /* FIXME: also need ability to catch redundancy over a whole set of queries ? * - do this by doing Index vs Index matching * - merge word lists * - at each matching word pair * - get query seed word * - get target seed word * - expand EITHER query OR target with its neighbourlist * - for each query seed address * - for each target seed address * - seed */ /* FIXME: skip storage requirement for intermediate results, * by using PQ to reuse results. * This can be done for both the single and joint algorithms. 
*/ static HSPset *Index_get_HSPset(Index *index, gint target_id, HSP_Param *hsp_param, Sequence *query, HSPset_SList_Node *node, gboolean revcomp_target){ register Sequence *target, *tmp_seq; register HSPset *hsp_set; register HSPset_SList_Node *curr; register gint i; /* Check there are enough seeds */ curr = node; for(i = 0; i < hsp_param->has->seed_repeat; i++){ if(!curr) return NULL; curr = curr->next; } /* Make HSPset for current target */ target = Dataset_get_sequence(index->dataset, target_id); /**/ if(revcomp_target){ tmp_seq = target; target = Sequence_revcomp(target); Sequence_destroy(tmp_seq); } hsp_set = HSPset_create(query, target, hsp_param); HSPset_seed_all_qy_sorted(hsp_set, node); Sequence_destroy(target); if(HSPset_is_empty(hsp_set)){ HSPset_destroy(hsp_set); return NULL; } /* HSPset_print(hsp_set); */ return hsp_set; } /**/ static Index_HSPset *Index_HSPset_create(HSPset *hsp_set, gint target_id){ register Index_HSPset *index_hsp_set = g_new(Index_HSPset, 1); index_hsp_set->hsp_set = HSPset_share(hsp_set); index_hsp_set->target_id = target_id; return index_hsp_set; } void Index_HSPset_destroy(Index_HSPset *index_hsp_set){ HSPset_destroy(index_hsp_set->hsp_set); g_free(index_hsp_set); return; } /**/ static GPtrArray *Index_seed_HSPsets(Index *index, Index_Strand *index_strand, GArray *seed_list, HSP_Param *hsp_param, Sequence *query, gboolean revcomp_target, GArray *interval_list){ register GPtrArray *hsp_set_list = g_ptr_array_new(); register gint i, j, word_id, address_list_len; gint target_id; register Index_WordSeed *seed; register Index_Address *address_list; register Index_Word *index_word; register HSPset_SList_Node *node, **target_bin = g_new0(HSPset_SList_Node*, index->dataset->header->number_of_seqs); /* FIXME: allocate target_bin outside of this function * (and clear after use) (per-thread) */ register GArray *target_id_list = g_array_new(FALSE, FALSE, sizeof(gint)); register HSPset *hsp_set; register Index_HSPset *index_hsp_set; 
register RecycleBin *hsp_slist_recycle = HSPset_SList_RecycleBin_create(); /**/ for(i = 0; i < seed_list->len; i++){ seed = &g_array_index(seed_list, Index_WordSeed, i); word_id = index_strand->word_table[seed->leaf]; index_word = &index_strand->word_list[word_id]; g_assert(word_id >= 0); /* FIXME: optimisation: reuse address list to avoid allocation */ address_list = g_new(Index_Address, index_word->freq_count); address_list_len = Index_Word_get_address_list(index, index_strand, index_word, address_list, interval_list); if(!address_list_len){ g_free(address_list); continue; /* May be absent when using interval_list */ } g_assert(index_word->freq_count); g_assert(address_list_len); for(j = 0; j < address_list_len; j++){ target_id = address_list[j].sequence_id; if(!target_bin[target_id]) g_array_append_val(target_id_list, target_id); target_bin[target_id] = HSPset_SList_append(hsp_slist_recycle, target_bin[target_id], seed->query_pos, address_list[j].position); } g_free(address_list); } for(i = 0; i < target_id_list->len; i++){ target_id = g_array_index(target_id_list, gint, i); node = target_bin[target_id]; g_assert(node); hsp_set = Index_get_HSPset(index, target_id, hsp_param, query, node, revcomp_target); if(hsp_set){ index_hsp_set = Index_HSPset_create(hsp_set, target_id); HSPset_destroy(hsp_set); /* g_message("Have [%d] seeds", hsp_set->hsp_list->len); */ g_ptr_array_add(hsp_set_list, index_hsp_set); } } g_free(target_bin); g_array_free(target_id_list, TRUE); RecycleBin_destroy(hsp_slist_recycle); if(!hsp_set_list->len){ g_ptr_array_free(hsp_set_list, TRUE); return NULL; } return hsp_set_list; } static GArray *Index_get_word_seed_list(Index*index, Sequence *query, Index_Strand *index_strand, HSP_Param *hsp_param){ register GArray *word_seed_list = g_array_new(FALSE, FALSE, sizeof(Index_WordSeed)); register gint i; register Sequence *aa_seq; if(hsp_param->match->query->is_translated){ g_assert(hsp_param->match->mas->translate); for(i = 0; i < 3; i++){ aa_seq = 
                     Sequence_translate(query, hsp_param->match->mas->translate, i+1);
            Index_get_query_word_list(index, index_strand, aa_seq,
                                      word_seed_list, i+1, hsp_param);
            Sequence_destroy(aa_seq);
        }
    } else {
        Index_get_query_word_list(index, index_strand, query,
                                  word_seed_list, 0, hsp_param);
    }
    /**/
    /* g_message("Using [%d] initial word seeds", word_seed_list->len); */
    if(hsp_param->wordhood){
        Index_expand_word_seed_list(index, index_strand,
                                    hsp_param->wordhood, word_seed_list);
        /* g_message("... expanded to [%d] words with wordhood",
                     word_seed_list->len); */
    }
    return word_seed_list;
}

static Index_Strand *Index_get_index_strand(Index *index,
                                            gboolean revcomp_target){
    if(revcomp_target){
        if(!index->revcomp)
            g_error("No revcomp strand available in Index");
        return index->revcomp;
    }
    return index->forward;
}

static GPtrArray *Index_get_HSPsets_interval(Index *index,
                      HSP_Param *hsp_param, Sequence *query,
                      gboolean revcomp_target, GArray *interval_list,
                      GArray *word_seed_list){
    register GPtrArray *hsp_set_list;
    register Index_Strand *index_strand
        = Index_get_index_strand(index, revcomp_target);
    register gboolean free_word_seed_list = FALSE;
    /* Traverse the query using VFSM */
    g_assert(index_strand);
    if(!word_seed_list){
        word_seed_list = Index_get_word_seed_list(index, query,
                                                  index_strand, hsp_param);
        free_word_seed_list = TRUE;
    }
    hsp_set_list = Index_seed_HSPsets(index, index_strand, word_seed_list,
                                      hsp_param, query, revcomp_target,
                                      interval_list);
    if(free_word_seed_list)
        g_array_free(word_seed_list, TRUE);
    return hsp_set_list;
}
/* FIXME: optimisation: detect repeated words with two query passes */

GPtrArray *Index_get_HSPsets(Index *index, HSP_Param *hsp_param,
                             Sequence *query, gboolean revcomp_target){
    return Index_get_HSPsets_interval(index, hsp_param, query,
                                      revcomp_target, NULL, NULL);
}

/**/

typedef struct {
    gboolean  go_fwd;
    gboolean  go_rev;
    HSP      *hsp;
} Index_Subseed;

typedef struct {
    RangeTree *keeper_hsp_tree; /* accepted HSPs */
    RangeTree *cand_hsp_tree;   /* candidate HSPs */
    HSP
*max_cobs_cand_hsp; /* candidate HSP with largest cobs */ NOI_Tree *coverage; /* searched regions of target */ GPtrArray *hspset_list; /* contributing HSPsets */ GPtrArray *subseed_list; GPtrArray *next_subseed_list; gint target_id; } Index_Geneseed; static void Index_Subseed_add_to_subseed_list(GPtrArray *subseed_list, HSP *hsp, gboolean go_fwd, gboolean go_rev){ register Index_Subseed *index_subseed = g_new(Index_Subseed, 1); index_subseed->hsp = hsp; index_subseed->go_fwd = go_fwd; index_subseed->go_rev = go_rev; g_ptr_array_add(subseed_list, index_subseed); return; } static Index_Geneseed *Index_Geneseed_create(Index_HSPset *index_hspset, NOI_Tree_Set *nts){ register Index_Geneseed *index_geneseed = g_new(Index_Geneseed, 1); register gint i; register HSP *hsp; index_geneseed->keeper_hsp_tree = RangeTree_create(); index_geneseed->cand_hsp_tree = RangeTree_create(); index_geneseed->max_cobs_cand_hsp = NULL; index_geneseed->coverage = NOI_Tree_create(nts); index_geneseed->hspset_list = g_ptr_array_new(); index_geneseed->subseed_list = g_ptr_array_new(); index_geneseed->next_subseed_list = g_ptr_array_new(); index_geneseed->target_id = index_hspset->target_id; for(i = 0; i < index_hspset->hsp_set->hsp_list->len; i++){ hsp = index_hspset->hsp_set->hsp_list->pdata[i]; RangeTree_add(index_geneseed->keeper_hsp_tree, HSP_query_cobs(hsp), HSP_target_cobs(hsp), hsp); Index_Subseed_add_to_subseed_list(index_geneseed->subseed_list, hsp, TRUE, TRUE); } g_ptr_array_add(index_geneseed->hspset_list, HSPset_share(index_hspset->hsp_set)); return index_geneseed; } static void Index_Geneseed_destroy(Index_Geneseed *index_geneseed, NOI_Tree_Set *nts){ register gint i; register HSPset *hspset; for(i = 0; i < index_geneseed->hspset_list->len; i++){ hspset = index_geneseed->hspset_list->pdata[i]; HSPset_destroy(hspset); } RangeTree_destroy(index_geneseed->keeper_hsp_tree, NULL, NULL); RangeTree_destroy(index_geneseed->cand_hsp_tree, NULL, NULL); 
    /* Free any subseeds still queued, then the list containers */
    for(i = 0; i < index_geneseed->subseed_list->len; i++)
        g_free(index_geneseed->subseed_list->pdata[i]);
    for(i = 0; i < index_geneseed->next_subseed_list->len; i++)
        g_free(index_geneseed->next_subseed_list->pdata[i]);
    g_ptr_array_free(index_geneseed->subseed_list, TRUE);
    g_ptr_array_free(index_geneseed->next_subseed_list, TRUE);
    g_ptr_array_free(index_geneseed->hspset_list, TRUE);
    NOI_Tree_destroy(index_geneseed->coverage, nts);
    g_free(index_geneseed);
    return;
}

static gboolean Index_Geneseed_collect_hspset_Rangetree_traverse(gint x, gint y,
                    gpointer info, gpointer user_data){
    register HSP *hsp = info;
    register HSPset *hsp_set = user_data;
    HSPset_add_known_hsp(hsp_set, hsp->query_start, hsp->target_start,
                         hsp->length);
    return FALSE;
}

static Index_HSPset *Index_Geneseed_collect_hspset(
                     Index_Geneseed *index_geneseed){
    register Index_HSPset *index_hspset;
    register HSPset *hsp_set, *source_hspset;
    if(RangeTree_is_empty(index_geneseed->keeper_hsp_tree))
        return NULL;
    g_assert(index_geneseed->hspset_list->len);
    source_hspset = index_geneseed->hspset_list->pdata[0];
    hsp_set = HSPset_create(source_hspset->query, source_hspset->target,
                            source_hspset->param);
    RangeTree_traverse(index_geneseed->keeper_hsp_tree,
                       Index_Geneseed_collect_hspset_Rangetree_traverse,
                       hsp_set);
    g_assert(!HSPset_is_empty(hsp_set));
    HSPset_finalise(hsp_set);
    /* g_message("collected [%d] hsps", hsp_set->hsp_list->len); */
    index_hspset = Index_HSPset_create(hsp_set, index_geneseed->target_id);
    HSPset_destroy(hsp_set);
    return index_hspset;
}

/**/

typedef struct {
       GPtrArray *index_geneseed_list;
    NOI_Tree_Set *nts;
    /**/
          GArray *curr_interval_list;
            gint  curr_target_id;
    /**/
            gint  max_query_span;
            gint  max_target_span;
} Index_Geneseed_List;

static void Index_Interval_NOI_Tree_traverse(gint start, gint length,
                                             gpointer user_data){
    register Index_Geneseed_List *geneseed_list = user_data;
    Index_Interval interval;
    /**/
    interval.start = start;
    interval.length = length;
    interval.target_id = geneseed_list->curr_target_id;
    g_array_append_val(geneseed_list->curr_interval_list, interval);
    /* g_message("add interval [%d,%d,%d] to interval_list",
                 start, length, geneseed_list->curr_target_id); */
    return;
}
/* Need to have interval_list, interval, target_id */

static int Index_Geneseed_Subseed_compare(const void *a, const void *b){
    register Index_Subseed **subseed_a = (Index_Subseed**)a,
                           **subseed_b
= (Index_Subseed**)b; return (*subseed_a)->hsp->target_start - (*subseed_b)->hsp->target_start; } #if 0 static void Index_Geneseed_get_regions(Index_Geneseed *index_geneseed, NOI_Tree_Set *nts, gint max_target_span){ register gint i, start, length; register gint target_range; register Index_Subseed *subseed; register HSP *hsp; NOI_Tree_delta_init(index_geneseed->coverage, nts); /* FIXME: optimisation replace qsort of HSPset with radix type sort */ qsort(index_geneseed->subseed_list->pdata, index_geneseed->subseed_list->len, sizeof(gpointer), Index_Geneseed_Subseed_compare); for(i = 0; i < index_geneseed->subseed_list->len; i++){ subseed = index_geneseed->subseed_list->pdata[i]; hsp = subseed->hsp; target_range = max_target_span + ((HSP_target_cobs(hsp)-hsp->target_start) * 2); if(subseed->go_rev){ start = hsp->target_start - target_range; if(start < 0) start = 0; length = HSP_target_end(hsp) + target_range - start; if((start+length) >= hsp->hsp_set->target->len) length = hsp->hsp_set->target->len - start; NOI_Tree_insert(index_geneseed->coverage, nts, start, length); } if(subseed->go_fwd){ start = HSP_target_cobs(hsp); length = start + target_range; if((start+length) >= hsp->hsp_set->target->len) length = hsp->hsp_set->target->len - start; NOI_Tree_insert(index_geneseed->coverage, nts, start, length); } } NOI_Tree_delta_traverse(index_geneseed->coverage, nts, Index_Interval_NOI_Tree_traverse); return; } #endif /* 0 */ static void Index_Geneseed_get_regions(Index_Geneseed *index_geneseed, NOI_Tree_Set *nts, gint max_target_span){ register gint i, start, length; register gint target_range; register Index_Subseed *subseed; register HSP *hsp; NOI_Tree_delta_init(index_geneseed->coverage, nts); /* FIXME: optimisation replace qsort of HSPset with radix type sort */ qsort(index_geneseed->subseed_list->pdata, index_geneseed->subseed_list->len, sizeof(gpointer), Index_Geneseed_Subseed_compare); for(i = 0; i < index_geneseed->subseed_list->len; i++){ subseed = 
                  index_geneseed->subseed_list->pdata[i];
        hsp = subseed->hsp;
        target_range = max_target_span
                     + ((HSP_target_cobs(hsp)-hsp->target_start) * 2);
        if(subseed->go_rev){
            start = HSP_target_cobs(hsp) - target_range;
            if(start < 0)
                start = 0;
            length = HSP_target_cobs(hsp) - start;
            g_assert((start+length) <= hsp->hsp_set->target->len);
            NOI_Tree_insert(index_geneseed->coverage, nts, start, length);
        }
        if(subseed->go_fwd){
            start = HSP_target_cobs(hsp);
            /* A length, not an end position: mirrors the go_rev case */
            length = target_range;
            if((start+length) >= hsp->hsp_set->target->len)
                length = hsp->hsp_set->target->len - start;
            NOI_Tree_insert(index_geneseed->coverage, nts, start, length);
        }
    }
    NOI_Tree_delta_traverse(index_geneseed->coverage, nts,
                            Index_Interval_NOI_Tree_traverse);
    return;
}

/**/

static int Index_Geneseed_compare(const void *a, const void *b){
    register Index_Geneseed **geneseed_a = (Index_Geneseed**)a,
                            **geneseed_b = (Index_Geneseed**)b;
    return (*geneseed_a)->target_id - (*geneseed_b)->target_id;
}

static Index_Geneseed_List *Index_Geneseed_List_create(GPtrArray *hspset_list,
                            gint max_query_span, gint max_target_span){
    /* A single list object is needed, not one per HSPset */
    register Index_Geneseed_List *geneseed_list
        = g_new(Index_Geneseed_List, 1);
    register Index_HSPset *index_hsp_set;
    register Index_Geneseed *index_geneseed;
    register gint i;
    geneseed_list->curr_interval_list = NULL;
    geneseed_list->curr_target_id = -1;
    /**/
    geneseed_list->max_query_span = max_query_span;
    geneseed_list->max_target_span = max_target_span;
    /**/
    geneseed_list->nts = NOI_Tree_Set_create(geneseed_list);
    geneseed_list->index_geneseed_list = g_ptr_array_new();
    for(i = 0; i < hspset_list->len; i++){
        index_hsp_set = hspset_list->pdata[i];
        index_geneseed = Index_Geneseed_create(index_hsp_set,
                                               geneseed_list->nts);
        g_ptr_array_add(geneseed_list->index_geneseed_list, index_geneseed);
    }
    g_assert(hspset_list->len == geneseed_list->index_geneseed_list->len);
    qsort(geneseed_list->index_geneseed_list->pdata,
          geneseed_list->index_geneseed_list->len,
          sizeof(gpointer), Index_Geneseed_compare);
return geneseed_list; } static void Index_Geneseed_List_destroy(Index_Geneseed_List *geneseed_list){ register gint i; register Index_Geneseed *index_geneseed; for(i = 0; i < geneseed_list->index_geneseed_list->len; i++){ index_geneseed = geneseed_list->index_geneseed_list->pdata[i]; Index_Geneseed_destroy(index_geneseed, geneseed_list->nts); } g_ptr_array_free(geneseed_list->index_geneseed_list, TRUE); NOI_Tree_Set_destroy(geneseed_list->nts); g_free(geneseed_list); return; } static GArray *Index_Geneseed_List_get_regions( Index_Geneseed_List *geneseed_list){ register gint i; register Index_Geneseed *index_geneseed; register GArray *interval_list = g_array_new(FALSE, FALSE, sizeof(Index_Interval)); geneseed_list->curr_interval_list = interval_list; for(i = 0; i < geneseed_list->index_geneseed_list->len; i++){ index_geneseed = geneseed_list->index_geneseed_list->pdata[i]; geneseed_list->curr_target_id = index_geneseed->target_id; if(index_geneseed->subseed_list->len) Index_Geneseed_get_regions(index_geneseed, geneseed_list->nts, geneseed_list->max_target_span); } geneseed_list->curr_interval_list = NULL; geneseed_list->curr_target_id = -1; if(!interval_list->len){ g_array_free(interval_list, TRUE); return NULL; } return interval_list; } /**/ static GPtrArray *Index_Geneseed_List_get_subseeds(Index *index, HSP_Param *hsp_param, Sequence *query, gboolean revcomp_target, GArray *interval_list, GArray *word_seed_list){ return Index_get_HSPsets_interval(index, hsp_param, query, revcomp_target, interval_list, word_seed_list); } /**/ static gboolean Index_Geneseed_refine_RangeTree_report_fwd(gint x, gint y, gpointer info, gpointer user_data){ /* Called for each selected new HSP */ register HSP *hsp = info; register Index_Geneseed *index_geneseed = user_data; /* Add to keeper rangetree if not already present */ if(!RangeTree_check_pos(index_geneseed->keeper_hsp_tree, x, y)){ RangeTree_add(index_geneseed->keeper_hsp_tree, x, y, hsp); 
Index_Subseed_add_to_subseed_list(index_geneseed->next_subseed_list, hsp, TRUE, FALSE); } return FALSE; } static gboolean Index_Geneseed_refine_RangeTree_report_rev(gint x, gint y, gpointer info, gpointer user_data){ /* Called for each selected new HSP */ register HSP *hsp = info; register Index_Geneseed *index_geneseed = user_data; /* Add to keeper rangetree if not already present */ if(!RangeTree_check_pos(index_geneseed->keeper_hsp_tree, x, y)){ RangeTree_add(index_geneseed->keeper_hsp_tree, x, y, hsp); Index_Subseed_add_to_subseed_list(index_geneseed->next_subseed_list, hsp, FALSE, TRUE); } return FALSE; } static int Index_HSPset_compare(const void *a, const void *b){ register Index_HSPset **index_hspset_a = (Index_HSPset**)a, **index_hspset_b = (Index_HSPset**)b; return (*index_hspset_a)->target_id - (*index_hspset_b)->target_id; } static void Index_Geneseed_refine_subseeds( Index_Geneseed_List *index_geneseed_list, GPtrArray *subseed_hsp_list){ register gint i, j, ig_pos = 0, query_range, target_range; register Index_HSPset *index_hspset; register Index_Geneseed *index_geneseed; register HSP *hsp; register GPtrArray *swap_list; register Index_Subseed *subseed; /* Sort subseed_hsp_list by target_id */ qsort(subseed_hsp_list->pdata, subseed_hsp_list->len, sizeof(gpointer), Index_HSPset_compare); /* For each subseed HSPset */ for(i = 0; i < subseed_hsp_list->len; i++){ index_hspset = subseed_hsp_list->pdata[i]; /* Find the corresponding Index_Geneseed */ index_geneseed = NULL; while(ig_pos < index_geneseed_list->index_geneseed_list->len){ index_geneseed = index_geneseed_list->index_geneseed_list->pdata[ig_pos]; if(index_hspset->target_id == index_geneseed->target_id) break; index_geneseed = NULL; ig_pos++; } g_assert(index_geneseed); if(!index_geneseed->subseed_list->len) continue; /* Put new HSPs into cand RangeTree */ for(j = 0; j < index_hspset->hsp_set->hsp_list->len; j++){ hsp = index_hspset->hsp_set->hsp_list->pdata[j]; 
if(!RangeTree_check_pos(index_geneseed->cand_hsp_tree, HSP_query_cobs(hsp), HSP_target_cobs(hsp))){ RangeTree_add(index_geneseed->cand_hsp_tree, HSP_query_cobs(hsp), HSP_target_cobs(hsp), hsp); } /* Store max cobs candidate hsp */ if((!index_geneseed->max_cobs_cand_hsp) || (index_geneseed->max_cobs_cand_hsp->cobs < hsp->cobs)) index_geneseed->max_cobs_cand_hsp = hsp; } /* For each old HSP */ for(j = 0; j < index_geneseed->subseed_list->len; j++){ subseed = index_geneseed->subseed_list->pdata[j]; hsp = subseed->hsp; query_range = index_geneseed_list->max_query_span + (((HSP_query_end(hsp) - HSP_query_cobs(hsp)) + (HSP_query_cobs(index_geneseed->max_cobs_cand_hsp) - index_geneseed->max_cobs_cand_hsp->query_start)) * 2); target_range = index_geneseed_list->max_target_span + (((HSP_target_end(hsp) - HSP_target_cobs(hsp)) + (HSP_target_cobs(index_geneseed->max_cobs_cand_hsp) - index_geneseed->max_cobs_cand_hsp->target_start)) * 2); /* Select HSP within range forwards from cand RangeTree */ if(subseed->go_fwd) RangeTree_find(index_geneseed->cand_hsp_tree, HSP_query_cobs(hsp), query_range, HSP_target_cobs(hsp), target_range, Index_Geneseed_refine_RangeTree_report_fwd, index_geneseed); /* Select HSP within range backwards from cand RangeTree */ if(subseed->go_rev) RangeTree_find(index_geneseed->cand_hsp_tree, HSP_query_cobs(hsp)-query_range, query_range, HSP_target_cobs(hsp)-target_range, target_range, Index_Geneseed_refine_RangeTree_report_rev, index_geneseed); g_free(subseed); } /* Add new HSPset to hspset_list */ g_ptr_array_add(index_geneseed->hspset_list, HSPset_share(index_hspset->hsp_set)); /* Empty subseed_list and swap with next_subseed_list */ g_ptr_array_set_size(index_geneseed->subseed_list, 0); swap_list = index_geneseed->subseed_list; index_geneseed->subseed_list = index_geneseed->next_subseed_list; index_geneseed->next_subseed_list = swap_list; } return; } static GPtrArray *Index_Geneseed_collect_hsps( Index_Geneseed_List *index_geneseed_list){ register 
GPtrArray *hsp_list = g_ptr_array_new(); register gint i; register Index_Geneseed *index_geneseed; register Index_HSPset *index_hspset; for(i = 0; i < index_geneseed_list->index_geneseed_list->len; i++){ index_geneseed = index_geneseed_list->index_geneseed_list->pdata[i]; index_hspset = Index_Geneseed_collect_hspset(index_geneseed); if(index_hspset) g_ptr_array_add(hsp_list, index_hspset); } if(!hsp_list->len){ g_ptr_array_free(hsp_list, TRUE); return NULL; } return hsp_list; } static void Index_HSPset_List_destroy(GPtrArray *index_hspset_list){ register gint i; register Index_HSPset *index_hsp_set; for(i = 0; i < index_hspset_list->len; i++){ index_hsp_set = index_hspset_list->pdata[i]; Index_HSPset_destroy(index_hsp_set); } g_ptr_array_free(index_hspset_list, TRUE); return; } GPtrArray *Index_get_HSPsets_geneseed(Index *index, HSP_Param *hsp_param, Sequence *query, gboolean revcomp_target, gint geneseed_threshold, gint geneseed_repeat, gint max_query_span, gint max_target_span){ register gint original_hsp_threshold = hsp_param->threshold, original_seed_repeat = hsp_param->seed_repeat; register GPtrArray *geneseed_hsp_list, *subseed_hsp_list, *final_hsp_list; register Index_Geneseed_List *index_geneseed_list; register GArray *interval_list; register Index_Strand *index_strand = Index_get_index_strand(index, revcomp_target); register GArray *word_seed_list = Index_get_word_seed_list(index, query, index_strand, hsp_param); /* g_message("have [%d] words", word_seed_list->len); */ /* Find geneseed HSPs */ HSP_Param_set_hsp_threshold(hsp_param, geneseed_threshold); HSP_Param_set_seed_repeat(hsp_param, geneseed_repeat); geneseed_hsp_list = Index_get_HSPsets_interval(index, hsp_param, query, revcomp_target, NULL, word_seed_list); HSP_Param_set_hsp_threshold(hsp_param, original_hsp_threshold); HSP_Param_set_seed_repeat(hsp_param, original_seed_repeat); /**/ if(!geneseed_hsp_list){ /* if no hsps */ g_array_free(word_seed_list, TRUE); return NULL; } /* g_message("start with 
                 [%d] geneseed hspsets", geneseed_hsp_list->len); */
    index_geneseed_list = Index_Geneseed_List_create(geneseed_hsp_list,
                                         max_query_span, max_target_span);
    Index_HSPset_List_destroy(geneseed_hsp_list);
    /* Make geneseed list */
    do {
        interval_list = Index_Geneseed_List_get_regions(index_geneseed_list);
        if(!interval_list)
            break;
        subseed_hsp_list = Index_Geneseed_List_get_subseeds(index, hsp_param,
                               query, revcomp_target, interval_list,
                               word_seed_list);
        g_array_free(interval_list, TRUE);
        if(!subseed_hsp_list)
            break;
        Index_Geneseed_refine_subseeds(index_geneseed_list, subseed_hsp_list);
        Index_HSPset_List_destroy(subseed_hsp_list);
    } while(TRUE);
    /* Collect kept hsps to return */
    final_hsp_list = Index_Geneseed_collect_hsps(index_geneseed_list);
    Index_Geneseed_List_destroy(index_geneseed_list);
    g_array_free(word_seed_list, TRUE);
    return final_hsp_list;
}

/**/

static void Index_preload_index_Strand(Index *index,
                                       Index_Strand *index_strand){
    if(index_strand->index_cache)
        return;
    Index_fseek(index->fp, index_strand->strand_offset, SEEK_SET);
    index_strand->index_cache = BitArray_read(index->fp,
                                index_strand->header.total_index_length);
    return;
}

void Index_preload_index(Index *index){
    g_message("Preloading index");
    if(index->forward)
        Index_preload_index_Strand(index, index->forward);
    if(index->revcomp)
        Index_preload_index_Strand(index, index->revcomp);
    return;
}

/* seq vs index algorithm:
 *
 * for each word in query
 *     append(data[word]->qpos_list, qpos)
 *     for each neighbour of word
 *         append(data[neighbour]->qpos_list, qpos)
 * for each word with a qpos_list
 *     PQ_push(word->address_list, qpos_list)
 * while(PQ_pop) order:
 *     seed each query_pos against all addresses in current target
 *     if not at end of address_list
 *         push back onto address_list
 *     else
 *         free address_list
 *     if changing target
 *         HSPset_seed_all_hsps(seed_list)
 *
 * Summary: word,qpos_list:PQ_push(address_list,qpos_list):pop(tid order)
 */

/* index vs index algorithm:
 * for each query word
 *     if corresponding target word
 *       exists
 *         PQ_push word pair
 *     for each neighbour word
 *         if corresponding target word exists
 *             PQ_push word_pair
 * while(PQ_pop) order:
 *     collate all seeds of
 *     if sequence pair
 *         HSPset_seed_all_hsps(seed_list)
 * Summary: PQ_push(word_pairs),PQ_pop()
 */

/**/

/* FIXME: optimisation: rice coding ?? */

exonerate-2.4.0/src/database/fastadb.h0000644000175000017500000001165011216412403014535 00000000000000
/****************************************************************\
*                                                                *
*  Library for manipulation of FASTA format databases            *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#ifndef INCLUDED_FASTADB_H
#define INCLUDED_FASTADB_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <stdio.h>
#include <glib.h>

#include "compoundfile.h"
#include "sequence.h"
#include "argument.h"
#include "sparsecache.h"

typedef struct {
    gchar *suffix_filter;
} FastaDB_ArgumentSet;

FastaDB_ArgumentSet *FastaDB_ArgumentSet_create(Argument *arg);

typedef enum {
    FastaDB_Mask_ID  = (1<<1),
    FastaDB_Mask_DEF = (1<<2),
    FastaDB_Mask_SEQ = (1<<3),
    FastaDB_Mask_LEN = (1<<4),
    FastaDB_Mask_ALL = (~0)
} FastaDB_Mask;

typedef struct FastaDB {
           guint  ref_count;
        Alphabet *alphabet;
    CompoundFile *cf;
           gchar *out_buffer;
           guint  out_buffer_pos;
           guint  out_buffer_alloc;
            gint  line_length;
} FastaDB;
/* line_length is used for fasta file random-access
 * it is set to zero for irregular line lengths
 * it should not be used until the entire file has been parsed.
*/ typedef struct { guint ref_count; FastaDB *source; CompoundFile_Location *location; Sequence *seq; } FastaDB_Seq; typedef gboolean (*FastaDB_TraverseFunc)(FastaDB_Seq *fdbs, gpointer user_data); /* Return TRUE to stop the traversal */ FastaDB *FastaDB_open_list(GPtrArray *path_list, Alphabet *alphabet); FastaDB *FastaDB_open_list_with_limit(GPtrArray *path_list, Alphabet *alphabet, gint chunk_id, gint chunk_total); FastaDB *FastaDB_open(gchar *path, Alphabet *alphabet); FastaDB *FastaDB_share(FastaDB *fdb); FastaDB *FastaDB_dup(FastaDB *fdb); /* For use in a separate thread */ void FastaDB_close(FastaDB *fdb); void FastaDB_rewind(FastaDB *fdb); gboolean FastaDB_is_finished(FastaDB *fdb); void FastaDB_traverse(FastaDB *fdb, FastaDB_Mask mask, FastaDB_TraverseFunc fdtf, gpointer user_data); gsize FastaDB_memory_usage(FastaDB *fdb); FastaDB_Seq *FastaDB_next(FastaDB *fdb, FastaDB_Mask mask); CompoundFile_Pos FastaDB_find_next_start(FastaDB *fdb, CompoundFile_Pos pos); gboolean FastaDB_file_is_fasta(gchar *path); /* Returns true if first non-whitespace character in file is '>' */ typedef struct { FastaDB *source; CompoundFile_Location *location; Sequence_Strand strand; gint seq_offset; /* for random access */ gint length; /* for random access */ } FastaDB_Key; FastaDB_Seq *FastaDB_fetch(FastaDB *fdb, FastaDB_Mask mask, CompoundFile_Pos pos); FastaDB_Key *FastaDB_Key_create(FastaDB *source, CompoundFile_Location *location, Sequence_Strand strand, gint seq_offset, gint length); FastaDB_Key *FastaDB_Seq_get_key(FastaDB_Seq *fdbs); FastaDB_Seq *FastaDB_Key_get_seq(FastaDB_Key *fdbk, FastaDB_Mask mask); void FastaDB_Key_destroy(FastaDB_Key *fdbk); gchar *FastaDB_Key_get_def(FastaDB_Key *fdbk); SparseCache *FastaDB_Key_get_SparseCache(FastaDB_Key *fdbk); void FastaDB_SparseCache_compress(SparseCache_Page *page, gint len); FastaDB_Seq **FastaDB_all(gchar *path, Alphabet *alphabet, FastaDB_Mask mask, guint *total); FastaDB_Seq *FastaDB_Seq_share(FastaDB_Seq *fdbs); void 
FastaDB_Seq_destroy(FastaDB_Seq *fdbs); FastaDB_Seq *FastaDB_Seq_revcomp(FastaDB_Seq *fdbs); void FastaDB_Seq_all_destroy(FastaDB_Seq **fdbs); gint FastaDB_Seq_print(FastaDB_Seq *fdbs, FILE *fp, FastaDB_Mask mask); gint FastaDB_Seq_all_print(FastaDB_Seq **fdbs, FILE *fp, FastaDB_Mask mask); FastaDB_Seq *FastaDB_get_single(gchar *path, Alphabet *alphabet); Alphabet_Type FastaDB_guess_type(gchar *path); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_FASTADB_H */ exonerate-2.4.0/src/database/fastapipe.h0000644000175000017500000000621311162714300015105 00000000000000/****************************************************************\ * * * Library for All-vs-All Fasta Database Comparisons * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
*
*                                                                *
\****************************************************************/

#ifndef INCLUDED_FASTAPIPE_H
#define INCLUDED_FASTAPIPE_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <glib.h>
#include "fastadb.h"

typedef gboolean (*FastaPipe_NextSeq_Func)(FastaDB_Seq *fdbs,
                                           gpointer user_data);
typedef void (*FastaPipe_Boundary_Func)(gpointer user_data);

typedef struct {
                    FastaDB *query_db;
                    FastaDB *target_db;
               FastaDB_Mask  mask;
    FastaPipe_Boundary_Func  init_func; /* pre_query_load   */
    FastaPipe_Boundary_Func  prep_func; /* post_query_load  */
    FastaPipe_Boundary_Func  term_func; /* post_target_pass */
     FastaPipe_NextSeq_Func  query_func;
     FastaPipe_NextSeq_Func  target_func;
                   gboolean  is_full;
                   gboolean  revcomp_query;
                   gboolean  revcomp_target;
                FastaDB_Seq *prev_query;
                FastaDB_Seq *prev_target;
} FastaPipe;

FastaPipe *FastaPipe_create(FastaDB *query_fdb, FastaDB *target_fdb,
                            FastaPipe_Boundary_Func init_func,
                            FastaPipe_Boundary_Func prep_func,
                            FastaPipe_Boundary_Func term_func,
                            FastaPipe_NextSeq_Func query_func,
                            FastaPipe_NextSeq_Func target_func,
                            FastaDB_Mask mask, gboolean translate_both,
                            gboolean use_revcomp);
/* query_func can return TRUE to finish loading current pipeline
 * target_func can return TRUE to finish processing of the pipeline
 */
/* init_func is called before query pipeline loading
 * prep_func is called at the end of query pipeline loading
 * term_func is called at the end of query pipeline analysis
 */

void FastaPipe_destroy(FastaPipe *fasta_pipe);
gboolean FastaPipe_process(FastaPipe *fasta_pipe, gpointer user_data);
/* FastaPipe_process()
 * will call query_func until end of db, or TRUE is returned,
 * then calls target_func until end of db, or TRUE is returned.
 * If not at end of target_db, will rewind query_func and continue.
* Returns TRUE when true is returned from the target_func */ #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_FASTAPIPE_H */ exonerate-2.4.0/src/database/dataset.h0000644000175000017500000000661111220115064014555 00000000000000/****************************************************************\ * * * Library for manipulation of exonerate dataset files * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_DATASET_H #define INCLUDED_DATASET_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include #include #ifdef USE_PTHREADS #include #endif /* USE_PTHREADS */ #include "alphabet.h" #include "sequence.h" #include "fastadb.h" /* File format: Header path_data: for each file path def\n seq_data: for each seq (in alphabetical order of id) id\n seq_info: for each seq database offset length gcg_checksum <14> */ typedef struct { guint64 magic; guint64 version; guint64 type; guint64 line_length; /**/ guint64 number_of_dbs; guint64 max_db_len; guint64 total_db_len; /**/ guint64 number_of_seqs; guint64 max_seq_len; guint64 total_seq_len; /**/ guint64 path_data_offset; guint64 seq_data_offset; guint64 seq_info_offset; guint64 total_file_length; } Dataset_Header; typedef struct { gint num_db_width; gint max_db_len_width; gint max_seq_len_width; gsize seq_data_item_size; } Dataset_Width; typedef struct { FastaDB_Key *key; guint64 gcg_checksum; gchar *id; gchar *def; gint pos; Sequence *cache_seq; } Dataset_Sequence; typedef struct { guint ref_count; Alphabet *alphabet; Dataset_Header *header; Dataset_Width *width; GPtrArray *seq_list; /* containing Dataset_Sequence objects */ FastaDB 
*fdb; #ifdef USE_PTHREADS pthread_mutex_t dataset_mutex; #endif /* USE_PTHREADS */ } Dataset; Dataset *Dataset_create(GPtrArray *path_list, Alphabet_Type alphabet_type, gboolean softmask_input); Dataset *Dataset_share(Dataset *dataset); void Dataset_destroy(Dataset *dataset); void Dataset_info(Dataset *dataset); gsize Dataset_memory_usage(Dataset *dataset); gboolean Dataset_check_filetype(gchar *path); /* Returns TRUE when magic number is correct for this filetype */ void Dataset_preload_sequences(Dataset *dataset); void Dataset_write(Dataset *dataset, gchar *path); Dataset *Dataset_read(gchar *path); void Dataset_preload_seqs(Dataset *dataset); gint Dataset_lookup_id(Dataset *dataset, gchar *id); Sequence *Dataset_get_sequence(Dataset *dataset, gint dataset_pos); /* Sequence returned will be Sequence_Type_EXTMEM */ #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_DATASET_H */ exonerate-2.4.0/src/database/index.h0000644000175000017500000001121711220147444014244 00000000000000/****************************************************************\ * * * Library for manipulation of exonerate index files * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #ifndef INCLUDED_INDEX_H #define INCLUDED_INDEX_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include #include #include #include #ifdef USE_PTHREADS #include #endif /* USE_PTHREADS */ #include "dataset.h" #include "vfsm.h" #include "hspset.h" #include "bitarray.h" /* File format: Header dataset path\n FW Strand RV Strand (if translated) Strand: Strand header WordList: For each word word_id freq_count index_offset Index: sequence pos (from dataset) */ typedef struct { guint64 magic; guint64 version; guint64 type; /* plain | trans */ guint64 dataset_path_len; /**/ guint64 word_length; guint64 word_jump; guint64 word_ambiguity; guint64 saturate_threshold; } Index_Header; typedef struct { gint max_word_width; /* From vfsm->lrw : MW */ gint number_of_seqs_width; /* From dataset->header->number_of_seqs : NS */ } Index_Width; typedef struct { gint sequence_id; gint position; } Index_Address; typedef struct { gint freq_count; gint64 index_offset; } Index_Word; typedef struct { guint64 max_index_length; /* Filled by Index_survey_word_list() */ guint64 word_list_length; /* Filled by Index_survey_word_list() */ guint64 total_index_length; /* Filled by Index_find_offsets() */ } Index_Strand_Header; typedef struct { gint max_index_len_width; /* From max_index_length : MI */ gint total_index_len_width; /* From total_index_length : TI */ } Index_Strand_Width; typedef struct { Index_Strand_Header header; Index_Strand_Width width; /**/ gint *word_table; /* VFSM array */ Index_Word *word_list; off_t strand_offset; /* Offset to strand header */ BitArray *index_cache; } Index_Strand; typedef struct { guint ref_count; FILE *fp; gchar *dataset_path; Dataset *dataset; Index_Header *header; VFSM *vfsm; Index_Width *width; /**/ Index_Strand *forward; Index_Strand *revcomp; /* Only used when index is translated */ #ifdef USE_PTHREADS pthread_mutex_t index_mutex; #endif /* USE_PTHREADS */ } Index; /**/ 
Index *Index_create(Dataset *dataset, gboolean is_translated, gint word_length, gint word_jump, gint word_ambiguity, gint saturate_threshold, gchar *index_path, gchar *dataset_path, gint memory_limit); Index *Index_share(Index *index); void Index_destroy(Index *index); void Index_info(Index *index); Index *Index_open(gchar *path); guint64 Index_memory_usage(Index *index); void Index_preload_index(Index *index); gboolean Index_check_filetype(gchar *path); /* Returns TRUE when magic number is correct for this filetype */ typedef struct { HSPset *hsp_set; gint target_id; } Index_HSPset; void Index_HSPset_destroy(Index_HSPset *index_hsp_set); GPtrArray *Index_get_HSPsets(Index *index, HSP_Param *hsp_param, Sequence *query, gboolean revcomp_target); /* Returns a GPtrArray containing Index_HSPset structs */ GPtrArray *Index_get_HSPsets_geneseed(Index *index, HSP_Param *hsp_param, Sequence *query, gboolean revcomp_target, gint geneseed_threshold, gint geneseed_repeat, gint max_query_span, gint max_target_span); /**/ #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_INDEX_H */ exonerate-2.4.0/src/c4/0000777000175000017500000000000011222703703011606 500000000000000exonerate-2.4.0/src/c4/Makefile.am0000644000175000017500000000700611222243241013555 00000000000000 TESTS = region.test c4.test alignment.test codegen.test optimal.test \ layout.test viterbi.test opair.test subopt.test cgutil.test noinst_PROGRAMS = $(TESTS) INCLUDES = -I$(top_srcdir)/src/struct \ -I$(top_srcdir)/src/comparison \ -I$(top_srcdir)/src/sequence \ -I$(top_srcdir)/src/general \ -DSOURCE_ROOT_DIR="\"@source_root_dir@\"" \ -DGLIB_CFLAGS="\"@glib_cflags@\"" \ -DHOSTTYPE="\"@host@\"" noinst_HEADERS = region.h c4.h alignment.h codegen.h optimal.h \ layout.h viterbi.h opair.h subopt.h \ cgutil.h region_test_SOURCES = region.test.c region.c c4_test_SOURCES = c4.test.c c4.c region.c c4_test_LDADD = $(top_srcdir)/src/struct/matrix.o \ $(top_srcdir)/src/general/threadref.o SEQUENCE_OBJ = 
$(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o \ $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ -lm ALIGNMENT_OBJ = $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/general/threadref.o \ $(SEQUENCE_OBJ) alignment_test_SOURCES = alignment.test.c alignment.c c4.c region.c alignment_test_LDADD = $(ALIGNMENT_OBJ) codegen_test_SOURCES = codegen.test.c codegen.c codegen_test_LDADD = $(top_srcdir)/src/general/argument.o cgutil_test_SOURCES = cgutil.test.c cgutil.c codegen.c c4.c region.c cgutil_test_LDADD = $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/struct/matrix.o \ $(top_srcdir)/src/general/threadref.o viterbi_test_SOURCES = viterbi.test.c viterbi.c c4.c region.c \ alignment.c codegen.c layout.c subopt.c \ cgutil.c viterbi_test_LDADD = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/rangetree.o \ $(ALIGNMENT_OBJ) optimal_test_SOURCES = optimal.test.c optimal.c c4.c alignment.c \ codegen.c region.c layout.c viterbi.c \ subopt.c cgutil.c optimal_test_LDADD = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/rangetree.o \ $(ALIGNMENT_OBJ) layout_test_SOURCES = layout.test.c layout.c layout_test_LDADD = $(top_srcdir)/src/struct/matrix.o opair_test_SOURCES = opair.test.c opair.c optimal.c c4.c alignment.c \ codegen.c region.c layout.c viterbi.c subopt.c \ cgutil.c opair_test_LDADD = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/rangetree.o \ $(ALIGNMENT_OBJ) subopt_test_SOURCES = subopt.test.c subopt.c region.c subopt_test_LDADD = $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/recyclebin.o 
# Files to clear away MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/src/c4/Makefile.in0000644000175000017500000004106711222703647013606 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../.. 
ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ TESTS = region.test c4.test alignment.test codegen.test optimal.test layout.test viterbi.test opair.test subopt.test cgutil.test noinst_PROGRAMS = $(TESTS) INCLUDES = -I$(top_srcdir)/src/struct -I$(top_srcdir)/src/comparison -I$(top_srcdir)/src/sequence -I$(top_srcdir)/src/general -DSOURCE_ROOT_DIR="\"@source_root_dir@\"" -DGLIB_CFLAGS="\"@glib_cflags@\"" -DHOSTTYPE="\"@host@\"" noinst_HEADERS = region.h c4.h alignment.h codegen.h optimal.h layout.h viterbi.h opair.h subopt.h cgutil.h region_test_SOURCES = region.test.c region.c c4_test_SOURCES = c4.test.c c4.c region.c c4_test_LDADD = $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/general/threadref.o SEQUENCE_OBJ = $(top_srcdir)/src/struct/sparsecache.o $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o $(top_srcdir)/src/sequence/alphabet.o $(top_srcdir)/src/sequence/splice.o $(top_srcdir)/src/sequence/translate.o $(top_srcdir)/src/general/argument.o $(top_srcdir)/src/general/lineparse.o -lm ALIGNMENT_OBJ = $(top_srcdir)/src/comparison/match.o 
$(top_srcdir)/src/sequence/codonsubmat.o $(top_srcdir)/src/sequence/submat.o $(top_srcdir)/src/general/threadref.o $(SEQUENCE_OBJ) alignment_test_SOURCES = alignment.test.c alignment.c c4.c region.c alignment_test_LDADD = $(ALIGNMENT_OBJ) codegen_test_SOURCES = codegen.test.c codegen.c codegen_test_LDADD = $(top_srcdir)/src/general/argument.o cgutil_test_SOURCES = cgutil.test.c cgutil.c codegen.c c4.c region.c cgutil_test_LDADD = $(top_srcdir)/src/general/argument.o $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/general/threadref.o viterbi_test_SOURCES = viterbi.test.c viterbi.c c4.c region.c alignment.c codegen.c layout.c subopt.c cgutil.c viterbi_test_LDADD = $(top_srcdir)/src/struct/slist.o $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/rangetree.o $(ALIGNMENT_OBJ) optimal_test_SOURCES = optimal.test.c optimal.c c4.c alignment.c codegen.c region.c layout.c viterbi.c subopt.c cgutil.c optimal_test_LDADD = $(top_srcdir)/src/struct/slist.o $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/rangetree.o $(ALIGNMENT_OBJ) layout_test_SOURCES = layout.test.c layout.c layout_test_LDADD = $(top_srcdir)/src/struct/matrix.o opair_test_SOURCES = opair.test.c opair.c optimal.c c4.c alignment.c codegen.c region.c layout.c viterbi.c subopt.c cgutil.c opair_test_LDADD = $(top_srcdir)/src/struct/slist.o $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/rangetree.o $(ALIGNMENT_OBJ) subopt_test_SOURCES = subopt.test.c subopt.c region.c subopt_test_LDADD = $(top_srcdir)/src/struct/rangetree.o $(top_srcdir)/src/struct/recyclebin.o # Files to clear away MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = PROGRAMS = $(noinst_PROGRAMS) DEFS = @DEFS@ -I. 
-I$(srcdir) CPPFLAGS = @CPPFLAGS@ LDFLAGS = @LDFLAGS@ LIBS = @LIBS@ region_test_OBJECTS = region.test.o region.o region_test_LDADD = $(LDADD) region_test_DEPENDENCIES = region_test_LDFLAGS = c4_test_OBJECTS = c4.test.o c4.o region.o c4_test_DEPENDENCIES = $(top_srcdir)/src/struct/matrix.o \ $(top_srcdir)/src/general/threadref.o c4_test_LDFLAGS = alignment_test_OBJECTS = alignment.test.o alignment.o c4.o region.o alignment_test_DEPENDENCIES = $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/general/threadref.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o alignment_test_LDFLAGS = codegen_test_OBJECTS = codegen.test.o codegen.o codegen_test_DEPENDENCIES = $(top_srcdir)/src/general/argument.o codegen_test_LDFLAGS = optimal_test_OBJECTS = optimal.test.o optimal.o c4.o alignment.o \ codegen.o region.o layout.o viterbi.o subopt.o cgutil.o optimal_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/general/threadref.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o optimal_test_LDFLAGS = layout_test_OBJECTS = layout.test.o layout.o layout_test_DEPENDENCIES = $(top_srcdir)/src/struct/matrix.o layout_test_LDFLAGS = viterbi_test_OBJECTS = 
viterbi.test.o viterbi.o c4.o region.o \ alignment.o codegen.o layout.o subopt.o cgutil.o viterbi_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/general/threadref.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o viterbi_test_LDFLAGS = opair_test_OBJECTS = opair.test.o opair.o optimal.o c4.o alignment.o \ codegen.o region.o layout.o viterbi.o subopt.o cgutil.o opair_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/general/threadref.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o opair_test_LDFLAGS = subopt_test_OBJECTS = subopt.test.o subopt.o region.o subopt_test_DEPENDENCIES = $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/recyclebin.o subopt_test_LDFLAGS = cgutil_test_OBJECTS = cgutil.test.o cgutil.o codegen.o c4.o region.o cgutil_test_DEPENDENCIES = $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/general/threadref.o cgutil_test_LDFLAGS = CFLAGS = @CFLAGS@ COMPILE = $(CC) $(DEFS) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) 
CCLD = $(CC) LINK = $(CCLD) $(AM_CFLAGS) $(CFLAGS) $(LDFLAGS) -o $@ HEADERS = $(noinst_HEADERS) DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best SOURCES = $(region_test_SOURCES) $(c4_test_SOURCES) $(alignment_test_SOURCES) $(codegen_test_SOURCES) $(optimal_test_SOURCES) $(layout_test_SOURCES) $(viterbi_test_SOURCES) $(opair_test_SOURCES) $(subopt_test_SOURCES) $(cgutil_test_SOURCES) OBJECTS = $(region_test_OBJECTS) $(c4_test_OBJECTS) $(alignment_test_OBJECTS) $(codegen_test_OBJECTS) $(optimal_test_OBJECTS) $(layout_test_OBJECTS) $(viterbi_test_OBJECTS) $(opair_test_OBJECTS) $(subopt_test_OBJECTS) $(cgutil_test_OBJECTS) all: all-redirect .SUFFIXES: .SUFFIXES: .S .c .o .s $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps src/c4/Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status mostlyclean-noinstPROGRAMS: clean-noinstPROGRAMS: -test -z "$(noinst_PROGRAMS)" || rm -f $(noinst_PROGRAMS) distclean-noinstPROGRAMS: maintainer-clean-noinstPROGRAMS: .c.o: $(COMPILE) -c $< .s.o: $(COMPILE) -c $< .S.o: $(COMPILE) -c $< mostlyclean-compile: -rm -f *.o core *.core clean-compile: distclean-compile: -rm -f *.tab.c maintainer-clean-compile: region.test: $(region_test_OBJECTS) $(region_test_DEPENDENCIES) @rm -f region.test $(LINK) $(region_test_LDFLAGS) $(region_test_OBJECTS) $(region_test_LDADD) $(LIBS) c4.test: $(c4_test_OBJECTS) $(c4_test_DEPENDENCIES) @rm -f c4.test $(LINK) $(c4_test_LDFLAGS) $(c4_test_OBJECTS) $(c4_test_LDADD) $(LIBS) alignment.test: $(alignment_test_OBJECTS) $(alignment_test_DEPENDENCIES) @rm -f alignment.test $(LINK) $(alignment_test_LDFLAGS) $(alignment_test_OBJECTS) $(alignment_test_LDADD) $(LIBS) codegen.test: $(codegen_test_OBJECTS) $(codegen_test_DEPENDENCIES) @rm -f 
codegen.test $(LINK) $(codegen_test_LDFLAGS) $(codegen_test_OBJECTS) $(codegen_test_LDADD) $(LIBS) optimal.test: $(optimal_test_OBJECTS) $(optimal_test_DEPENDENCIES) @rm -f optimal.test $(LINK) $(optimal_test_LDFLAGS) $(optimal_test_OBJECTS) $(optimal_test_LDADD) $(LIBS) layout.test: $(layout_test_OBJECTS) $(layout_test_DEPENDENCIES) @rm -f layout.test $(LINK) $(layout_test_LDFLAGS) $(layout_test_OBJECTS) $(layout_test_LDADD) $(LIBS) viterbi.test: $(viterbi_test_OBJECTS) $(viterbi_test_DEPENDENCIES) @rm -f viterbi.test $(LINK) $(viterbi_test_LDFLAGS) $(viterbi_test_OBJECTS) $(viterbi_test_LDADD) $(LIBS) opair.test: $(opair_test_OBJECTS) $(opair_test_DEPENDENCIES) @rm -f opair.test $(LINK) $(opair_test_LDFLAGS) $(opair_test_OBJECTS) $(opair_test_LDADD) $(LIBS) subopt.test: $(subopt_test_OBJECTS) $(subopt_test_DEPENDENCIES) @rm -f subopt.test $(LINK) $(subopt_test_LDFLAGS) $(subopt_test_OBJECTS) $(subopt_test_LDADD) $(LIBS) cgutil.test: $(cgutil_test_OBJECTS) $(cgutil_test_DEPENDENCIES) @rm -f cgutil.test $(LINK) $(cgutil_test_LDFLAGS) $(cgutil_test_OBJECTS) $(cgutil_test_LDADD) $(LIBS) tags: TAGS ID: $(HEADERS) $(SOURCES) $(LISP) list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ here=`pwd` && cd $(srcdir) \ && mkid -f$$here/ID $$unique $(LISP) TAGS: $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) $(LISP) tags=; \ here=`pwd`; \ list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(ETAGS_ARGS)$$unique$(LISP)$$tags" \ || (cd $(srcdir) && etags -o $$here/TAGS $(ETAGS_ARGS) $$tags $$unique $(LISP)) mostlyclean-tags: clean-tags: distclean-tags: -rm -f TAGS ID maintainer-clean-tags: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = src/c4 distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file 
$(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done check-TESTS: $(TESTS) @failed=0; all=0; \ srcdir=$(srcdir); export srcdir; \ for tst in $(TESTS); do \ if test -f $$tst; then dir=.; \ else dir="$(srcdir)"; fi; \ if $(TESTS_ENVIRONMENT) $$dir/$$tst; then \ all=`expr $$all + 1`; \ echo "PASS: $$tst"; \ elif test $$? -ne 77; then \ all=`expr $$all + 1`; \ failed=`expr $$failed + 1`; \ echo "FAIL: $$tst"; \ fi; \ done; \ if test "$$failed" -eq 0; then \ banner="All $$all tests passed"; \ else \ banner="$$failed of $$all tests failed"; \ fi; \ dashes=`echo "$$banner" | sed s/./=/g`; \ echo "$$dashes"; \ echo "$$banner"; \ echo "$$dashes"; \ test "$$failed" -eq 0 info-am: info: info-am dvi-am: dvi: dvi-am check-am: all-am $(MAKE) $(AM_MAKEFLAGS) check-TESTS check: check-am installcheck-am: installcheck: installcheck-am install-exec-am: install-exec: install-exec-am install-data-am: install-data: install-data-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-am uninstall-am: uninstall: uninstall-am all-am: Makefile $(PROGRAMS) $(HEADERS) all-redirect: all-am install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-noinstPROGRAMS mostlyclean-compile \ mostlyclean-tags mostlyclean-generic mostlyclean: mostlyclean-am clean-am: clean-noinstPROGRAMS clean-compile clean-tags clean-generic \ mostlyclean-am clean: clean-am distclean-am: distclean-noinstPROGRAMS distclean-compile distclean-tags \ distclean-generic clean-am distclean: distclean-am maintainer-clean-am: maintainer-clean-noinstPROGRAMS \ maintainer-clean-compile 
maintainer-clean-tags \ maintainer-clean-generic distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." maintainer-clean: maintainer-clean-am .PHONY: mostlyclean-noinstPROGRAMS distclean-noinstPROGRAMS \ clean-noinstPROGRAMS maintainer-clean-noinstPROGRAMS \ mostlyclean-compile distclean-compile clean-compile \ maintainer-clean-compile tags mostlyclean-tags distclean-tags \ clean-tags maintainer-clean-tags distdir check-TESTS info-am info \ dvi-am dvi check check-am installcheck-am installcheck install-exec-am \ install-exec install-data-am install-data install-am install \ uninstall-am uninstall all-redirect all-am all installdirs \ mostlyclean-generic distclean-generic clean-generic \ maintainer-clean-generic clean mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: exonerate-2.4.0/src/c4/region.test.c0000644000175000017500000000211311162714266014135 00000000000000/****************************************************************\ * * * C4 dynamic programming library - code for regions * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include #include "c4.h" int main(void){ register Region *region = Region_create(0, 0, 10, 10); Region_destroy(region); return 0; } exonerate-2.4.0/src/c4/region.c0000644000175000017500000001032511162714266013163 00000000000000/****************************************************************\ * * * C4 dynamic programming library - code for regions * * * * Guy St.C. 
Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "region.h" gboolean Region_is_valid(Region *region){ g_assert(region); g_assert(region->query_start >= 0); g_assert(region->target_start >= 0); g_assert(region->query_length >= 0); g_assert(region->target_length >= 0); return TRUE; } Region *Region_create_blank(void){ register Region *region = g_new0(Region, 1); region->ref_count = 1; return region; } Region *Region_create(gint query_start, gint target_start, gint query_length, gint target_length){ register Region *region = g_new(Region, 1); region->ref_count = 1; region->query_start = query_start; region->target_start = target_start; region->query_length = query_length; region->target_length = target_length; g_assert(Region_is_valid(region)); return region; } void Region_destroy(Region *region){ g_assert(region); g_assert(region->ref_count != -1); /* Check not static */ if(--region->ref_count) return; g_free(region); return; } Region *Region_share(Region *region){ g_assert(region); g_assert(region->ref_count != -1); /* Check not static */ region->ref_count++; return region; } Region *Region_copy(Region *region){ g_assert(region); return Region_create(region->query_start, region->target_start, region->query_length, region->target_length); } void Region_set_static(Region *region){ region->ref_count = -1; return; } void Region_init_static(Region *region, gint query_start, gint target_start, gint query_length, gint target_length){ region->ref_count = -1; region->query_start = query_start; region->target_start = target_start; region->query_length = query_length; region->target_length = target_length; 
g_assert(Region_is_valid(region)); return; } gboolean Region_is_within(Region *outer, Region *inner){ g_assert(outer); g_assert(inner); if(outer->query_start > inner->query_start) return FALSE; if(outer->target_start > inner->target_start) return FALSE; if(Region_query_end(outer) < Region_query_end(inner)) return FALSE; if(Region_target_end(outer) < Region_target_end(inner)) return FALSE; return TRUE; } gboolean Region_is_same(Region *region_a, Region *region_b){ g_assert(region_a); g_assert(region_b); if(region_a->query_start != region_b->query_start) return FALSE; if(region_a->target_start != region_b->target_start) return FALSE; if(Region_query_end(region_a) != Region_query_end(region_b)) return FALSE; if(Region_target_end(region_a) != Region_target_end(region_b)) return FALSE; return TRUE; } void Region_print(Region *region, gchar *name){ g_print("Region [%s] q[%d(%d)%d] t[%d(%d)%d]\n", name, region->query_start, region->query_length, Region_query_end(region), region->target_start, region->target_length, Region_target_end(region)); /* FIXME: temp */ g_print("draw_region(%d, %d, %d, %d, \"%s\")\n", region->query_start, region->target_start, region->query_length, region->target_length, name); return; } exonerate-2.4.0/src/c4/c4.test.c0000644000175000017500000000223011162714263013155 00000000000000/****************************************************************\ * * * C4 dynamic programming library - code for models * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include #include "c4.h" int main(void){ register C4_Model *model = C4_Model_create("test model"); C4_Model_destroy(model); return 0; } /* Testing for the main c4 module is done mostly by the model tests */ exonerate-2.4.0/src/c4/c4.c0000644000175000017500000025325111222253541012204 00000000000000/****************************************************************\ * * * C4 dynamic programming library - code for models * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "c4.h" #include "matrix.h" #include <ctype.h> /* For isalnum() */ #include <string.h> /* For strlen() */ /**/ static C4_State *C4_State_create(gchar *name){ register C4_State *state = g_new(C4_State, 1); g_assert(name); state->name = g_strdup(name); state->input_transition_list = g_ptr_array_new(); state->output_transition_list = g_ptr_array_new(); state->src_shadow_list = g_ptr_array_new(); return state; } static void C4_State_destroy(C4_State *state){ g_assert(state); g_ptr_array_free(state->input_transition_list, TRUE); g_ptr_array_free(state->output_transition_list, TRUE); g_ptr_array_free(state->src_shadow_list, TRUE); g_free(state->name); g_free(state); return; } /**/ static C4_Calc *C4_Calc_create(gchar *name, C4_Score max_score, C4_CalcFunc calc_func, gchar *calc_macro, C4_PrepFunc init_func, gchar *init_macro, C4_PrepFunc exit_func, gchar *exit_macro, C4_Protect protect){ register C4_Calc *calc = g_new(C4_Calc, 1); g_assert(name); g_assert(calc_macro?(calc_func?TRUE:FALSE):TRUE); g_assert(init_macro?(init_func?TRUE:FALSE):TRUE); g_assert(exit_macro?(exit_func?TRUE:FALSE):TRUE); 
if(calc_func && (!calc_macro)) g_warning("Missing calc_macro for Calc: [%s]", name); if(init_func && (!init_macro)) g_warning("Missing init_macro for Calc: [%s]", name); if(exit_func && (!exit_macro)) g_warning("Missing exit_macro for Calc: [%s]", name); calc->name = g_strdup(name); calc->max_score = max_score; calc->calc_func = calc_func; calc->calc_macro = g_strdup(calc_macro); calc->init_func = init_func; calc->init_macro = g_strdup(init_macro); calc->exit_func = exit_func; calc->exit_macro = g_strdup(exit_macro); calc->protect = protect; return calc; } static void C4_Calc_destroy(C4_Calc *calc){ g_assert(calc); if(calc->calc_macro) g_free(calc->calc_macro); if(calc->init_macro) g_free(calc->init_macro); if(calc->exit_macro) g_free(calc->exit_macro); g_free(calc->name); g_free(calc); return; } static gboolean C4_Calc_diff(C4_Calc *a, C4_Calc *b){ if((a->max_score == b->max_score) && (a->calc_func == b->calc_func) && (a->init_func == b->init_func) && (a->exit_func == b->exit_func) && (a->protect == b->protect)) return FALSE; return TRUE; } /**/ static C4_StartState* C4_StartState_create(C4_State *state, C4_Scope scope, C4_CellStartFunc cell_start_func, gchar *cell_start_macro){ register C4_StartState *start_state = g_new(C4_StartState, 1); g_assert(state); start_state->state = state; start_state->scope = scope; start_state->cell_start_func = cell_start_func; if(cell_start_macro) start_state->cell_start_macro = g_strdup(cell_start_macro); else start_state->cell_start_macro = NULL; return start_state; } static void C4_StartState_destroy(C4_StartState *start_state){ g_assert(start_state); if(start_state->cell_start_macro) g_free(start_state->cell_start_macro); g_free(start_state); return; } /**/ static C4_EndState* C4_EndState_create(C4_State *state, C4_Scope scope, C4_CellEndFunc cell_end_func, gchar *cell_end_macro){ register C4_EndState *end_state = g_new(C4_EndState, 1); g_assert(state); end_state->state = state; end_state->scope = scope; 
end_state->cell_end_func = cell_end_func; if(cell_end_macro) end_state->cell_end_macro = g_strdup(cell_end_macro); else end_state->cell_end_macro = NULL; return end_state; } static void C4_EndState_destroy(C4_EndState *end_state){ g_assert(end_state); if(end_state->cell_end_macro) g_free(end_state->cell_end_macro); g_free(end_state); return; } /**/ static C4_Transition *C4_Transition_create(gchar *name, C4_State *input, C4_State *output, gint advance_query, gint advance_target, C4_Calc *calc, C4_Label label, gpointer label_data){ register C4_Transition *transition = g_new(C4_Transition, 1); g_assert(name); transition->name = g_strdup(name); transition->input = input; transition->output = output; transition->advance_query = advance_query; transition->advance_target = advance_target; transition->calc = calc; transition->label = label; transition->label_data = label_data; transition->dst_shadow_list = g_ptr_array_new(); return transition; } static void C4_Transition_destroy(C4_Transition *transition){ g_assert(transition); g_ptr_array_free(transition->dst_shadow_list, TRUE); g_free(transition->name); g_free(transition); return; } /**/ static C4_Shadow *C4_Shadow_create(gchar *name, C4_State *src, C4_Transition *dst, C4_StartFunc start_func, gchar *start_macro, C4_EndFunc end_func, gchar *end_macro){ register C4_Shadow *shadow = g_new0(C4_Shadow, 1); g_assert(name); g_assert(start_macro?(start_func?TRUE:FALSE):TRUE); g_assert(end_macro?(end_func?TRUE:FALSE):TRUE); if(start_func && (!start_macro)) g_warning("Missing start_macro for Shadow: [%s]", name); if(end_func && (!end_macro)) g_warning("Missing end_macro for Shadow: [%s]", name); shadow->name = g_strdup(name); g_assert(src); g_assert(dst); shadow->src_state_list = g_ptr_array_new(); shadow->dst_transition_list = g_ptr_array_new(); g_ptr_array_add(shadow->src_state_list, src); g_ptr_array_add(shadow->dst_transition_list, dst); shadow->start_func = start_func; if(start_macro) shadow->start_macro = 
g_strdup(start_macro); shadow->end_func = end_func; if(end_macro) shadow->end_macro = g_strdup(end_macro); g_ptr_array_add(src->src_shadow_list, shadow); g_ptr_array_add(dst->dst_shadow_list, shadow); return shadow; } static void C4_Shadow_destroy(C4_Shadow *shadow){ g_ptr_array_free(shadow->src_state_list, TRUE); g_ptr_array_free(shadow->dst_transition_list, TRUE); g_free(shadow->name); if(shadow->start_macro) g_free(shadow->start_macro); if(shadow->end_macro) g_free(shadow->end_macro); g_free(shadow); return; } void C4_Shadow_add_src_state(C4_Shadow *shadow, C4_State *src){ g_assert(shadow); g_assert(src); g_ptr_array_add(shadow->src_state_list, src); g_ptr_array_add(src->src_shadow_list, shadow); return; } void C4_Shadow_add_dst_transition(C4_Shadow *shadow, C4_Transition *dst){ g_assert(shadow); g_assert(dst); g_ptr_array_add(shadow->dst_transition_list, dst); g_ptr_array_add(dst->dst_shadow_list, shadow); return; } static gboolean C4_Shadow_is_valid(C4_Shadow *shadow, C4_Model *model){ register gint i, j; register C4_State *state; register C4_Transition *transition; g_assert(shadow); g_assert(shadow->src_state_list->len); g_assert(shadow->dst_transition_list->len); for(i = 0; i < shadow->src_state_list->len; i++){ state = shadow->src_state_list->pdata[i]; g_assert(state); for(j = 0; j < shadow->dst_transition_list->len; j++){ transition = shadow->dst_transition_list->pdata[j]; g_assert(transition); /* FIXME: this is not necessarily true for derived models */ /* g_assert(C4_Model_path_is_possible(model, state, transition->input)); */ } } return TRUE; } /**/ static C4_Portal *C4_Portal_create(gchar *name, C4_Calc *calc, gint advance_query, gint advance_target){ register C4_Portal *portal = g_new(C4_Portal, 1); g_assert(name); g_assert(calc); portal->name = g_strdup(name); portal->calc = calc; portal->advance_query = advance_query; portal->advance_target = advance_target; portal->transition_list = g_ptr_array_new(); return portal; } static void 
C4_Portal_destroy(C4_Portal *portal){ g_free(portal->name); g_ptr_array_free(portal->transition_list, TRUE); g_free(portal); return; } /**/ static void C4_Span_find_loop_transitions(C4_Span *span){ register gint i; register C4_Transition *transition; span->query_loop = span->target_loop = NULL; for(i = 0; i < span->span_state->output_transition_list->len; i++){ transition = span->span_state->output_transition_list->pdata[i]; if(transition->output == span->span_state){ /* looping */ /* Find a C4_Calc which is sequence position independent */ if((!transition->calc) || (!transition->calc->calc_func)){ g_assert((!transition->calc) || (!transition->calc->max_score)); g_assert(transition->advance_query || transition->advance_target); g_assert(!(transition->advance_query && transition->advance_target)); if(transition->advance_query){ g_assert(!span->query_loop); g_assert(!transition->advance_target); span->query_loop = transition; } else { g_assert(!span->target_loop); g_assert(transition->advance_target); g_assert(!transition->advance_query); span->target_loop = transition; } } } } g_assert(span->query_loop || span->target_loop); return; } static C4_Span *C4_Span_create(gchar *name, C4_State *span_state, gint min_query, gint max_query, gint min_target, gint max_target){ register C4_Span *span = g_new(C4_Span, 1); g_assert(name); g_assert(span_state); g_assert(min_query >= 0); g_assert(min_query <= max_query); g_assert(min_target >= 0); g_assert(min_target <= max_target); span->name = g_strdup(name); span->span_state = span_state; span->min_query = min_query; span->max_query = max_query; span->min_target = min_target; span->max_target = max_target; C4_Span_find_loop_transitions(span); return span; } static void C4_Span_destroy(C4_Span *span){ g_assert(span); g_free(span->name); g_free(span); return; } /**/ C4_Calc *C4_Model_add_calc(C4_Model *model, gchar *name, C4_Score max_score, C4_CalcFunc calc_func, gchar *calc_macro, C4_PrepFunc init_func, gchar *init_macro, 
C4_PrepFunc exit_func, gchar *exit_macro, C4_Protect protect){ register C4_Calc *calc = C4_Calc_create(name, max_score, calc_func, calc_macro, init_func, init_macro, exit_func, exit_macro, protect); g_assert(model); g_assert(name); g_assert(model->is_open); g_ptr_array_add(model->calc_list, calc); return calc; } static C4_Calc *C4_Model_match_calc(C4_Model *model, C4_Calc *calc){ register C4_Calc *mcalc = NULL; register gint i; g_assert(model); g_assert(calc); for(i = 0; i < model->calc_list->len; i++){ mcalc = model->calc_list->pdata[i]; if(!C4_Calc_diff(mcalc, calc)) return mcalc; } return NULL; } C4_State *C4_Model_add_state(C4_Model *model, gchar *name){ register C4_State *state = C4_State_create(name); g_assert(model); g_assert(name); g_assert(model->is_open); g_ptr_array_add(model->state_list, state); return state; } GPtrArray *C4_Model_select_transitions(C4_Model *model, C4_Label label){ register gint i; register GPtrArray *list = g_ptr_array_new(); register C4_Transition *transition; for(i = 0; i < model->transition_list->len; i++){ transition = model->transition_list->pdata[i]; if(transition->label == label) g_ptr_array_add(list, transition); } if(!list->len){ g_ptr_array_free(list, TRUE); return NULL; } return list; } C4_Transition *C4_Model_select_single_transition(C4_Model *model, C4_Label label){ register C4_Transition *transition = NULL; register GPtrArray *transition_list = C4_Model_select_transitions(model, label); g_assert(transition_list); g_assert(transition_list->len == 1); if(transition_list){ if(transition_list->len == 1) transition = transition_list->pdata[0]; g_ptr_array_free(transition_list, TRUE); } return transition; } /**/ C4_Transition *C4_Model_add_transition(C4_Model *model, gchar *name, C4_State *input, C4_State *output, gint advance_query, gint advance_target, C4_Calc *calc, C4_Label label, gpointer label_data){ register C4_Transition *transition; g_assert(model); g_assert(name); g_assert(model->is_open); g_assert(advance_query >= 
0); g_assert(advance_target >= 0); g_assert( (label == C4_Label_NONE) || (advance_query || advance_target)); g_assert( (label != C4_Label_MATCH) || (advance_query && advance_target)); g_assert( (label != C4_Label_GAP) || (advance_query || advance_target)); g_assert( (label != C4_Label_FRAMESHIFT) || (advance_query || advance_target)); if(!input) input = model->start_state->state; if(!output) output = model->end_state->state; transition = C4_Transition_create(name, input, output, advance_query, advance_target, calc, label, label_data); g_ptr_array_add(input->output_transition_list, transition); g_ptr_array_add(output->input_transition_list, transition); g_ptr_array_add(model->transition_list, transition); return transition; } C4_Shadow *C4_Model_add_shadow(C4_Model *model, gchar *name, C4_State *src, C4_Transition *dst, C4_StartFunc start_func, gchar *start_macro, C4_EndFunc end_func, gchar *end_macro){ register C4_Shadow *shadow; register gint i; g_assert(model); g_assert(name); g_assert(model->is_open); g_assert(start_func); g_assert(end_func); if(!src) src = model->start_state->state; if(dst){ shadow = C4_Shadow_create(name, src, dst, start_func, start_macro, end_func, end_macro); } else { /* Use all transitions to END as dst */ g_assert(model->end_state->state->input_transition_list->len); dst = model->end_state->state->input_transition_list->pdata[0]; shadow = C4_Shadow_create(name, src, dst, start_func, start_macro, end_func, end_macro); for(i = 1; i < model->end_state->state ->input_transition_list->len; i++){ dst = model->end_state->state ->input_transition_list->pdata[i]; C4_Shadow_add_dst_transition(shadow, dst); } } g_assert(shadow); g_ptr_array_add(model->shadow_list, shadow); return shadow; } C4_Portal *C4_Model_add_portal(C4_Model *model, gchar *name, C4_Calc *calc, gint advance_query, gint advance_target){ register C4_Portal *portal; g_assert(model->is_open); g_assert(calc); g_assert(calc->calc_func); /* Must have position specific calc */ portal = 
C4_Portal_create(name, calc, advance_query, advance_target); /* Add the portal to the model */ g_ptr_array_add(model->portal_list, portal); return portal; } C4_Span *C4_Model_add_span(C4_Model *model, gchar *name, C4_State *span_state, gint min_query, gint max_query, gint min_target, gint max_target){ register C4_Span *span; g_assert(model->is_open); span = C4_Span_create(name, span_state, min_query, max_query, min_target, max_target); g_ptr_array_add(model->span_list, span); return span; } C4_Model *C4_Model_create(gchar *name){ register C4_Model *model = g_new(C4_Model, 1); g_assert(name); model->name = g_strdup(name); /**/ model->global_code_list = g_ptr_array_new(); model->local_code_list = g_ptr_array_new(); model->cflags_add_list = g_ptr_array_new(); model->init_func = NULL; model->init_macro = NULL; model->exit_func = NULL; model->exit_macro = NULL; /**/ model->thread_ref = ThreadRef_create(); model->is_open = TRUE; /**/ model->state_list = g_ptr_array_new(); model->transition_list = g_ptr_array_new(); model->shadow_list = g_ptr_array_new(); model->calc_list = g_ptr_array_new(); model->portal_list = g_ptr_array_new(); model->span_list = g_ptr_array_new(); /**/ model->start_state = C4_StartState_create( C4_Model_add_state(model, "START"), C4_Scope_ANYWHERE, NULL, NULL); model->end_state = C4_EndState_create( C4_Model_add_state(model, "END"), C4_Scope_ANYWHERE, NULL, NULL); return model; } void C4_Model_rename(C4_Model *model, gchar *name){ g_assert(model); g_assert(name); g_free(model->name); model->name = g_strdup(name); return; } /**/ static void C4_Model_destroy_string_list(GPtrArray *string_list){ register gint i; for(i = 0; i < string_list->len; i++) g_free(string_list->pdata[i]); g_ptr_array_free(string_list, TRUE); return; } void C4_Model_destroy(C4_Model *model){ register gint i; g_assert(model); if(ThreadRef_destroy(model->thread_ref)) return; g_free(model->name); C4_Model_destroy_string_list(model->global_code_list); 
C4_Model_destroy_string_list(model->local_code_list); C4_Model_destroy_string_list(model->cflags_add_list); if(model->init_macro) g_free(model->init_macro); if(model->exit_macro) g_free(model->exit_macro); C4_StartState_destroy(model->start_state); C4_EndState_destroy(model->end_state); for(i = 0; i < model->state_list->len; i++) C4_State_destroy(model->state_list->pdata[i]); for(i = 0; i < model->transition_list->len; i++) C4_Transition_destroy(model->transition_list->pdata[i]); for(i = 0; i < model->shadow_list->len; i++) C4_Shadow_destroy(model->shadow_list->pdata[i]); for(i = 0; i < model->calc_list->len; i++) C4_Calc_destroy(model->calc_list->pdata[i]); for(i = 0; i < model->portal_list->len; i++) C4_Portal_destroy(model->portal_list->pdata[i]); for(i = 0; i < model->span_list->len; i++) C4_Span_destroy(model->span_list->pdata[i]); g_ptr_array_free(model->state_list, TRUE); g_ptr_array_free(model->transition_list, TRUE); g_ptr_array_free(model->shadow_list, TRUE); g_ptr_array_free(model->calc_list, TRUE); g_ptr_array_free(model->portal_list, TRUE); g_ptr_array_free(model->span_list, TRUE); g_free(model); return; } C4_Model *C4_Model_share(C4_Model *model){ g_assert(model); g_assert(!model->is_open); ThreadRef_share(model->thread_ref); return model; } static gint C4_State_compare_by_name(gconstpointer a, gconstpointer b){ register C4_State *state_a = (C4_State*)a, *state_b = (C4_State*)b; return strcmp(state_a->name, state_b->name); } static gint C4_Transition_compare_by_name(gconstpointer a, gconstpointer b){ register C4_Transition *tran_a = (C4_Transition*)a, *tran_b = (C4_Transition*)b; return strcmp(tran_a->name, tran_b->name); } /* FIXME: rewrite without using state names */ C4_State **C4_Model_build_state_map(C4_Model *src, C4_Model *dst){ register C4_State **state_map = g_new0(C4_State*, src->state_list->len); register gint i; register C4_State *src_state, *dst_state; register GTree *state_map_tree = g_tree_new(C4_State_compare_by_name); g_assert(src); 
    g_assert(dst);
    g_assert(src->state_list->len == dst->state_list->len);
    /* Index the dst states by name; names must be unique within dst */
    for(i = 0; i < dst->state_list->len; i++){
        dst_state = dst->state_list->pdata[i];
        if(g_tree_lookup(state_map_tree, dst_state))
            g_error("Duplicate state name [%s] in model [%s]",
                    dst_state->name, dst->name);
        g_assert(!g_tree_lookup(state_map_tree, dst_state));
        g_tree_insert(state_map_tree, dst_state, dst_state);
        }
    /* Map each src state to the dst state with the same name */
    for(i = 0; i < src->state_list->len; i++){
        src_state = src->state_list->pdata[i];
        dst_state = g_tree_lookup(state_map_tree, src_state);
        g_assert(dst_state);
        state_map[src_state->id] = dst_state;
        }
    g_tree_destroy(state_map_tree);
    return state_map;
    }

/* FIXME: rewrite without using transition names */
C4_Transition **C4_Model_build_transition_map(C4_Model *src,
                                              C4_Model *dst){
    register C4_Transition **transition_map = g_new0(C4_Transition*,
                                          src->transition_list->len);
    register gint i;
    register C4_Transition *src_transition, *dst_transition;
    register GTree *transition_map_tree
                  = g_tree_new(C4_Transition_compare_by_name);
    g_assert(src);
    g_assert(dst);
    g_assert(src->transition_list->len == dst->transition_list->len);
    /* Index the dst transitions by name; names must be unique within dst */
    for(i = 0; i < dst->transition_list->len; i++){
        dst_transition = dst->transition_list->pdata[i];
        if(g_tree_lookup(transition_map_tree, dst_transition))
            g_error("Duplicate transition name [%s] in model [%s]",
                    dst_transition->name, dst->name);
        g_assert(!g_tree_lookup(transition_map_tree, dst_transition));
        g_tree_insert(transition_map_tree, dst_transition, dst_transition);
        }
    /* Map each src transition to the dst transition with the same name */
    for(i = 0; i < src->transition_list->len; i++){
        src_transition = src->transition_list->pdata[i];
        dst_transition = g_tree_lookup(transition_map_tree, src_transition);
        g_assert(dst_transition);
        transition_map[src_transition->id] = dst_transition;
        }
    g_tree_destroy(transition_map_tree);
    return transition_map;
    }

/**/

void C4_Model_make_stereo(C4_Model *model, gchar *suffix_a,
                                           gchar *suffix_b){
    register gint i, j,
                  prev_state_count = model->state_list->len,
                  prev_transition_count = model->transition_list->len,
                  prev_shadow_count = model->shadow_list->len;
    register C4_State *state,
                    **state_map = g_new0(C4_State*, prev_state_count);
    register C4_Transition *transition,
                    **transition_map = g_new0(C4_Transition*,
                                              prev_transition_count);
    register C4_Shadow *shadow, *new_shadow;
    register gchar *name;
    g_assert(model->is_open);
    /* Copy each state -> state.suffix_b */
    for(i = 0; i < prev_state_count; i++){
        state = model->state_list->pdata[i];
        if((state != model->start_state->state)
        && (state != model->end_state->state)){
            name = g_strconcat(state->name, " ", suffix_b, NULL);
            state_map[state->id] = C4_Model_add_state(model, name);
            g_free(name);
            }
        }
    /* Copy each transition -> transition.suffix_b */
    for(i = 0; i < prev_transition_count; i++){
        transition = model->transition_list->pdata[i];
        name = g_strconcat(transition->name, " ", suffix_b, NULL);
        transition_map[transition->id] = C4_Model_add_transition(
                model, name,
                state_map[transition->input->id],
                state_map[transition->output->id],
                transition->advance_query,
                transition->advance_target,
                transition->calc,
                transition->label,
                transition->label_data);
        g_free(name);
        }
    /* Copy each shadow -> shadow.suffix_b */
    for(i = 0; i < prev_shadow_count; i++){
        shadow = model->shadow_list->pdata[i];
        name = g_strconcat(shadow->name, " ", suffix_b, NULL);
        g_assert(shadow->src_state_list->len);
        g_assert(shadow->dst_transition_list->len);
        state = shadow->src_state_list->pdata[0];
        transition = shadow->dst_transition_list->pdata[0];
        new_shadow = C4_Model_add_shadow(model, name,
                state_map[state->id],
                transition_map[transition->id],
                shadow->start_func, shadow->start_macro,
                shadow->end_func, shadow->end_macro);
        g_free(name);
        /* Copy remaining src states */
        for(j = 1; j < shadow->src_state_list->len; j++){
            state = shadow->src_state_list->pdata[j];
            C4_Shadow_add_src_state(new_shadow, state);
            }
        /* Copy remaining dst transitions */
        for(j = 1; j < shadow->dst_transition_list->len; j++){
            transition = shadow->dst_transition_list->pdata[j];
C4_Shadow_add_dst_transition(new_shadow, transition); } } /* Rename each original state -> state.suffix_a */ for(i = 0; i < prev_state_count; i++){ state = model->state_list->pdata[i]; if((state != model->start_state->state) && (state != model->end_state->state)){ name = state->name; state->name = g_strconcat(name, " ", suffix_a, NULL); g_free(name); } } /* Rename each transition -> transition.suffix_a */ for(i = 0; i < prev_transition_count; i++){ transition = model->transition_list->pdata[i]; name = transition->name; transition->name = g_strconcat(name, " ", suffix_a, NULL); g_free(name); } /* Rename each shadow -> shadow.suffix_a */ for(i = 0; i < prev_shadow_count; i++){ shadow = model->shadow_list->pdata[i]; name = shadow->name; shadow->name = g_strconcat(name, " ", suffix_a, NULL); g_free(name); } g_free(state_map); g_free(transition_map); return; } static void C4_Model_insert_calcs(C4_Model *target, C4_Model *insert, C4_Calc **calc_map){ register gint i; register C4_Calc *insert_calc, *target_calc; for(i = 0; i < insert->calc_list->len; i++){ insert_calc = insert->calc_list->pdata[i]; target_calc = C4_Model_match_calc(target, insert_calc); if(!target_calc) target_calc = C4_Model_add_calc(target, insert_calc->name, insert_calc->max_score, insert_calc->calc_func, insert_calc->calc_macro, insert_calc->init_func, insert_calc->init_macro, insert_calc->exit_func, insert_calc->exit_macro, insert_calc->protect); calc_map[insert_calc->id] = target_calc; } return; } static void C4_Model_insert_states(C4_Model *target, C4_Model *insert, C4_State **state_map){ register gint i; register C4_State *insert_state; for(i = 0; i < insert->state_list->len; i++){ insert_state = insert->state_list->pdata[i]; if((insert_state != insert->start_state->state) && (insert_state != insert->end_state->state)) state_map[insert_state->id] = C4_Model_add_state(target, insert_state->name); } return; } static void C4_Model_insert_transitions(C4_Model *target, C4_Model *insert, C4_State *src, 
C4_State *dst, C4_Calc **calc_map, C4_State **state_map, C4_Transition **transition_map){ register gint i; register C4_Transition *transition; register C4_Calc *calc; for(i = 0; i < insert->transition_list->len; i++){ transition = insert->transition_list->pdata[i]; g_assert(transition); if(transition->calc){ calc = calc_map[transition->calc->id]; } else { calc = NULL; } transition_map[transition->id] = C4_Model_add_transition( target, transition->name, state_map[transition->input->id], state_map[transition->output->id], transition->advance_query, transition->advance_target, calc, transition->label, transition->label_data); } return; } static void C4_Model_insert_shadows(C4_Model *target, C4_Model *insert, C4_State *src, C4_State *dst, C4_State **state_map, C4_Transition **transition_map){ register gint i, j; register C4_Shadow *shadow, *new_shadow; register C4_State *state, *src_state; register C4_Transition *transition, *dst_transition; /* For each insert shadow */ for(i = 0; i < insert->shadow_list->len; i++){ shadow = insert->shadow_list->pdata[i]; g_assert(shadow); g_assert(shadow->src_state_list->len); g_assert(shadow->dst_transition_list->len); state = shadow->src_state_list->pdata[0]; src_state = state_map[state->id]; transition = shadow->dst_transition_list->pdata[0]; dst_transition = transition_map[transition->id]; new_shadow = C4_Model_add_shadow(target, shadow->name, src_state, dst_transition, shadow->start_func, shadow->start_macro, shadow->end_func, shadow->end_macro); for(j = 1; j < shadow->src_state_list->len; j++){ state = shadow->src_state_list->pdata[j]; src_state = state_map[state->id]; C4_Shadow_add_src_state(new_shadow, src_state); } for(j = 1; j < shadow->dst_transition_list->len; j++){ transition = shadow->dst_transition_list->pdata[j]; dst_transition = transition_map[transition->id]; C4_Shadow_add_dst_transition(new_shadow, dst_transition); } } return; } static gboolean C4_Calc_is_same(C4_Calc *calc_a, C4_Calc *calc_b){ if((calc_a->max_score 
== calc_b->max_score) && (calc_a->calc_func == calc_b->calc_func) && (calc_a->init_func == calc_b->init_func) && (calc_a->exit_func == calc_b->exit_func) && (calc_a->protect == calc_b->protect)) return TRUE; return FALSE; } static gboolean C4_Portal_is_same(C4_Portal *portal_a, C4_Portal *portal_b){ if((portal_a->advance_query == portal_b->advance_query) && (portal_a->advance_target == portal_b->advance_target) && C4_Calc_is_same(portal_a->calc, portal_b->calc)) return TRUE; return FALSE; } static void C4_Model_insert_portals(C4_Model *target, C4_Model *insert, C4_Calc **calc_map){ register gint i, j; register C4_Portal *insert_portal, *target_portal; register C4_Transition *transition; register gboolean found_same; for(i = 0; i < insert->portal_list->len; i++){ insert_portal = insert->portal_list->pdata[i]; found_same = FALSE; for(j = 0; j < target->portal_list->len; j++){ target_portal = target->portal_list->pdata[j]; if(C4_Portal_is_same(insert_portal, target_portal)){ for(j = 0; j < insert_portal->transition_list->len; j++){ transition = insert_portal->transition_list->pdata[j]; g_ptr_array_add(target_portal->transition_list, transition); } found_same = TRUE; break; } } if(!found_same){ C4_Model_add_portal(target, insert_portal->name, calc_map[insert_portal->calc->id], insert_portal->advance_query, insert_portal->advance_target); } } return; } /* FIXME: merge duplicate portals */ static void C4_Model_insert_spans(C4_Model *target, C4_Model *insert, C4_State **state_map){ register gint i; register C4_Span *span; for(i = 0; i < insert->span_list->len; i++){ span = insert->span_list->pdata[i]; C4_Model_add_span(target, span->name, state_map[span->span_state->id], span->min_query, span->max_query, span->min_target, span->max_target); } return; } static gboolean C4_Model_string_list_contains(GPtrArray *list, gchar *str){ register gint i; for(i = 0; i < list->len; i++) if(!strcmp(list->pdata[i], str)) return TRUE; return FALSE; } static void 
C4_Model_merge_codegen_list(GPtrArray *dst, GPtrArray *src){ register gint i; for(i = 0; i < src->len; i++) if(!C4_Model_string_list_contains(dst, src->pdata[i])) g_ptr_array_add(dst, g_strdup(src->pdata[i])); return; } static void C4_Model_merge_codegen(C4_Model *dst, C4_Model *src){ C4_Model_merge_codegen_list(dst->global_code_list, src->global_code_list); C4_Model_merge_codegen_list(dst->local_code_list, src->local_code_list); C4_Model_merge_codegen_list(dst->cflags_add_list, src->cflags_add_list); return; } void C4_Model_insert(C4_Model *target, C4_Model *insert, C4_State *src, C4_State *dst){ register C4_Calc **calc_map = g_new0(C4_Calc*, insert->calc_list->len); register C4_State **state_map = g_new0(C4_State*, insert->state_list->len); register C4_Transition **transition_map = g_new0(C4_Transition*, insert->transition_list->len); g_assert(target->is_open); g_assert(!insert->is_open); if(!src) src = target->start_state->state; if(!dst) dst = target->end_state->state; C4_Model_insert_calcs(target, insert, calc_map); C4_Model_insert_states(target, insert, state_map); /* Add src and dst states to the state map */ state_map[insert->start_state->state->id] = src; state_map[insert->end_state->state->id] = dst; C4_Model_insert_transitions(target, insert, src, dst, calc_map, state_map, transition_map); C4_Model_insert_shadows(target, insert, src, dst, state_map, transition_map); C4_Model_insert_portals(target, insert, calc_map); C4_Model_insert_spans(target, insert, state_map); C4_Model_merge_codegen(target, insert); C4_Model_configure_extra(target, insert->init_func, insert->init_macro, insert->exit_func, insert->exit_macro); g_free(state_map); g_free(transition_map); g_free(calc_map); return; } /**/ void C4_Model_clear_codegen(C4_Model *model){ C4_Model_destroy_string_list(model->global_code_list); model->global_code_list = g_ptr_array_new(); C4_Model_destroy_string_list(model->local_code_list); model->local_code_list = g_ptr_array_new(); 
    C4_Model_destroy_string_list(model->cflags_add_list);
    model->cflags_add_list = g_ptr_array_new();
    return;
    }

void C4_Model_append_codegen(C4_Model *model,
                             gchar *global_code, gchar *local_code,
                             gchar *cflags_add){
    /* Each string is copied, and only added if not already present */
    if((global_code)
    && (!C4_Model_string_list_contains(
                model->global_code_list, global_code)))
        g_ptr_array_add(model->global_code_list, g_strdup(global_code));
    if((local_code)
    && (!C4_Model_string_list_contains(
                model->local_code_list, local_code)))
        g_ptr_array_add(model->local_code_list, g_strdup(local_code));
    if((cflags_add)
    && (!C4_Model_string_list_contains(
                model->cflags_add_list, cflags_add)))
        g_ptr_array_add(model->cflags_add_list, g_strdup(cflags_add));
    return;
    }

void C4_Model_configure_extra(C4_Model *model,
                              C4_PrepFunc init_func, gchar *init_macro,
                              C4_PrepFunc exit_func, gchar *exit_macro){
    g_assert(model);
    g_assert(model->is_open);
    /* A macro may only be supplied together with its function */
    g_assert(init_macro?(init_func?TRUE:FALSE):TRUE);
    g_assert(exit_macro?(exit_func?TRUE:FALSE):TRUE);
    /* FIXME: all instances should be on a calc ... */
    /*
    g_assert(!init_func);
    g_assert(!exit_func);
    */
    /*
    if(init_func && (!init_macro))
        g_warning("Missing init_macro for Model: [%s]", model->name);
    if(exit_func && (!exit_macro))
        g_warning("Missing exit_macro for Model: [%s]", model->name);
    */
    model->init_func = init_func;
    /* Test the supplied macro, not the stored one:
     * the stored macro starts as NULL, so testing it
     * would mean that no macro could ever be set.
     */
    if(init_macro){
        if(model->init_macro)
            g_free(model->init_macro);
        model->init_macro = g_strdup(init_macro);
        }
    model->exit_func = exit_func;
    if(exit_macro){
        if(model->exit_macro)
            g_free(model->exit_macro);
        model->exit_macro = g_strdup(exit_macro);
        }
    return;
    }
/* FIXME: need to allow joining of multiple funcs and macros ...
*/ void C4_Model_configure_start_state(C4_Model *model, C4_Scope scope, C4_CellStartFunc cell_start_func, gchar *cell_start_macro){ g_assert(model); /* if(cell_start_func && (!cell_start_macro)) g_warning("Missing cell_start_macro for Model: [%s]", model->name); */ model->start_state->scope = scope; model->start_state->cell_start_func = cell_start_func; if(model->start_state->cell_start_macro != cell_start_macro){ if(model->start_state->cell_start_macro) g_free(model->start_state->cell_start_macro); if(cell_start_macro) model->start_state->cell_start_macro = g_strdup(cell_start_macro); else model->start_state->cell_start_macro = NULL; } else { model->start_state->cell_start_macro = NULL; } return; } void C4_Model_configure_end_state(C4_Model *model, C4_Scope scope, C4_CellEndFunc cell_end_func, gchar *cell_end_macro){ g_assert(model); /* if(cell_end_func && (!cell_end_macro)) g_warning("Missing cell_end_macro for Model: [%s]", model->name); */ model->end_state->scope = scope; model->end_state->cell_end_func = cell_end_func; if(model->end_state->cell_end_macro != cell_end_macro){ if(model->end_state->cell_end_macro) g_free(model->end_state->cell_end_macro); if(cell_end_macro) model->end_state->cell_end_macro = g_strdup(cell_end_macro); else model->end_state->cell_end_macro = NULL; } else { model->end_state->cell_end_macro = NULL; } return; } /**/ gchar *C4_Scope_get_name(C4_Scope scope){ register gchar *name = NULL; switch(scope){ case C4_Scope_ANYWHERE: name = "anywhere"; break; case C4_Scope_EDGE: name = "edge"; break; case C4_Scope_QUERY: name = "query"; break; case C4_Scope_TARGET: name = "target"; break; case C4_Scope_CORNER: name = "corner"; break; default: g_error("Unknown C4_Scope type [%d]", scope); break; } return name; } gchar *C4_Label_get_name(C4_Label label){ register gchar *name = NULL; switch(label){ case C4_Label_NONE: name = "none"; break; case C4_Label_MATCH: name = "match"; break; case C4_Label_GAP: name = "gap"; break; case C4_Label_NER: name = 
"ner"; break; case C4_Label_5SS: name = "5'ss"; break; case C4_Label_3SS: name = "3'ss"; break; case C4_Label_INTRON: name = "intron"; break; case C4_Label_SPLIT_CODON: name = "split codon"; break; case C4_Label_FRAMESHIFT: name = "frameshift"; break; default: g_error("Unknown label [%d]", label); break; } return name; } void C4_Model_print(C4_Model *model){ register gint i; register C4_State *state; register C4_Transition *transition; register C4_Calc *calc; register C4_Portal *portal; register C4_Span *span; g_assert(model); g_print("--\n" " Info for model [%s]\n" " status [%s]\n" " start scope [%s]\n" " end scope [%s]\n", model->name, model->is_open?"OPEN":"CLOSED", C4_Scope_get_name(model->start_state->scope), C4_Scope_get_name(model->end_state->scope)); for(i = 0; i < model->state_list->len; i++){ state = model->state_list->pdata[i]; g_print("State [%s]\n", state->name); } for(i = 0; i < model->transition_list->len; i++){ transition = model->transition_list->pdata[i]; g_print("Transition [%s] ( [%s]->[%s] ) [%d,%d]\n", transition->name, transition->input->name, transition->output->name, transition->advance_query, transition->advance_query); } for(i = 0; i < model->calc_list->len; i++){ calc = model->calc_list->pdata[i]; g_print("Calc [%s]\n", calc->name); } for(i = 0; i < model->portal_list->len; i++){ portal = model->portal_list->pdata[i]; g_print("Portal [%s]\n", portal->name); } for(i = 0; i < model->span_list->len; i++){ span = model->span_list->pdata[i]; g_print("Span [%s]\n", span->name); } g_print("--\n"); return; } static gchar *clean_string(gchar *str){ register gchar *new_string = g_strdup(str); register gint i; for(i = 0; new_string[i]; i++){ if(!isalnum(new_string[i])) new_string[i] = '_'; } return new_string; } static gchar *break_string(gchar *str){ register GString *s = g_string_sized_new(strlen(str)+10); register gchar *new_string; register gint i; for(i = 0; str[i]; i++){ if(isspace(str[i])) g_string_append(s, "\\n"); else g_string_append_c(s, 
str[i]); } new_string = s->str; g_string_free(s, FALSE); return new_string; } void C4_Model_dump_graphviz(C4_Model *model){ register gint i, j, k; register C4_State *state; register C4_Transition *transition; register C4_Shadow *shadow; register gchar *identifier; g_assert(model); g_assert(!model->is_open); identifier = clean_string(model->name); g_print("// --- START OF GRAPHVIZ DUMP ---\n"); g_print("digraph %s {\n", identifier); g_free(identifier); g_print("// states\n"); for(i = 0; i < model->state_list->len; i++){ state = model->state_list->pdata[i]; identifier = break_string(state->name); g_print("state_%d [label=\"%s\"", state->id, identifier); g_free(identifier); if((state == model->start_state->state) || (state == model->end_state->state)) g_print(",shape=box"); g_print("]\n"); } g_print("// transitions\n"); for(i = 0; i < model->transition_list->len; i++){ transition = model->transition_list->pdata[i]; g_print("state_%d -> state_%d", transition->input->id, transition->output->id); if(transition->advance_query || transition->advance_target) g_print(" [label=\"%d:%d\"]", transition->advance_query, transition->advance_target); g_print(" // %s", transition->name); g_print("\n"); } if(model->shadow_list->len) g_print("// shadows\n"); /* Shadows are just drawn from src to dst->input */ for(i = 0; i < model->shadow_list->len; i++){ shadow = model->shadow_list->pdata[i]; for(j = 0; j < shadow->src_state_list->len; j++){ state = shadow->src_state_list->pdata[j]; for(k = 0; k < shadow->dst_transition_list->len; k++){ transition = shadow->dst_transition_list->pdata[k]; g_print("state_%d -> state_%d" " [style=dotted,label=\"%s\"]\n", state->id, transition->input->id, transition->name); } } } g_print("}\n"); g_print("// --- END OF GRAPHVIZ DUMP ---\n"); return; } /* FIXME: include display of portals and spans * with peripheries=2 etc */ void C4_Model_open(C4_Model *model){ g_assert(model); g_assert(!model->is_open); model->is_open = TRUE; return; } static gboolean 
C4_Model_path_is_possible_recur(C4_Model *model, C4_State *src, C4_State *dst, gboolean *visited){ register C4_Transition *transition; register C4_State *next_state; register gint i; visited[src->id] = TRUE; for(i = 0; i < src->output_transition_list->len; i++){ transition = src->output_transition_list->pdata[i]; next_state = transition->output; if(next_state == dst) return TRUE; if(!visited[next_state->id]){ if(C4_Model_path_is_possible_recur(model, next_state, dst, visited)){ return TRUE; } } } return FALSE; } /* Assumes that state->id is set */ gboolean C4_Model_path_is_possible(C4_Model *model, C4_State *src, C4_State *dst){ register gboolean *visited; register gboolean is_valid; g_assert(model); g_assert(src); g_assert(dst); visited = g_new0(gboolean, model->state_list->len); is_valid = C4_Model_path_is_possible_recur(model, src, dst, visited); g_free(visited); return is_valid; } static gboolean C4_Model_has_start_to_end_path(C4_Model *model){ return C4_Model_path_is_possible(model, model->start_state->state, model->end_state->state); } static void C4_Model_set_ids(C4_Model *model){ register gint i; register C4_State *state; register C4_Transition *transition; register C4_Shadow *shadow; register C4_Calc *calc; register C4_Portal *portal; register C4_Span *span; g_assert(model); for(i = 0; i < model->state_list->len; i++){ state = model->state_list->pdata[i]; state->id = i; } for(i = 0; i < model->transition_list->len; i++){ transition = model->transition_list->pdata[i]; transition->id = i; } for(i = 0; i < model->shadow_list->len; i++){ shadow = model->shadow_list->pdata[i]; shadow->id = i; } for(i = 0; i < model->calc_list->len; i++){ calc = model->calc_list->pdata[i]; calc->id = i; } for(i = 0; i < model->portal_list->len; i++){ portal = model->portal_list->pdata[i]; portal->id = i; } for(i = 0; i < model->span_list->len; i++){ span = model->span_list->pdata[i]; span->id = i; } return; } static gboolean C4_Model_is_valid(C4_Model *model){ register gint i; 
register C4_State *state; g_assert(model); for(i = 0; i < model->state_list->len; i++){ state = model->state_list->pdata[i]; /* All states except start must have input transitions */ if(state == model->start_state->state) g_assert(!state->input_transition_list->len); else g_assert(state->input_transition_list->len); /* All states except end must have output transitions */ if(state == model->end_state->state) g_assert(!state->output_transition_list->len); else g_assert(state->output_transition_list->len); } return TRUE; } static void c4_g_ptr_array_reverse(GPtrArray *ptr_array){ register gint a, z; register gpointer swap; for(a = 0, z = ptr_array->len-1; a < z; a++, z--){ swap = ptr_array->pdata[a]; ptr_array->pdata[a] = ptr_array->pdata[z]; ptr_array->pdata[z] = swap; } return; } /* Calculate dependency ordering and transition precedence: */ static void C4_Model_topological_sort(C4_Model *model){ register gint *dependent = g_new0(gint, model->transition_list->len); register GPtrArray *ordered = g_ptr_array_new(); register gint i, j; register C4_Transition *transition, *input_transition; register C4_State *state; register gboolean removed_transition; /**/ /* Set initial dependencies */ for(i = 0; i < model->transition_list->len; i++){ transition = model->transition_list->pdata[i]; if((transition->advance_query == 0) && (transition->advance_target == 0)){ state = transition->input; for(j = 0; j < state->input_transition_list->len; j++){ input_transition = state->input_transition_list->pdata[j]; if((input_transition->advance_query == 0) && (input_transition->advance_target == 0)){ dependent[input_transition->id]++; } } } } /* Perform topological sort of static transitions */ do { removed_transition = FALSE; for(i = 0; i < model->transition_list->len; i++){ if(dependent[i] != 0) continue; transition = model->transition_list->pdata[i]; if((transition->advance_query != 0) || (transition->advance_target != 0)) continue; removed_transition = TRUE; dependent[transition->id] 
                = -1;
            g_ptr_array_add(ordered, transition);
            state = transition->input;
            /* Release the static transitions leading into this one,
             * mirroring the test used to set the initial dependencies
             */
            for(j = 0; j < state->input_transition_list->len; j++){
                input_transition = state->input_transition_list->pdata[j];
                if((input_transition->advance_query == 0)
                && (input_transition->advance_target == 0))
                    dependent[input_transition->id]--;
                }
            }
    } while(removed_transition);
    /* Take the precedence zero transitions */
    for(i = 0; i < model->transition_list->len; i++){
        transition = model->transition_list->pdata[i];
        if(transition->advance_query || transition->advance_target)
            g_ptr_array_add(ordered, transition);
        }
    /**/
    c4_g_ptr_array_reverse(ordered);
    /* Assert that there are no cycles */
    g_assert(ordered->len == model->transition_list->len);
    /* Reorder the transition list */
    for(i = 0; i < ordered->len; i++){
        transition = ordered->pdata[i];
        transition->id = i;
        model->transition_list->pdata[i] = transition;
        }
    /**/
    g_ptr_array_free(ordered, TRUE);
    g_free(dependent);
    return;
    }
/* FIXME: optimisation:
 *        transitions are only ordered;
 *        may wish to store precedence info instead
 *        to allow some optimisations.
*/ static void C4_Portal_find_transitions(C4_Portal *portal, C4_Model *model){ register gint i; register C4_Transition *transition; /* Find applicable transitions for portal->calc */ for(i = 0; i < model->transition_list->len; i++){ transition = model->transition_list->pdata[i]; if((transition->calc == portal->calc) && (transition->input == transition->output)){ g_assert(transition->advance_query == portal->advance_query); g_assert(transition->advance_target == portal->advance_target); g_ptr_array_add(portal->transition_list, transition); } } g_assert(portal->transition_list->len); return; } static void C4_Model_finalise(C4_Model *model){ register gint i; register C4_Portal *portal; register C4_Transition *transition; g_assert(model); for(i = 0; i < model->portal_list->len; i++){ portal = model->portal_list->pdata[i]; if(portal->transition_list->len) g_ptr_array_set_size(portal->transition_list, 0); C4_Portal_find_transitions(portal, model); } model->max_query_advance = 0; model->max_target_advance = 0; for(i = 0; i < model->transition_list->len; i++){ transition = model->transition_list->pdata[i]; if(model->max_query_advance < transition->advance_query) model->max_query_advance = transition->advance_query; if(model->max_target_advance < transition->advance_target) model->max_target_advance = transition->advance_target; } g_assert(model->max_query_advance || model->max_target_advance); return; } /**/ static void C4_Shadow_designate_recur(C4_Shadow *shadow, C4_Transition *transition, gboolean *des, gboolean *states_visited){ register gint i; register C4_State *state = transition->input; if(!states_visited[state->id]){ states_visited[state->id] = TRUE; /* If transition is a dst transition of this shadow, stop */ for(i = 0; i < transition->dst_shadow_list->len; i++){ if(shadow == transition->dst_shadow_list->pdata[i]) return; } /* Visit input transitions of input state */ for(i = 0; i < state->input_transition_list->len; i++){ transition = state->input_transition_list->pdata[i];
g_assert(!des[transition->id]); des[transition->id] = TRUE; C4_Shadow_designate_recur(shadow, transition, des, states_visited); } } return; } static gboolean *C4_Shadow_get_designation(C4_Model *model, C4_Shadow *shadow){ register gboolean *des = g_new0(gboolean, model->transition_list->len); register gboolean *states_visited = g_new0(gboolean, model->state_list->len); register gint i; register C4_Transition *transition; g_assert(C4_Shadow_is_valid(shadow, model)); for(i = 0; i < shadow->dst_transition_list->len; i++){ transition = shadow->dst_transition_list->pdata[i]; des[transition->id] = TRUE; C4_Shadow_designate_recur(shadow, transition, des, states_visited); } g_free(states_visited); return des; } static gboolean C4_Shadow_designation_fits(C4_Model *model, gboolean *des_a, gboolean *des_b){ register gint i; register gboolean *state_is_used; register C4_Transition *transition; for(i = 0; i < model->transition_list->len; i++) if(des_a[i] && des_b[i]) return FALSE; state_is_used = g_new0(gboolean, model->state_list->len); /* Fail if any des_a output states are des_b inputs */ for(i = 0; i < model->transition_list->len; i++) if(des_a[i]){ transition = model->transition_list->pdata[i]; state_is_used[transition->output->id] = TRUE; } for(i = 0; i < model->transition_list->len; i++) if(des_b[i]){ transition = model->transition_list->pdata[i]; if(state_is_used[transition->input->id]){ g_free(state_is_used); return FALSE; } } for(i = 0; i < model->state_list->len; i++) state_is_used[i] = FALSE; /* Fail if any des_b output states are des_a inputs */ for(i = 0; i < model->transition_list->len; i++) if(des_b[i]){ transition = model->transition_list->pdata[i]; state_is_used[transition->output->id] = TRUE; } for(i = 0; i < model->transition_list->len; i++) if(des_a[i]){ transition = model->transition_list->pdata[i]; if(state_is_used[transition->input->id]){ g_free(state_is_used); return FALSE; } } g_free(state_is_used); return TRUE; } static void 
C4_Shadow_designation_join(C4_Model *model, gboolean *master, gboolean *copy){ register gint i; for(i = 0; i < model->transition_list->len; i++){ if(copy[i]){ g_assert(!master[i]); master[i] = TRUE; } } return; } static void C4_Model_designate_shadows(C4_Model *model){ register gboolean *des, *curr_des; register GPtrArray *designation_list = g_ptr_array_new(); register gint i, j; register C4_Shadow *shadow; for(i = 0; i < model->shadow_list->len; i++){ shadow = model->shadow_list->pdata[i]; curr_des = C4_Shadow_get_designation(model, shadow); shadow->designation = -1; for(j = 0; j < designation_list->len; j++){ des = designation_list->pdata[j]; if(C4_Shadow_designation_fits(model, des, curr_des)){ C4_Shadow_designation_join(model, des, curr_des); shadow->designation = j; break; } } if(shadow->designation == -1){ /* not designated */ shadow->designation = designation_list->len; g_ptr_array_add(designation_list, curr_des); } else { g_free(curr_des); } } model->total_shadow_designations = designation_list->len; for(i = 0; i < designation_list->len; i++) g_free(designation_list->pdata[i]); g_ptr_array_free(designation_list, TRUE); return; } void C4_Model_close(C4_Model *model){ g_assert(model); g_assert(model->is_open); g_assert(C4_Model_is_valid(model)); C4_Model_set_ids(model); g_assert(C4_Model_has_start_to_end_path(model)); C4_Model_topological_sort(model); C4_Model_designate_shadows(model); C4_Model_finalise(model); model->is_open = FALSE; return; } /**/ void C4_Calc_init(C4_Calc *calc, Region *region, gpointer user_data){ if(!calc) return; if(calc->init_func) calc->init_func(region, user_data); return; } void C4_Calc_exit(C4_Calc *calc, Region *region, gpointer user_data){ if(!calc) return; if(calc->exit_func) calc->exit_func(region, user_data); return; } C4_Score C4_Calc_score(C4_Calc *calc, gint query_pos, gint target_pos, gpointer user_data){ register C4_Score score; if(!calc) return 0; if(calc->calc_func){ score = calc->calc_func(query_pos, target_pos, 
user_data); g_assert(score <= calc->max_score); return score; } return calc->max_score; } C4_Model *C4_Model_copy(C4_Model *old_model){ register C4_Model *new_model; register gchar *name; register gint i, j; register C4_Calc *calc, **calc_map = g_new0(C4_Calc*, old_model->calc_list->len); register C4_State *state, **state_map = g_new0(C4_State*, old_model->state_list->len); register C4_State *src_state, *dst_state; register C4_Transition *transition, **transition_map = g_new0(C4_Transition*, old_model->transition_list->len); register C4_Shadow *shadow, *new_shadow; register C4_Portal *portal; register C4_Span *span; g_assert(old_model); g_assert(!old_model->is_open); name = g_strdup_printf("Copy:%s", old_model->name); new_model = C4_Model_create(name); g_free(name); /* Copy calcs */ for(i = 0; i < old_model->calc_list->len; i++){ calc = old_model->calc_list->pdata[i]; calc_map[i] = C4_Model_add_calc(new_model, calc->name, calc->max_score, calc->calc_func, calc->calc_macro, calc->init_func, calc->init_macro, calc->exit_func, calc->exit_macro, calc->protect); } /* Copy states */ for(i = 0; i < old_model->state_list->len; i++){ state = old_model->state_list->pdata[i]; if((state == old_model->start_state->state) || (state == old_model->end_state->state)) continue; state_map[i] = C4_Model_add_state(new_model, state->name); } /* Copy transitions */ for(i = 0; i < old_model->transition_list->len; i++){ transition = old_model->transition_list->pdata[i]; if(transition->input == old_model->start_state->state) src_state = NULL; else src_state = state_map[transition->input->id]; if(transition->output == old_model->end_state->state) dst_state = NULL; else dst_state = state_map[transition->output->id]; transition_map[i] = C4_Model_add_transition(new_model, transition->name, src_state, dst_state, transition->advance_query, transition->advance_target, transition->calc?calc_map[transition->calc->id]:NULL, transition->label, transition->label_data); } /* Copy shadows */ for(i = 0; i 
< old_model->shadow_list->len; i++){ shadow = old_model->shadow_list->pdata[i]; state = shadow->src_state_list->pdata[0]; transition = shadow->dst_transition_list->pdata[0]; new_shadow = C4_Model_add_shadow(new_model, shadow->name, state_map[state->id], transition_map[transition->id], shadow->start_func, shadow->start_macro, shadow->end_func, shadow->end_macro); for(j = 1; j < shadow->src_state_list->len; j++){ state = shadow->src_state_list->pdata[j]; C4_Shadow_add_src_state(new_shadow, state_map[state->id]); } for(j = 1; j < shadow->dst_transition_list->len; j++){ transition = shadow->dst_transition_list->pdata[j]; C4_Shadow_add_dst_transition(new_shadow, transition_map[transition->id]); } } /* Copy portals */ for(i = 0; i < old_model->portal_list->len; i++){ portal = old_model->portal_list->pdata[i]; C4_Model_add_portal(new_model, portal->name, calc_map[portal->calc->id], portal->advance_query, portal->advance_target); } /* Copy spans */ for(i = 0; i < old_model->span_list->len; i++){ span = old_model->span_list->pdata[i]; C4_Model_add_span(new_model, span->name, state_map[span->span_state->id], span->min_query, span->max_query, span->min_target, span->max_target); } /* Configure extras */ C4_Model_merge_codegen(new_model, old_model); C4_Model_configure_extra(new_model, old_model->init_func, old_model->init_macro, old_model->exit_func, old_model->exit_macro); /* Configure start and end states */ C4_Model_configure_start_state(new_model, old_model->start_state->scope, old_model->start_state->cell_start_func, old_model->start_state->cell_start_macro); C4_Model_configure_end_state(new_model, old_model->end_state->scope, old_model->end_state->cell_end_func, old_model->end_state->cell_end_macro); g_free(calc_map); g_free(state_map); g_free(transition_map); /* Close the model without a topological sort */ C4_Model_set_ids(new_model); C4_Model_designate_shadows(new_model); C4_Model_finalise(new_model); new_model->is_open = FALSE; return new_model; } /**/ void 
C4_Model_remove_transition(C4_Model *model, C4_Transition *transition){ register gboolean removed_transition; /* Remove from the input state */ removed_transition = g_ptr_array_remove_fast( transition->input->output_transition_list, transition); g_assert(removed_transition); /* Remove from the output state */ removed_transition = g_ptr_array_remove_fast( transition->output->input_transition_list, transition); g_assert(removed_transition); /* Remove from the Model */ removed_transition = g_ptr_array_remove_fast(model->transition_list, transition); g_assert(removed_transition); C4_Transition_destroy(transition); return; } static void C4_Model_remove_shadow(C4_Model *model, C4_Shadow *shadow){ register gint i; register gboolean removed_shadow; register C4_State *state; register C4_Transition *transition; /* Remove from src states */ for(i = 0; i < shadow->src_state_list->len; i++){ state = shadow->src_state_list->pdata[i]; removed_shadow = g_ptr_array_remove_fast( state->src_shadow_list, shadow); g_assert(removed_shadow); } /* Remove from dst transitions */ for(i = 0; i < shadow->dst_transition_list->len; i++){ transition = shadow->dst_transition_list->pdata[i]; removed_shadow = g_ptr_array_remove_fast( transition->dst_shadow_list, shadow); g_assert(removed_shadow); } /* Remove from the Model */ removed_shadow = g_ptr_array_remove_fast(model->shadow_list, shadow); g_assert(removed_shadow); C4_Shadow_destroy(shadow); return; } void C4_Model_remove_state(C4_Model *model, C4_State *state){ register gboolean was_open, removed_state; register gint i; register C4_Transition *transition; register C4_Shadow *shadow; register C4_Calc *calc; register C4_Portal *portal; register C4_Span *span; register gboolean *calc_used; /* Check not start or end */ g_assert(model); g_assert(state); g_assert(state != model->start_state->state); g_assert(state != model->end_state->state); /* Open model if not open */ was_open = model->is_open; if(!was_open) C4_Model_open(model); /* Remove from 
state */ removed_state = g_ptr_array_remove_fast(model->state_list, state); g_assert(removed_state); /* Remove unused transitions. C4_Model_remove_transition() also removes the transition from these lists, so take the first element until each list is empty */ while(state->input_transition_list->len){ transition = state->input_transition_list->pdata[0]; C4_Model_remove_transition(model, transition); } while(state->output_transition_list->len){ transition = state->output_transition_list->pdata[0]; C4_Model_remove_transition(model, transition); } /* Remove state from shadows (iterate backwards, as removing an entire shadow also shrinks state->src_shadow_list) */ for(i = state->src_shadow_list->len-1; i >= 0; i--){ shadow = state->src_shadow_list->pdata[i]; if(shadow->src_state_list->len == 1){ /* Remove entire shadow */ C4_Model_remove_shadow(model, shadow); } else { /* Remove state from shadow state list */ removed_state = g_ptr_array_remove_fast( shadow->src_state_list, state); g_assert(removed_state); } } /* Find unused calcs */ calc_used = g_new0(gboolean, model->calc_list->len); for(i = 0; i < model->calc_list->len; i++){ calc = model->calc_list->pdata[i]; calc->id = i; } for(i = 0; i < model->transition_list->len; i++){ transition = model->transition_list->pdata[i]; if(transition->calc) /* A transition may have no calc */ calc_used[transition->calc->id] = TRUE; } /* Remove unused portals (before their calcs are destroyed) */ for(i = model->portal_list->len-1; i >= 0; i--){ portal = model->portal_list->pdata[i]; if(!calc_used[portal->calc->id]){ g_ptr_array_remove_fast(model->portal_list, portal); C4_Portal_destroy(portal); } } /* Remove unused calcs */ for(i = model->calc_list->len-1; i >= 0; i--){ if(!calc_used[i]){ calc = model->calc_list->pdata[i]; g_ptr_array_remove_fast(model->calc_list, calc); C4_Calc_destroy(calc); } } g_free(calc_used); /* Remove related spans */ for(i = model->span_list->len-1; i >= 0; i--){ span = model->span_list->pdata[i]; if(span->span_state == state){ g_ptr_array_remove_fast(model->span_list, span); C4_Span_destroy(span); } } /* Close model */ if(!was_open) C4_Model_close(model); C4_State_destroy(state); return; } gboolean C4_Model_is_global(C4_Model *model){ if(model->start_state->scope != C4_Scope_CORNER) return FALSE;
if(model->end_state->scope != C4_Scope_CORNER) return FALSE; return TRUE; } gboolean C4_Model_is_local(C4_Model *model){ if(model->start_state->scope != C4_Scope_ANYWHERE) return FALSE; if(model->end_state->scope != C4_Scope_ANYWHERE) return FALSE; return TRUE; } void C4_Model_remove_all_shadows(C4_Model *model){ register C4_Shadow *shadow; g_assert(model->is_open); while(model->shadow_list->len){ shadow = model->shadow_list->pdata[0]; g_assert(shadow); C4_Model_remove_shadow(model, shadow); } return; } /**/ typedef struct { GPtrArray *new_src_state_list; GPtrArray *new_dst_transition_list; } C4_ProtoShadow; static C4_ProtoShadow *C4_ProtoShadow_create(void){ register C4_ProtoShadow *cps = g_new(C4_ProtoShadow, 1); cps->new_src_state_list = g_ptr_array_new(); cps->new_dst_transition_list = g_ptr_array_new(); return cps; } static void C4_ProtoShadow_destroy(C4_ProtoShadow *cps){ g_ptr_array_free(cps->new_src_state_list, TRUE); g_ptr_array_free(cps->new_dst_transition_list, TRUE); g_free(cps); return; } static void C4_ProtoShadow_add_state(C4_ProtoShadow *cps, C4_State *state){ g_ptr_array_add(cps->new_src_state_list, state); return; } static void C4_ProtoShadow_add_transition(C4_ProtoShadow *cps, C4_Transition *transition){ g_ptr_array_add(cps->new_dst_transition_list, transition); return; } static void C4_ProtoShadow_generate(C4_ProtoShadow *cps, C4_Model *old_model, C4_Model *new_model, C4_Shadow *old_shadow){ register gint i; register C4_Shadow *new_shadow; g_assert(cps); g_assert(cps->new_src_state_list->len); g_assert(cps->new_dst_transition_list->len); new_shadow = C4_Model_add_shadow(new_model, old_shadow->name, cps->new_src_state_list->pdata[0], cps->new_dst_transition_list->pdata[0], old_shadow->start_func, old_shadow->start_macro, old_shadow->end_func, old_shadow->end_macro); for(i = 1; i < cps->new_src_state_list->len; i++) C4_Shadow_add_src_state(new_shadow, cps->new_src_state_list->pdata[i]); for(i = 1; i < cps->new_dst_transition_list->len; i++) 
C4_Shadow_add_dst_transition(new_shadow, cps->new_dst_transition_list->pdata[i]); return; } /**/ static void C4_Model_segment_reuse_state(C4_Model *old_model, C4_Model *new_model, C4_State *old_state, C4_State **state_map, C4_ProtoShadow **proto_shadow_map){ register C4_State *new_state; register gint i; register C4_Shadow *shadow; if(old_state == old_model->start_state->state) return; if(old_state == old_model->end_state->state) return; if(state_map[old_state->id]) /* Already reused */ return; new_state = C4_Model_add_state(new_model, old_state->name); state_map[old_state->id] = new_state; for(i = 0; i < old_state->src_shadow_list->len; i++){ shadow = old_state->src_shadow_list->pdata[i]; if(!proto_shadow_map[shadow->id]) proto_shadow_map[shadow->id] = C4_ProtoShadow_create(); C4_ProtoShadow_add_state(proto_shadow_map[shadow->id], new_state); } return; } static void C4_Model_segment_add_transition(C4_Model *old_model, C4_Model *new_model, C4_Transition *transition, C4_State **state_map, C4_Calc **calc_map, GTree *transition_map_tree, C4_ProtoShadow **proto_shadow_map, gboolean from_start, gboolean to_end){ register C4_Calc *calc = NULL; register C4_Transition *new_transition; register gint i; register C4_Shadow *shadow; if(!from_start) C4_Model_segment_reuse_state(old_model, new_model, transition->input, state_map, proto_shadow_map); if(!to_end) C4_Model_segment_reuse_state(old_model, new_model, transition->output, state_map, proto_shadow_map); if(transition->calc){ if(!calc_map[transition->calc->id]){ calc_map[transition->calc->id] = C4_Model_add_calc(new_model, transition->calc->name, transition->calc->max_score, transition->calc->calc_func, transition->calc->calc_macro, transition->calc->init_func, transition->calc->init_macro, transition->calc->exit_func, transition->calc->exit_macro, transition->calc->protect); } calc = calc_map[transition->calc->id]; } new_transition = C4_Model_add_transition(new_model, transition->name, 
from_start?NULL:state_map[transition->input->id], to_end?NULL:state_map[transition->output->id], transition->advance_query, transition->advance_target, calc, transition->label, transition->label_data); g_tree_insert(transition_map_tree, new_transition, transition); for(i = 0; i < transition->dst_shadow_list->len; i++){ shadow = transition->dst_shadow_list->pdata[i]; if(!proto_shadow_map[shadow->id]) proto_shadow_map[shadow->id] = C4_ProtoShadow_create(); C4_ProtoShadow_add_transition(proto_shadow_map[shadow->id], new_transition); } return; } static void C4_Model_segment_recur(C4_Model *old_model, C4_Model *new_model, C4_State *state, gboolean *visited_state, C4_State **state_map, C4_Calc **calc_map, GTree *transition_map_tree, C4_ProtoShadow **proto_shadow_map){ register gint i; register C4_Transition *transition; if(!state_map[state->id]) /* Check that it is on the state map */ return; if(visited_state[state->id]) /* Check is it not already visited */ return; if(state == old_model->start_state->state) return; if(state == old_model->end_state->state) return; visited_state[state->id] = TRUE; for(i = 0; i < state->output_transition_list->len; i++){ transition = state->output_transition_list->pdata[i]; if(transition->output == old_model->end_state->state) continue; C4_Model_segment_add_transition(old_model, new_model, transition, state_map, calc_map, transition_map_tree, proto_shadow_map, FALSE, FALSE); C4_Model_segment_recur(old_model, new_model, transition->output, visited_state, state_map, calc_map, transition_map_tree, proto_shadow_map); } return; } #if 0 /* FIXME: not currently used */ static void C4_State_remove_duplicate_transitions(C4_State *state, C4_Model *model, GPtrArray *to_remove){ register gint i, j; register C4_Score score_a, score_b; register C4_Transition *transition_a, *transition_b; for(i = 0; i < state->output_transition_list->len; i++){ transition_a = state->output_transition_list->pdata[i]; for(j = 0; j < i; j++){ transition_b = 
state->output_transition_list->pdata[j]; if(transition_a->output != transition_b->output) continue; if(transition_a->advance_query != transition_b->advance_query) continue; if(transition_a->advance_target != transition_b->advance_target) continue; if(transition_a->calc == transition_b->calc) C4_Model_remove_transition(model, transition_b); if(transition_a->calc && transition_a->calc->calc_func) continue; if(transition_b->calc && transition_b->calc->calc_func) continue; score_a = transition_a->calc ? transition_a->calc->max_score : 0; score_b = transition_b->calc ? transition_b->calc->max_score : 0; if(score_a > score_b){ g_ptr_array_add(to_remove, transition_b); } else { g_ptr_array_add(to_remove, transition_a); } } } return; } static void C4_Model_remove_duplicate_transitions(C4_Model *model){ register gint i; register C4_State *state; register C4_Transition *transition; register GPtrArray *to_remove = g_ptr_array_new(); /* Find transitions to remove */ for(i = 0; i < model->state_list->len; i++){ state = model->state_list->pdata[i]; C4_State_remove_duplicate_transitions(state, model, to_remove); } /* Remove transitions from the model */ for(i = 0; i < to_remove->len; i++){ transition = to_remove->pdata[i]; C4_Model_remove_transition(model, transition); } g_ptr_array_free(to_remove, TRUE); return; } #endif /* 0 */ static C4_Model *C4_Model_select(C4_Model *model, C4_State *src, C4_State *dst, C4_State **state_map, GTree *transition_map_tree){ register gchar *name = g_strdup_printf( "Segment(\"%s\"->\"%s\"):[%s]", src->name, dst->name, model->name); register C4_Model *segment_model = C4_Model_create(name); register C4_Transition *transition; register gboolean *visited_state = g_new0(gboolean, model->state_list->len); register C4_State *state; register C4_Calc **calc_map = g_new0(C4_Calc*, model->calc_list->len); register gint i; register C4_Shadow *shadow; register C4_ProtoShadow **proto_shadow_map = g_new0(C4_ProtoShadow*, model->shadow_list->len); /* Propagate 
model extras */ C4_Model_configure_extra(segment_model, model->init_func, model->init_macro, model->exit_func, model->exit_macro); C4_Model_merge_codegen(segment_model, model); /* Propagate any shadows from start */ for(i = 0; i < src->src_shadow_list->len; i++){ shadow = src->src_shadow_list->pdata[i]; if(!proto_shadow_map[shadow->id]) proto_shadow_map[shadow->id] = C4_ProtoShadow_create(); C4_ProtoShadow_add_state(proto_shadow_map[shadow->id], segment_model->start_state->state); } /* Transitions from start */ for(i = 0; i < src->output_transition_list->len; i++){ transition = src->output_transition_list->pdata[i]; if(!C4_Model_path_is_possible(model, transition->output, dst)) continue; C4_Model_segment_add_transition(model, segment_model, transition, state_map, calc_map, transition_map_tree, proto_shadow_map, TRUE, FALSE); } /* Transitions to end */ for(i = 0; i < dst->input_transition_list->len; i++){ transition = dst->input_transition_list->pdata[i]; if(!C4_Model_path_is_possible(model, src, transition->input)) continue; C4_Model_segment_add_transition(model, segment_model, transition, state_map, calc_map, transition_map_tree, proto_shadow_map, FALSE, TRUE); } /* Other transitions */ for(i = 0; i < model->state_list->len; i++){ state = model->state_list->pdata[i]; C4_Model_segment_recur(model, segment_model, state, visited_state, state_map, calc_map, transition_map_tree, proto_shadow_map); } /* Remove duplicate equivalent transitions */ /* C4_Model_remove_duplicate_transitions(segment_model); */ for(i = 0; i < model->shadow_list->len; i++){ if(proto_shadow_map[i]){ C4_ProtoShadow_generate(proto_shadow_map[i], model, segment_model, model->shadow_list->pdata[i]); C4_ProtoShadow_destroy(proto_shadow_map[i]); } } g_free(proto_shadow_map); g_free(calc_map); g_free(visited_state); g_free(name); return segment_model; } static gint C4_Transition_compare_by_address(gconstpointer a, gconstpointer b){ /* Compare the pointers directly: converting to int truncates addresses on 64-bit platforms and the subtraction may overflow */ if(a < b) return -1; if(a > b) return 1; return 0; } C4_DerivedModel
*C4_DerivedModel_create(C4_Model *original_model, C4_State *src, C4_State *dst, C4_Scope start_scope, C4_CellStartFunc cell_start_func, gchar *cell_start_macro, C4_Scope end_scope, C4_CellEndFunc cell_end_func, gchar *cell_end_macro){ register C4_DerivedModel *derived_model = g_new(C4_DerivedModel, 1); register GTree *transition_map_tree = g_tree_new(C4_Transition_compare_by_address); register gint i; register C4_Transition *original_transition, *derived_transition; register C4_State **state_map = g_new0(C4_State*, original_model->state_list->len); g_assert(!original_model->is_open); g_assert(src); g_assert(dst); derived_model->original = C4_Model_share(original_model); derived_model->derived = C4_Model_select(original_model, src, dst, state_map, transition_map_tree); g_free(state_map); C4_Model_close(derived_model->derived); C4_Model_configure_start_state(derived_model->derived, start_scope, cell_start_func, cell_start_macro); C4_Model_configure_end_state(derived_model->derived, end_scope, cell_end_func, cell_end_macro); derived_model->transition_map = g_new(C4_Transition*, derived_model->derived->transition_list->len); for(i = 0; i < derived_model->derived->transition_list->len; i++){ derived_transition = derived_model->derived->transition_list->pdata[i]; g_assert(derived_transition); original_transition = g_tree_lookup(transition_map_tree, derived_transition); g_assert(original_transition); derived_model->transition_map[derived_transition->id] = original_transition; } g_tree_destroy(transition_map_tree); return derived_model; } /* FIXME: optimisation: when start and end are the same as original, * just do a C4_Model_copy(), (with transition map) */ void C4_DerivedModel_destroy(C4_DerivedModel *derived_model){ C4_Model_destroy(derived_model->original); C4_Model_destroy(derived_model->derived); g_free(derived_model->transition_map); g_free(derived_model); return; } /**/
/* exonerate-2.4.0/src/c4/alignment.test.c */ /****************************************************************\ * * * C4 dynamic programming library - alignment code * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include #include "alignment.h" #include "match.h" gint Argument_main(Argument *arg){ register Region *region = Region_create_blank(); register C4_Model *model = C4_Model_create("test"); register Alignment *alignment; register Match *match; Match_ArgumentSet_create(arg); Argument_process(arg, "alignment.test", NULL, NULL); match = Match_find(Match_Type_DNA2DNA); C4_Model_add_transition(model, "start to end", NULL, NULL, 1, 1, NULL, C4_Label_MATCH, match); C4_Model_close(model); alignment = Alignment_create(model, region, 0); g_message("alignment test"); Alignment_destroy(alignment); C4_Model_destroy(model); Region_destroy(region); return 0; } /* Proper testing is done with the model tests */ /* exonerate-2.4.0/src/c4/alignment.c */ /****************************************************************\ * * * C4 dynamic programming library - alignment code * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact.
* * * \****************************************************************/ #include <string.h> /* For strlen() */ #include <ctype.h> /* For tolower() */ #include <time.h> /* For time() */ #include "match.h" #include "alignment.h" Alignment_ArgumentSet *Alignment_ArgumentSet_create(Argument *arg){ register ArgumentSet *as; static Alignment_ArgumentSet aas = {80, TRUE}; if(arg){ as = ArgumentSet_create("Alignment options"); ArgumentSet_add_option(as, '\0', "alignmentwidth", NULL, "Alignment display width", "80", Argument_parse_int, &aas.alignment_width); ArgumentSet_add_option(as, '\0', "forwardcoordinates", NULL, "Report all coordinates on the forward strand", "TRUE", Argument_parse_boolean, &aas.forward_strand_coords); Argument_absorb_ArgumentSet(arg, as); } return &aas; } /**/ Alignment *Alignment_create(C4_Model *model, Region *region, C4_Score score){ register Alignment *alignment = g_new(Alignment, 1); g_assert(model); g_assert(region); alignment->ref_count = 1; alignment->region = Region_share(region); alignment->score = score; alignment->operation_list = g_ptr_array_new(); alignment->model = C4_Model_share(model); return alignment; } Alignment *Alignment_share(Alignment *alignment){ g_assert(alignment); alignment->ref_count++; return alignment; } void Alignment_destroy(Alignment *alignment){ register gint i; g_assert(alignment); if(--alignment->ref_count) return; for(i = 0; i < alignment->operation_list->len; i++) g_free(alignment->operation_list->pdata[i]); g_ptr_array_free(alignment->operation_list, TRUE); Region_destroy(alignment->region); C4_Model_destroy(alignment->model); g_free(alignment); return; } void Alignment_add(Alignment *alignment, C4_Transition *transition, gint length){ register AlignmentOperation *alignment_operation, *prev; g_assert(alignment); g_assert(transition); if(alignment->operation_list->len){ prev = alignment->operation_list->pdata [alignment->operation_list->len-1]; if(prev->transition == transition){ prev->length += length; g_assert(prev->length >= 0); /*
Cannot have -ve total */ if(prev->length == 0){ /* Drop the transition */ g_free(prev); g_ptr_array_set_size(alignment->operation_list, alignment->operation_list->len-1); } return; } g_assert(prev->transition->output == transition->input); } alignment_operation = g_new(AlignmentOperation, 1); alignment_operation->transition = transition; alignment_operation->length = length; g_ptr_array_add(alignment->operation_list, alignment_operation); return; } /**/ static gchar Alignment_match_get_symbol(Sequence *seq, gint pos, gint advance, Translate *translate){ register gchar symbol = '\0'; switch(advance){ case 1: symbol = Sequence_get_symbol(seq, pos); break; case 3: g_assert(translate); symbol = Translate_base(translate, Sequence_get_symbol(seq, pos), Sequence_get_symbol(seq, pos+1), Sequence_get_symbol(seq, pos+2)); break; default: g_error("Cannot process match advance of [%d]", advance); break; } return symbol; } static gchar *Alignment_match_get_string(Sequence *seq, gint pos, gint advance, gint max, Translate *translate){ register gint ch; switch(max){ case 1: g_assert(advance == 1); return g_strdup_printf("%c", Sequence_get_symbol(seq, pos)); break; case 3: switch(advance){ case 1: g_assert(seq->alphabet->type == Alphabet_Type_PROTEIN); ch = Sequence_get_symbol(seq, pos); return g_strdup(Alphabet_aa2tla(ch)); break; case 3: g_assert(seq->alphabet->type == Alphabet_Type_DNA); return g_strdup_printf("%c%c%c", /* codon */ Sequence_get_symbol(seq, pos), Sequence_get_symbol(seq, pos+1), Sequence_get_symbol(seq, pos+2)); break; default: g_error("Cannot handle advance [%d]", advance); break; } break; default: g_error("Cannot find match display for max advance of [%d]", max); break; } return NULL; } /**/ static gchar Alignment_get_gene_orientation(Alignment *alignment){ register gint i; register AlignmentOperation *ao; for(i = 0; i < alignment->operation_list->len; i++){ ao = alignment->operation_list->pdata[i]; if(ao->transition->label == C4_Label_5SS) return '+';
if(ao->transition->label == C4_Label_3SS) return '-'; } return '.'; } static gint Alignment_get_coordinate(Alignment *alignment, Sequence *query, Sequence *target, gboolean on_query, gboolean report_start){ register gint pos; register Alignment_ArgumentSet *aas = Alignment_ArgumentSet_create(NULL); if(on_query){ if(report_start){ pos = alignment->region->query_start; } else { pos = Region_query_end(alignment->region); } if(aas->forward_strand_coords && (query->strand == Sequence_Strand_REVCOMP)){ pos = query->len - pos; } } else { /* on_target */ if(report_start){ pos = alignment->region->target_start; } else { pos = Region_target_end(alignment->region); } if(aas->forward_strand_coords && (target->strand == Sequence_Strand_REVCOMP)){ pos = target->len - pos; } } return pos; } static gint Alignment_convert_coordinate(Alignment *alignment, Sequence *query, Sequence *target, gint query_pos, gint target_pos, gboolean on_query){ register gint pos; register Alignment_ArgumentSet *aas = Alignment_ArgumentSet_create(NULL); if(on_query){ pos = query_pos; if(aas->forward_strand_coords && (query->strand == Sequence_Strand_REVCOMP)){ pos = query->len - pos; } } else { /* on_target */ pos = target_pos; if(aas->forward_strand_coords && (target->strand == Sequence_Strand_REVCOMP)){ pos = target->len - pos; } } return pos; } static gint Alignment_get_max_pos_len(Alignment *alignment, Sequence *query, Sequence *target){ register gint qmax = MAX( Alignment_get_coordinate(alignment, query, target, TRUE, TRUE), Alignment_get_coordinate(alignment, query, target, TRUE, FALSE)); register gint tmax = MAX( Alignment_get_coordinate(alignment, query, target, FALSE, TRUE), Alignment_get_coordinate(alignment, query, target, FALSE, FALSE)); register gint max = MAX(qmax, tmax); register gchar *tmpstr = g_strdup_printf("%d", max); register gint maxlen = strlen(tmpstr); g_free(tmpstr); return maxlen; } /**/ typedef struct { gint query_pos; gint target_pos; } AlignmentPosition; typedef struct { 
gint query_separation; gint target_separation; } AlignmentSeparation; typedef struct { GString *outer_query; GString *inner_query; GString *middle; GString *inner_target; GString *outer_target; GPtrArray *row_marker; /* List containing AlignmentPosition */ gint max_pos_len; gint width; gint limit; gint query_intron_count; gint target_intron_count; gint joint_intron_count; gint intron_advance_query; gint intron_advance_target; gchar gene_orientation; gint ner_count; gint ner_advance_query; gint ner_advance_target; gint curr_split_codon_count; /* Add 2 for each split codon */ GPtrArray *split_codon_separation_list; /* List containing AlignmentSeparation */ } AlignmentView; static AlignmentView *AlignmentView_create(Alignment *alignment, Sequence *query, Sequence *target){ register AlignmentView *av = g_new(AlignmentView, 1); register AlignmentOperation *ao; register AlignmentSeparation *curr_split_codon_separation = NULL; register gint i; register Alignment_ArgumentSet *aas = Alignment_ArgumentSet_create(NULL); av->outer_query = g_string_sized_new(16); if(alignment->model->max_query_advance == 3) av->inner_query = g_string_sized_new(16); else av->inner_query = NULL; av->middle = g_string_sized_new(16); if(alignment->model->max_target_advance == 3) av->inner_target = g_string_sized_new(16); else av->inner_target = NULL; av->outer_target = g_string_sized_new(16); av->row_marker = g_ptr_array_new(); av->max_pos_len = Alignment_get_max_pos_len(alignment, query, target); av->width = aas->alignment_width-((av->max_pos_len + 5) << 1); g_assert(av->width > 0); av->limit = av->width; av->query_intron_count = 0; av->target_intron_count = 0; av->joint_intron_count = 0; av->intron_advance_query = 0; av->intron_advance_target = 0; av->gene_orientation = Alignment_get_gene_orientation(alignment); av->ner_count = 0; av->ner_advance_query = 0; av->ner_advance_target = 0; av->curr_split_codon_count = 0; av->split_codon_separation_list = g_ptr_array_new(); for(i = 0; i < 
alignment->operation_list->len; i++){ ao = alignment->operation_list->pdata[i]; if(curr_split_codon_separation){ if(ao->transition->label == C4_Label_SPLIT_CODON){ g_ptr_array_add(av->split_codon_separation_list, curr_split_codon_separation); curr_split_codon_separation = NULL; } else { curr_split_codon_separation->query_separation += (ao->length * ao->transition->advance_query); curr_split_codon_separation->target_separation += (ao->length * ao->transition->advance_target); } } else { if(ao->transition->label == C4_Label_SPLIT_CODON){ curr_split_codon_separation = g_new(AlignmentSeparation, 1); curr_split_codon_separation->query_separation = (ao->length * ao->transition->advance_query); curr_split_codon_separation->target_separation = (ao->length * ao->transition->advance_target); } } } /* Check we have an even number of split codons */ g_assert(!curr_split_codon_separation); return av; } static void AlignmentView_destroy(AlignmentView *av){ register gint i; for(i = 0; i < av->split_codon_separation_list->len; i++) g_free(av->split_codon_separation_list->pdata[i]); g_ptr_array_free(av->split_codon_separation_list, TRUE); g_string_free(av->outer_query, TRUE); g_string_free(av->middle, TRUE); g_string_free(av->outer_target, TRUE); if(av->inner_query) g_string_free(av->inner_query, TRUE); if(av->inner_target) g_string_free(av->inner_target, TRUE); for(i = 0; i < av->row_marker->len; i++) g_free(av->row_marker->pdata[i]); g_ptr_array_free(av->row_marker, TRUE); g_free(av); return; } static void AlignmentView_add(AlignmentView *av, gchar *query_string, gchar *inner_query_string, gchar *match_string, gchar *inner_target_string, gchar *target_string, gint query_pos, gint target_pos){ register AlignmentPosition *apos; register gint i; g_assert(strlen(query_string) == strlen(match_string)); g_assert(strlen(match_string) == strlen(target_string)); if(av->inner_query){ if(inner_query_string){ g_assert(strlen(match_string) == strlen(inner_query_string)); 
g_string_append(av->inner_query, inner_query_string); } else { for(i = strlen(match_string)-1; i >=0; i--) g_string_append_c(av->inner_query, ' '); } } if(av->inner_target){ if(inner_target_string){ g_assert(strlen(match_string) == strlen(inner_target_string)); g_string_append(av->inner_target, inner_target_string); } else { for(i = strlen(match_string)-1; i >=0; i--) g_string_append_c(av->inner_target, ' '); } } g_string_append(av->outer_query, query_string); g_string_append(av->middle, match_string); g_string_append(av->outer_target, target_string); if(av->outer_query->len >= av->limit){ apos = g_new(AlignmentPosition, 1); apos->query_pos = query_pos; apos->target_pos = target_pos; g_ptr_array_add(av->row_marker, apos); av->limit += av->width; } return; } typedef struct { gchar *match_string; gchar *codon; } AlignmentMatchData; static void Alignment_match_translate_reverse(gchar *dna, gint length, gpointer user_data){ register AlignmentMatchData *amd = user_data; register gint i; for(i = 0; i < 3; i++) if(dna[i] == amd->codon[i]) amd->match_string[i] = '!'; return; } static gchar Alignment_get_equiv_symbol(gchar symbol_a, gchar symbol_b, Submat *submat){ register gint score; g_assert(symbol_a); g_assert(symbol_b); if(submat){ score = Submat_lookup(submat, symbol_a, symbol_b); if(score == 0) return '.'; if(score > 0){ if(toupper(symbol_a) == toupper(symbol_b)) return '|'; else return ':'; } } else { if(symbol_a == symbol_b) return '|'; } return ' '; } static gchar *Alignment_get_codon_match_string(gchar *codon, gchar aa, Submat *protein_submat, Translate *translate){ register gchar codon_aa = Translate_codon(translate, codon); register gchar match_symbol; register gchar *codon_match; gchar aa_seq[2]; AlignmentMatchData amd; g_assert(translate); g_assert(protein_submat); match_symbol = Alignment_get_equiv_symbol(codon_aa, aa, protein_submat); codon_match = g_strnfill(3, match_symbol); if(match_symbol != '|'){ amd.match_string = codon_match; amd.codon = codon; 
        aa_seq[0] = aa;
        aa_seq[1] = '\0';
        Translate_reverse(translate, aa_seq, 1,
                          Alignment_match_translate_reverse, &amd);
    }
    return codon_match;
}

static void AlignmentView_add_MATCH(AlignmentView *av,
                C4_Transition *transition, gint total_length,
                Sequence *query, Sequence *target,
                gint query_pos, gint target_pos,
                Submat *dna_submat, Submat *protein_submat,
                Translate *translate){
    register gchar query_symbol, target_symbol;
    register gchar *query_string, *target_string;
    register gchar *inner_query_string = NULL,
                   *inner_target_string = NULL;
    register gint max_advance;
    register gint i;
    register gint curr_query_pos = query_pos,
                  curr_target_pos = target_pos;
    register Match *match = (Match*)transition->label_data;
    gchar match_string[4]; /* FIXME: temp: should be max_advance long */
    for(i = 0; i < total_length; i++){
        max_advance = MAX(transition->advance_query,
                          transition->advance_target);
        query_string = Alignment_match_get_string(query,
                           curr_query_pos, transition->advance_query,
                           max_advance, translate);
        target_string = Alignment_match_get_string(target,
                            curr_target_pos, transition->advance_target,
                            max_advance, translate);
        query_symbol = Alignment_match_get_symbol(query,
                           curr_query_pos, transition->advance_query,
                           translate);
        target_symbol = Alignment_match_get_symbol(target,
                            curr_target_pos, transition->advance_target,
                            translate);
        if(transition->advance_query == 3)
            inner_query_string = Alphabet_aa2tla(query_symbol);
        if(transition->advance_target == 3)
            inner_target_string = Alphabet_aa2tla(target_symbol);
        if(match){
            match->display_func(match, query, target,
                                curr_query_pos, curr_target_pos,
                                match_string);
        } else { /* In the absence of Match */
            g_assert(transition->advance_query == 1);
            g_assert(transition->advance_target == 1);
            /* Compare the symbols at the current step,
             * not at the start of the run
             */
            match_string[0] = (Sequence_get_symbol(query, curr_query_pos)
                            == Sequence_get_symbol(target, curr_target_pos))
                            ?'|':' ';
            match_string[1] = '\0';
        }
        AlignmentView_add(av, query_string, inner_query_string,
                          match_string,
                          inner_target_string, target_string,
                          curr_query_pos,
curr_target_pos); g_free(query_string); g_free(target_string); curr_query_pos += transition->advance_query; curr_target_pos += transition->advance_target; } return; } static void AlignmentView_add_GAP(AlignmentView *av, gint advance_query, gint advance_target, gint total_length, Sequence *query, Sequence *target, gint query_pos, gint target_pos, Translate *translate){ register gint i, j; register gint curr_query_pos = query_pos, curr_target_pos = target_pos; register gchar *seq_string = g_new(gchar, 4), *match_string = g_new(gchar, 4), *gap_string = g_new(gchar, 4); register Alphabet_Type emitted_alphabet_type; register gchar tr_codon, *tr_codon_name; register gboolean is_translating = ( ( (query->alphabet->type == Alphabet_Type_PROTEIN) && (target->alphabet->type == Alphabet_Type_DNA)) ||( (query->alphabet->type == Alphabet_Type_DNA) && (target->alphabet->type == Alphabet_Type_PROTEIN)) || ((advance_query|advance_target) == 3)); g_assert(!(advance_query && advance_target)); match_string[0] = ' '; gap_string[0] = '-'; if(advance_query) emitted_alphabet_type = query->alphabet->type; else /* advance_target */ emitted_alphabet_type = target->alphabet->type; for(i = 0; i < total_length; i++){ for(j = 0; j < (advance_query|advance_target); j++){ if(advance_query) seq_string[j] = Sequence_get_symbol(query, curr_query_pos+j); else /* advance_target */ seq_string[j] = Sequence_get_symbol(target, curr_target_pos+j); match_string[j] = match_string[0]; gap_string[j] = gap_string[0]; } seq_string[j] = match_string[j] = gap_string[j] = '\0'; tr_codon_name = NULL; if(is_translating){ if(emitted_alphabet_type == Alphabet_Type_PROTEIN){ strncpy(seq_string, Alphabet_aa2tla(seq_string[0]), 3); match_string[2] = match_string[1] = match_string[0]; gap_string[2] = gap_string[1] = gap_string[0]; seq_string[3] = match_string[3] = gap_string[3] = '\0'; } if((advance_query|advance_target) == 3){ gap_string[0] = '<'; gap_string[1] = '-'; gap_string[2] = '>'; gap_string[3] = '\0'; tr_codon = 
Translate_codon(translate, seq_string);
                tr_codon_name = Alphabet_aa2tla(tr_codon);
            }
        }
        if(advance_query)
            AlignmentView_add(av, seq_string, tr_codon_name,
                              match_string,
                              is_translating?gap_string:NULL, gap_string,
                              curr_query_pos, curr_target_pos);
        else
            AlignmentView_add(av, gap_string,
                              is_translating?gap_string:NULL,
                              match_string,
                              tr_codon_name, seq_string,
                              curr_query_pos, curr_target_pos);
        curr_query_pos += advance_query;
        curr_target_pos += advance_target;
    }
    g_free(seq_string);
    g_free(match_string);
    g_free(gap_string);
    return;
}
/* Display gap:[- N] codon[<-> NNN] frameshift [-*N] */

static void AlignmentView_set_consensus_ss_string(AlignmentView *av,
                gboolean is_5_prime, gchar *splice_site,
                gchar *consensus_string){
    register gchar cons_a, cons_b;
    if(av->gene_orientation == '+'){
        if(is_5_prime){ /* FWD: gt..ag */
            cons_a = 'G';
            cons_b = 'T';
        } else {
            cons_a = 'A';
            cons_b = 'G';
        }
    } else {
        g_assert(av->gene_orientation == '-');
        if(is_5_prime){ /* REV: ct..ac */
            cons_a = 'A';
            cons_b = 'C';
        } else {
            cons_a = 'C';
            cons_b = 'T';
        }
    }
    if(toupper(splice_site[0]) == cons_a)
        consensus_string[0] = '+';
    else
        consensus_string[0] = '-';
    if(toupper(splice_site[1]) == cons_b)
        consensus_string[1] = '+';
    else
        consensus_string[1] = '-';
    return;
}

static void AlignmentView_add_SPLICE_SITE(AlignmentView *av,
                gint advance_query, gint advance_target,
                gint total_length,
                Sequence *query, Sequence *target,
                gint query_pos, gint target_pos, gboolean is_5_prime,
                C4_Transition *last_match){
    /* Two spaces: AlignmentView_add() asserts that every row string
     * has the same length as the 2-character splice site strings
     */
    register gchar *gap_string = "  ";
    static gchar qy_seq_string[3], tg_seq_string[3],
                 qy_cons_string[3], tg_cons_string[3];
    qy_cons_string[0] = qy_cons_string[1] = ' ';
    tg_cons_string[0] = tg_cons_string[1] = ' ';
    qy_cons_string[2] = tg_cons_string[2] = '\0';
    qy_seq_string[2] = tg_seq_string[2] = '\0';
    if(advance_query == 2){
        qy_seq_string[0] = Sequence_get_symbol(query, query_pos);
        qy_seq_string[1] = Sequence_get_symbol(query, query_pos+1);
        AlignmentView_set_consensus_ss_string(av, is_5_prime,
                                              qy_seq_string,
qy_cons_string); g_strdown(qy_seq_string); } if(advance_target == 2){ tg_seq_string[0] = Sequence_get_symbol(target, target_pos); tg_seq_string[1] = Sequence_get_symbol(target, target_pos+1); AlignmentView_set_consensus_ss_string(av, is_5_prime, tg_seq_string, tg_cons_string); g_strdown(tg_seq_string); } if(advance_query == 2){ if(advance_target == 2){ /* Joint intron */ AlignmentView_add(av, qy_seq_string, qy_cons_string, gap_string, tg_cons_string, tg_seq_string, query_pos, target_pos); } else { /* Query intron */ g_assert(advance_target == 0); g_strdown(qy_seq_string); g_assert(last_match); if(last_match->advance_query == 3) AlignmentView_add(av, qy_seq_string, qy_cons_string, gap_string, gap_string, gap_string, query_pos, target_pos); else AlignmentView_add(av, qy_seq_string, NULL, qy_cons_string, NULL, gap_string, query_pos, target_pos); } } else { /* Target intron */ g_assert(advance_query == 0); g_assert(advance_target == 2); g_strdown(tg_seq_string); g_assert(last_match); if(last_match->advance_target == 3) AlignmentView_add(av, gap_string, gap_string, gap_string, tg_cons_string, tg_seq_string, query_pos, target_pos); else AlignmentView_add(av, gap_string, NULL, tg_cons_string, NULL, tg_seq_string, query_pos, target_pos); } return; } static void AlignmentView_add_INTRON(AlignmentView *av, gint advance_query, gint advance_target, Sequence *query, Sequence *target, gint query_pos, gint target_pos, C4_Transition *last_match){ register gchar *dir_sign = "????", *label = NULL; register gint fill, intron_count = 0; register gchar *name_string, *middle_string, *gap_string, *pad_string, *intron_name; g_assert(last_match); if(av->gene_orientation == '+'){ dir_sign = ">>>>"; } else if(av->gene_orientation == '-'){ dir_sign = "<<<<"; } if(advance_query){ if(advance_target){ intron_count = ++av->joint_intron_count; intron_name = "Joint"; label = g_strdup_printf("%d bp // %d bp", advance_query+4, advance_target+4); } else { intron_count = ++av->query_intron_count; 
intron_name = "Query"; label = g_strdup_printf("%d bp", advance_query+4); } } else { intron_count = ++av->target_intron_count; intron_name = "Target"; label = g_strdup_printf("%d bp", advance_target+4); } name_string = g_strdup_printf("%s %s Intron %d %s", dir_sign, intron_name, intron_count, dir_sign); g_assert(strlen(name_string) > strlen(label)); fill = (strlen(name_string)-strlen(label))+1; middle_string = g_strdup_printf("%*c%s%*c", ((fill|1)>>1), ' ', label, ((fill-1)>>1), ' '); gap_string = g_strnfill(strlen(name_string), '.'); pad_string = g_strnfill(strlen(name_string), '^'); g_assert(strlen(name_string) == strlen(middle_string)); g_assert(strlen(middle_string) == strlen(gap_string)); if(advance_query){ if(advance_target){ /* joint intron */ AlignmentView_add(av, name_string, NULL, middle_string, NULL, name_string, query_pos, target_pos); } else { /* query intron */ if(last_match->advance_query == 3) AlignmentView_add(av, gap_string, pad_string, middle_string, pad_string, name_string, query_pos, target_pos); else AlignmentView_add(av, gap_string, NULL, middle_string, NULL, name_string, query_pos, target_pos); } } else { /* target intron */ if(last_match->advance_target == 3) AlignmentView_add(av, name_string, pad_string, middle_string, pad_string, gap_string, query_pos, target_pos); else AlignmentView_add(av, name_string, NULL, middle_string, NULL, gap_string, query_pos, target_pos); } g_free(label); g_free(name_string); g_free(gap_string); g_free(pad_string); g_free(middle_string); return; } static void AlignmentView_add_NER(AlignmentView *av, gint advance_query, gint advance_target, gint query_pos, gint target_pos){ register gchar *upper_string = g_strdup_printf("%d", advance_query), *middle_string = g_strdup_printf("NER %d", ++av->ner_count), *lower_string = g_strdup_printf("%d", advance_target); register gint upper_len = strlen(upper_string), middle_len = strlen(middle_string), lower_len = strlen(lower_string), max_len; register gchar *upper_padded, 
*middle_padded, *lower_padded; max_len = upper_len; if(max_len < middle_len) max_len = middle_len; if(max_len < lower_len) max_len = lower_len; upper_padded = g_strdup_printf("--<%*c%s%*c>--", 1+(((max_len-upper_len)+1)>>1), ' ', upper_string, 1+((max_len-upper_len)>>1), ' '); middle_padded = g_strdup_printf("--<%*c%s%*c>--", 1+(((max_len-middle_len)+1)>>1), ' ', middle_string, 1+((max_len-middle_len)>>1), ' '); lower_padded = g_strdup_printf("--<%*c%s%*c>--", 1+(((max_len-lower_len)+1)>>1), ' ', lower_string, 1+((max_len-lower_len)>>1), ' '); g_free(upper_string); g_free(middle_string); g_free(lower_string); AlignmentView_add(av, upper_padded, NULL, middle_padded, NULL, lower_padded, query_pos, target_pos); g_free(upper_padded); g_free(middle_padded); g_free(lower_padded); return; } #define AlignmentView_combine_advance(advance_query, advance_target) \ (((advance_query) << 8) | (advance_target)) static void AlignmentView_add_SPLIT_CODON(AlignmentView *av, Sequence *query, Sequence *target, gint advance_query, gint advance_target, gint query_pos, gint target_pos, Submat *protein_submat, Translate *translate){ register gchar *query_string = NULL, *target_string = NULL, *match_string = NULL, *codon_match; register gchar qy_aa_symbol = '\0', tg_aa_symbol = '\0', *qy_aa_name = NULL, *tg_aa_name = NULL, *tr_codon_name; register gchar *inner_query_string = NULL, *inner_target_string = NULL; register gint pos_to_start = -1, num_to_print; register gint qp0 = 0, qp1 = 0, qp2 = 0, tp0 = 0, tp1 = 0, tp2 = 0; register gboolean query_is_dna, target_is_dna, before_intron; gchar qy_codon[4], tg_codon[4]; register AlignmentSeparation *as = av->split_codon_separation_list->pdata [av->curr_split_codon_count >> 1]; g_assert(advance_query || advance_target); query_is_dna = (query->alphabet->type == Alphabet_Type_DNA); target_is_dna = (target->alphabet->type == Alphabet_Type_DNA); before_intron = (av->curr_split_codon_count & 1)?FALSE:TRUE; qy_codon[0] = qy_codon[1] = qy_codon[2] = 
qy_codon[3] = '\0'; tg_codon[0] = tg_codon[1] = tg_codon[2] = tg_codon[3] = '\0'; /**/ if(query_is_dna && target_is_dna){ switch(AlignmentView_combine_advance( advance_query, advance_target)){ case AlignmentView_combine_advance(1, 1): if(before_intron){ pos_to_start = 0; qp0 = query_pos; qp1 = query_pos + as->query_separation; qp2 = query_pos + as->query_separation + 1; tp0 = target_pos; tp1 = target_pos + as->target_separation; tp2 = target_pos + as->target_separation + 1; } else { /* after_intron */ pos_to_start = 2; qp0 = query_pos - as->query_separation; qp1 = query_pos - as->query_separation + 1; qp2 = query_pos; tp0 = target_pos - as->target_separation; tp1 = target_pos - as->target_separation + 1; tp2 = target_pos; } break; case AlignmentView_combine_advance(2, 2): if(before_intron){ pos_to_start = 0; qp0 = query_pos; qp1 = query_pos + 1; qp2 = query_pos + as->query_separation; tp0 = target_pos; tp1 = target_pos + 1; tp2 = target_pos + as->target_separation; } else { /* after_intron */ pos_to_start = 1; qp0 = query_pos - as->query_separation; qp1 = query_pos; qp2 = query_pos + 1; tp0 = target_pos - as->target_separation; tp1 = target_pos; tp2 = target_pos + 1; } break; default: g_error("Unexpected d2d split codon [%d,%d]", advance_query, advance_target); break; } qy_codon[0] = Sequence_get_symbol(query, qp0); qy_codon[1] = Sequence_get_symbol(query, qp1); qy_codon[2] = Sequence_get_symbol(query, qp2); tg_codon[0] = Sequence_get_symbol(target, tp0); tg_codon[1] = Sequence_get_symbol(target, tp1); tg_codon[2] = Sequence_get_symbol(target, tp2); } else { if(query_is_dna){ g_assert(target->alphabet->type == Alphabet_Type_PROTEIN); tg_aa_symbol = Sequence_get_symbol(target, target_pos); switch(AlignmentView_combine_advance( advance_query, advance_target)){ case AlignmentView_combine_advance(1, 0): pos_to_start = 0; qp0 = query_pos; qp1 = query_pos + as->query_separation; qp2 = query_pos + as->query_separation + 1; break; case AlignmentView_combine_advance(2, 0): 
pos_to_start = 0; qp0 = query_pos; qp1 = query_pos + 1; qp2 = query_pos + as->query_separation; break; case AlignmentView_combine_advance(2, 1): pos_to_start = 1; qp0 = query_pos - as->query_separation; qp1 = query_pos; qp2 = query_pos + 1; break; case AlignmentView_combine_advance(1, 1): pos_to_start = 2; qp0 = query_pos - as->query_separation; qp1 = query_pos - as->query_separation + 1; qp2 = query_pos; break; default: g_error("Unexpected d2p split codon [%d,%d]", advance_query, advance_target); break; } qy_codon[0] = Sequence_get_symbol(query, qp0); qy_codon[1] = Sequence_get_symbol(query, qp1); qy_codon[2] = Sequence_get_symbol(query, qp2); } else { g_assert(query->alphabet->type == Alphabet_Type_PROTEIN); g_assert(target->alphabet->type == Alphabet_Type_DNA); qy_aa_symbol = Sequence_get_symbol(query, query_pos); switch(AlignmentView_combine_advance( advance_query, advance_target)){ case AlignmentView_combine_advance(0, 1): pos_to_start = 0; tp0 = target_pos; tp1 = target_pos + as->target_separation; tp2 = target_pos + as->target_separation + 1; break; case AlignmentView_combine_advance(0, 2): pos_to_start = 0; tp0 = target_pos; tp1 = target_pos + 1; tp2 = target_pos + as->target_separation; break; case AlignmentView_combine_advance(1, 2): pos_to_start = 1; tp0 = target_pos - as->target_separation; tp1 = target_pos; tp2 = target_pos + 1; break; case AlignmentView_combine_advance(1, 1): pos_to_start = 2; tp0 = target_pos - as->target_separation; tp1 = target_pos - as->target_separation + 1; tp2 = target_pos; break; default: g_error("Unexpected p2d split codon [%d,%d]", advance_query, advance_target); break; } tg_codon[0] = Sequence_get_symbol(target, tp0); tg_codon[1] = Sequence_get_symbol(target, tp1); tg_codon[2] = Sequence_get_symbol(target, tp2); } } g_assert(pos_to_start != -1); av->curr_split_codon_count++; /**/ if(!query_is_dna) qy_aa_name = Alphabet_aa2tla(qy_aa_symbol); if(!target_is_dna) tg_aa_name = Alphabet_aa2tla(tg_aa_symbol); num_to_print = 
MAX(advance_query, advance_target); query_string = g_strdup_printf("{%.*s}", num_to_print, (query_is_dna?qy_codon:qy_aa_name)+pos_to_start); target_string = g_strdup_printf("{%.*s}", num_to_print, (target_is_dna?tg_codon:tg_aa_name)+pos_to_start); g_strup(qy_codon); g_strup(tg_codon); if(query_is_dna){ g_assert(!qy_aa_symbol); qy_aa_symbol = Translate_codon(translate, qy_codon); tr_codon_name = Alphabet_aa2tla(qy_aa_symbol); inner_query_string = g_strdup_printf("{%.*s}", num_to_print, tr_codon_name+pos_to_start); } if(target_is_dna){ g_assert(!tg_aa_symbol); tg_aa_symbol = Translate_codon(translate, tg_codon); tr_codon_name = Alphabet_aa2tla(tg_aa_symbol); inner_target_string = g_strdup_printf("{%.*s}", num_to_print, tr_codon_name+pos_to_start); } /**/ if(query_is_dna){ if(target_is_dna){ /* d,d */ codon_match = g_strnfill(3, Alignment_get_equiv_symbol( qy_aa_symbol, tg_aa_symbol, protein_submat)); } else { /* d,p */ codon_match = Alignment_get_codon_match_string( qy_codon, tg_aa_symbol, protein_submat, translate); } } else { /* p,d */ codon_match = Alignment_get_codon_match_string( tg_codon, qy_aa_symbol, protein_submat, translate); } match_string = g_strdup_printf("{%.*s}", num_to_print, codon_match+pos_to_start); g_free(codon_match); /**/ g_assert(query_string); g_assert(match_string); g_assert(target_string); AlignmentView_add(av, query_string, inner_query_string, match_string, inner_target_string, target_string, query_pos, target_pos); g_free(query_string); g_free(match_string); g_free(target_string); if(inner_query_string) g_free(inner_query_string); if(inner_target_string) g_free(inner_target_string); return; } static void AlignmentView_add_FRAMESHIFT(AlignmentView *av, gint advance_query, gint advance_target, gint total_length, Sequence *query, Sequence *target, gint query_pos, gint target_pos, Translate *translate){ register gint i, j; register gint curr_query_pos = query_pos, curr_target_pos = target_pos; static gchar seq_string[4], match_string[4], 
gap_string[4]; register Alphabet_Type emitted_alphabet_type; g_assert(!(advance_query && advance_target)); match_string[0] = '#'; /* Frameshift */ gap_string[0] = '-'; if(advance_query) emitted_alphabet_type = query->alphabet->type; else /* advance_target */ emitted_alphabet_type = target->alphabet->type; for(i = 0; i < total_length; i++){ for(j = 0; j < (advance_query|advance_target); j++){ if(advance_query) seq_string[j] = Sequence_get_symbol(query, curr_query_pos+j); else /* advance_target */ seq_string[j] = Sequence_get_symbol(target, curr_target_pos+j); match_string[j] = match_string[0]; gap_string[j] = gap_string[0]; } seq_string[j] = match_string[j] = gap_string[j] = '\0'; if(emitted_alphabet_type == Alphabet_Type_PROTEIN){ strncpy(seq_string, Alphabet_aa2tla(seq_string[0]), 3); match_string[2] = match_string[1] = match_string[0]; gap_string[2] = gap_string[1] = gap_string[0]; seq_string[3] = match_string[3] = gap_string[3] = '\0'; } if(advance_query) AlignmentView_add(av, seq_string, match_string, match_string, gap_string, gap_string, curr_query_pos, curr_target_pos); else AlignmentView_add(av, gap_string, gap_string, match_string, match_string, seq_string, curr_query_pos, curr_target_pos); curr_query_pos += advance_query; curr_target_pos += advance_target; } return; } static void AlignmentView_add_label_operation(AlignmentView *av, C4_Transition *transition, gint total_length, Sequence *query, Sequence *target, gint query_pos, gint target_pos, Submat *dna_submat, Submat *protein_submat, Translate *translate, gboolean next_has_same_label, C4_Transition **last_match){ switch(transition->label){ case C4_Label_NONE: g_assert(!transition->advance_query); g_assert(!transition->advance_target); break; case C4_Label_MATCH: (*last_match) = transition; AlignmentView_add_MATCH(av, transition, total_length, query, target, query_pos, target_pos, dna_submat, protein_submat, translate); break; case C4_Label_GAP: AlignmentView_add_GAP(av, transition->advance_query, 
                                  transition->advance_target,
                                  total_length, query, target,
                                  query_pos, target_pos, translate);
            break;
        case C4_Label_5SS:
            AlignmentView_add_SPLICE_SITE(av,
                    transition->advance_query,
                    transition->advance_target,
                    total_length, query, target,
                    query_pos, target_pos, TRUE, (*last_match));
            break;
        case C4_Label_3SS:
            AlignmentView_add_SPLICE_SITE(av,
                    transition->advance_query,
                    transition->advance_target,
                    total_length, query, target,
                    query_pos, target_pos, FALSE, (*last_match));
            break;
        case C4_Label_INTRON:
            /* Accumulate consecutive intron operations,
             * and display them together as a single intron
             */
            av->intron_advance_query += (transition->advance_query
                                         * total_length);
            av->intron_advance_target += (transition->advance_target
                                          * total_length);
            if(!next_has_same_label){
                AlignmentView_add_INTRON(av,
                        av->intron_advance_query,
                        av->intron_advance_target,
                        query, target, query_pos, target_pos,
                        (*last_match));
                av->intron_advance_query = 0;
                av->intron_advance_target = 0;
            }
            break;
        case C4_Label_NER:
            /* Accumulate consecutive NER operations in the same way */
            av->ner_advance_query += (transition->advance_query
                                      * total_length);
            av->ner_advance_target += (transition->advance_target
                                       * total_length);
            if(!next_has_same_label){
                AlignmentView_add_NER(av,
                        av->ner_advance_query,
                        av->ner_advance_target,
                        query_pos, target_pos);
                av->ner_advance_query = 0;
                av->ner_advance_target = 0;
            }
            break;
        case C4_Label_SPLIT_CODON:
            g_assert(total_length == 1);
            AlignmentView_add_SPLIT_CODON(av, query, target,
                    transition->advance_query,
                    transition->advance_target,
                    query_pos, target_pos,
                    protein_submat, translate);
            break;
        case C4_Label_FRAMESHIFT:
            AlignmentView_add_FRAMESHIFT(av,
                    transition->advance_query,
                    transition->advance_target,
                    total_length, query, target,
                    query_pos, target_pos, translate);
            break;
        default:
            g_error("AlignmentView cannot use label [%s]",
                    C4_Label_get_name(transition->label));
            break;
    }
    return;
}

static void AlignmentView_prepare(AlignmentView *av,
                Alignment *alignment,
                Sequence *query, Sequence *target,
                Submat *dna_submat, Submat *protein_submat,
                Translate *translate){
    register gint i;
    register gint query_pos =
alignment->region->query_start, target_pos = alignment->region->target_start; register AlignmentPosition *apos; register AlignmentOperation *ao, *prev_ao; register gint total_length; C4_Transition *last_match = NULL; apos = g_new(AlignmentPosition, 1); apos->query_pos = alignment->region->query_start-1; apos->target_pos = alignment->region->target_start-1; g_ptr_array_add(av->row_marker, apos); prev_ao = alignment->operation_list->pdata[0]; total_length = prev_ao->length; for(i = 1; i < alignment->operation_list->len; i++){ ao = alignment->operation_list->pdata[i]; if(prev_ao->transition == ao->transition){ /* group */ total_length += ao->length; } else { /* report */ AlignmentView_add_label_operation(av, prev_ao->transition, total_length, query, target, query_pos, target_pos, dna_submat, protein_submat, translate, (prev_ao->transition->label == ao->transition->label), &last_match); query_pos += (prev_ao->transition->advance_query * total_length); target_pos += (prev_ao->transition->advance_target * total_length); /* start new */ prev_ao = ao; total_length = prev_ao->length; } } AlignmentView_add_label_operation(av, prev_ao->transition, total_length, query, target, query_pos, target_pos, dna_submat, protein_submat, translate, FALSE, &last_match); apos = g_new(AlignmentPosition, 1); apos->query_pos = Region_query_end(alignment->region)-1; apos->target_pos = Region_target_end(alignment->region)-1; g_ptr_array_add(av->row_marker, apos); return; } static gboolean AlignmentView_string_is_empty(GString *str, gint pos, gint width){ register gint i; for(i = 0; i < width; i++) if(str->str[pos+i] != ' ') return FALSE; return TRUE; } static void AlignmentView_prepare_seq(GString *outer, GString *inner, gint pos, gint width){ register gint i; register gchar t; for(i = 0; i < width; i++){ if(inner->str[pos+i] == ' '){ /* Swap empty */ t = inner->str[pos+i]; inner->str[pos+i] = outer->str[pos+i]; outer->str[pos+i] = t; continue; } if(inner->str[pos+i] == '^') /* Replace padding 
*/ inner->str[pos+i] = ' '; } return; } static void AlignmentView_replace_padding(GString *str, gint pos, gint width){ register gint i; for(i = 0; i < width; i++){ if(str->str[pos+i] == '^') str->str[pos+i] = ' '; } return; } static void AlignmentView_display_row(AlignmentView *av, gint row, gint pos, gint width, gint maxposlen, Sequence *query, Sequence *target, FILE *fp){ register AlignmentPosition *apos1 = av->row_marker->pdata[row], *apos2 = av->row_marker->pdata[row+1]; register gint p1q, p1t, p2q, p2t; register Alignment_ArgumentSet *aas = Alignment_ArgumentSet_create(NULL); register gboolean show_inner_query = FALSE, show_inner_target = FALSE; p1q = apos1->query_pos+1; p2q = apos2->query_pos+1; p1t = apos1->target_pos+1; p2t = apos2->target_pos+1; if(aas->forward_strand_coords){ if(query->strand == Sequence_Strand_REVCOMP){ p1q = query->len - p1q - 1; p2q = query->len - p2q + 1; } if(target->strand == Sequence_Strand_REVCOMP){ p1t = target->len - p1t - 1; p2t = target->len - p2t + 1; } } if(av->inner_query && (!AlignmentView_string_is_empty(av->inner_query, pos, width))){ show_inner_query = TRUE; AlignmentView_prepare_seq(av->outer_query, av->inner_query, pos, width); } if(av->inner_target && (!AlignmentView_string_is_empty(av->inner_target, pos, width))){ show_inner_target = TRUE; AlignmentView_prepare_seq(av->outer_target, av->inner_target, pos, width); } AlignmentView_replace_padding(av->outer_query, pos, width); AlignmentView_replace_padding(av->outer_target, pos, width); fprintf(fp, " %*d : %.*s : %*d\n", maxposlen, p1q+1, width, av->outer_query->str+pos, maxposlen, p2q); if(show_inner_query) fprintf(fp, " %*s %.*s\n", maxposlen, " ", width, av->inner_query->str+pos); fprintf(fp, " %*s %.*s\n", maxposlen, " ", width, av->middle->str+pos); if(show_inner_target) fprintf(fp, " %*s %.*s\n", maxposlen, " ", width, av->inner_target->str+pos); fprintf(fp, " %*d : %.*s : %*d\n", maxposlen, p1t+1, width, av->outer_target->str+pos, maxposlen, p2t); return; } 
static void AlignmentView_display(AlignmentView *av,
        Sequence *query, Sequence *target, FILE *fp){
    register gint pos = 0, pause, row = 0;
    pause = av->outer_query->len-av->width;
    while(pos < pause){
        AlignmentView_display_row(av, row, pos, av->width,
                av->max_pos_len, query, target, fp);
        pos += av->width;
        row++;
        fprintf(fp, "\n");
    }
    AlignmentView_display_row(av, row, pos, av->outer_query->len-pos,
            av->max_pos_len, query, target, fp);
    fprintf(fp, "\n");
    return;
}

/**/

void Alignment_display(Alignment *alignment,
        Sequence *query, Sequence *target,
        Submat *dna_submat, Submat *protein_submat,
        Translate *translate, FILE *fp){
    register AlignmentView *av = AlignmentView_create(alignment, query, target);
    g_assert(alignment);
    g_assert(!alignment->model->is_open);
    /* Display header */
    fprintf(fp, "\n"
                "C4 Alignment:\n"
                "------------\n"
                " Query: %s%s%s\n"
                " Target: %s%s%s\n"
                " Model: %s\n"
                " Raw score: %d\n"
                " Query range: %d -> %d\n"
                " Target range: %d -> %d\n\n",
            query->id, query->def?" ":"", query->def?query->def:"",
            target->id, target->def?" ":"", target->def?target->def:"",
            alignment->model->name, alignment->score,
            Alignment_get_coordinate(alignment, query, target, TRUE, TRUE),
            Alignment_get_coordinate(alignment, query, target, TRUE, FALSE),
            Alignment_get_coordinate(alignment, query, target, FALSE, TRUE),
            Alignment_get_coordinate(alignment, query, target, FALSE, FALSE));
    AlignmentView_prepare(av, alignment, query, target,
            dna_submat, protein_submat, translate);
    AlignmentView_display(av, query, target, fp);
    AlignmentView_destroy(av);
    return;
}

/**/

static gint Alignment_get_equivalenced_matching(Alignment *alignment,
        Sequence *query, Sequence *target, Translate *translate,
        gboolean report_id, gpointer user_data){
    register gint i, j, match = 0;
    register AlignmentOperation *ao;
    register gint query_pos = alignment->region->query_start,
                  target_pos = alignment->region->target_start;
    register gchar qy_symbol, tg_symbol;
    for(i = 0; i < alignment->operation_list->len; i++){
        ao = alignment->operation_list->pdata[i];
        if(ao->transition->label == C4_Label_MATCH){
            for(j = 0; j < ao->length; j++){
                qy_symbol = Alignment_match_get_symbol(query, query_pos,
                        ao->transition->advance_query, translate);
                tg_symbol = Alignment_match_get_symbol(target, target_pos,
                        ao->transition->advance_target, translate);
                if(report_id){
                    if(toupper(qy_symbol) == toupper(tg_symbol))
                        match++;
                } else { /* report similarity */
                    if(C4_Calc_score(ao->transition->calc,
                            query_pos, target_pos, user_data) > 0)
                        match++;
                }
                query_pos += ao->transition->advance_query;
                target_pos += ao->transition->advance_target;
            }
        } else {
            query_pos += (ao->transition->advance_query * ao->length);
            target_pos += (ao->transition->advance_target * ao->length);
        }
    }
    return match;
}
/* Reports the %id over the equivalenced regions of the alignment */

static gint Alignment_get_equivalenced_matching_region(Alignment *alignment,
        Sequence *query, Sequence *target, Translate *translate,
        gboolean report_id, gpointer user_data,
        gint exon_query_start, gint exon_query_end){
    register gint
i, j, match = 0; register AlignmentOperation *ao; register gint query_pos = alignment->region->query_start, target_pos = alignment->region->target_start; register gchar qy_symbol, tg_symbol; for(i = 0; i < alignment->operation_list->len; i++){ ao = alignment->operation_list->pdata[i]; if(ao->transition->label == C4_Label_MATCH){ for(j = 0; j < ao->length; j++){ if(query_pos > exon_query_end) return match; if(query_pos >= exon_query_start){ qy_symbol = Alignment_match_get_symbol(query, query_pos, ao->transition->advance_query, translate); tg_symbol = Alignment_match_get_symbol(target, target_pos, ao->transition->advance_target, translate); if(report_id){ if(toupper(qy_symbol) == toupper(tg_symbol)) match++; } else { /* report similarity */ if(C4_Calc_score(ao->transition->calc, query_pos, target_pos, user_data) > 0) match++; } } query_pos += ao->transition->advance_query; target_pos += ao->transition->advance_target; } } else { query_pos += (ao->transition->advance_query * ao->length); target_pos += (ao->transition->advance_target * ao->length); } } return match; } /* Reports the %id over the equivalenced regions of the alignment */ static gint Alignment_get_equivalenced_total(Alignment *alignment){ register gint i, total = 0; register AlignmentOperation *ao; for(i = 0; i < alignment->operation_list->len; i++){ ao = alignment->operation_list->pdata[i]; if(ao->transition->label == C4_Label_MATCH) total += ao->length; } return total; } static gint Alignment_get_equivalenced_total_region(Alignment *alignment, gint exon_query_start, gint exon_query_end){ register gint i, j, total = 0; register gint query_pos = alignment->region->query_start, target_pos = alignment->region->target_start; register AlignmentOperation *ao; for(i = 0; i < alignment->operation_list->len; i++){ ao = alignment->operation_list->pdata[i]; if(ao->transition->label == C4_Label_MATCH){ for(j = 0; j < ao->length; j++){ if(query_pos > exon_query_end) return total; if(query_pos >= exon_query_start) 
total++; query_pos += ao->transition->advance_query; target_pos += ao->transition->advance_target; } } else { query_pos += (ao->transition->advance_query * ao->length); target_pos += (ao->transition->advance_target * ao->length); } } return total; } /* FIXME: optimisation: should use gene object to avoid full alignment pass */ static gfloat Alignment_get_percent_score_region(Alignment *alignment, Sequence *query, Sequence *target, Translate *translate, gboolean report_id, gpointer user_data, gint exon_query_start, gint exon_query_end){ return (((gfloat)Alignment_get_equivalenced_matching_region(alignment, query, target, translate, report_id, user_data, exon_query_start, exon_query_end)) / ((gfloat)Alignment_get_equivalenced_total_region(alignment, exon_query_start, exon_query_end)))*100; } /* FIXME: should also count split codons */ static gfloat Alignment_get_percent_score(Alignment *alignment, Sequence *query, Sequence *target, Translate *translate, gboolean report_id, gpointer user_data){ return (((gfloat)Alignment_get_equivalenced_matching(alignment, query, target, translate, report_id, user_data)) / ((gfloat)Alignment_get_equivalenced_total(alignment)))*100; } /* FIXME: should also count split codons */ static gint Alignment_get_match_score(Alignment *alignment, Sequence *query, Sequence *target, gpointer user_data){ register gint i, j; register AlignmentOperation *ao; register C4_Score score = 0; register gint query_pos = alignment->region->query_start, target_pos = alignment->region->target_start; for(i = 0; i < alignment->operation_list->len; i++){ ao = alignment->operation_list->pdata[i]; for(j = 0; j < ao->length; j++){ if(ao->transition->label == C4_Label_MATCH){ score += C4_Calc_score(ao->transition->calc, query_pos, target_pos, user_data); } query_pos += ao->transition->advance_query; target_pos += ao->transition->advance_target; } } g_message("MATCH SCORE [%d]", score); return score; } static gint Alignment_get_self_match_score(Alignment *alignment, 
Sequence *query, Sequence *target, gpointer self_data){ register gint i, j; register AlignmentOperation *ao; register C4_Score score = 0; register gint query_pos = alignment->region->query_start; for(i = 0; i < alignment->operation_list->len; i++){ ao = alignment->operation_list->pdata[i]; for(j = 0; j < ao->length; j++){ if(ao->transition->label == C4_Label_MATCH){ /* Use self_data to find self-comparison scores */ score += C4_Calc_score(ao->transition->calc, query_pos, query_pos, self_data); } query_pos += ao->transition->advance_query; } } g_message("SELF MATCH SCORE [%d]", score); return score; } static gfloat Alignment_get_percent_self(Alignment *alignment, Sequence *query, Sequence *target, Translate *translate, gpointer user_data, gpointer self_data){ return (((gfloat)Alignment_get_match_score(alignment, query, target, user_data)) / ((gfloat)Alignment_get_self_match_score(alignment, query, target, self_data)))*100; } /* Returns score as a percentage of maximum possible * over the equivalenced transitions */ /**/ static void Alignment_print_sugar_block(Alignment *alignment, Sequence *query, Sequence *target, FILE *fp){ fprintf(fp, "%s %d %d %c %s %d %d %c %d", query->id, Alignment_get_coordinate(alignment, query, target, TRUE, TRUE), Alignment_get_coordinate(alignment, query, target, TRUE, FALSE), Sequence_get_strand_as_char(query), target->id, Alignment_get_coordinate(alignment, query, target, FALSE, TRUE), Alignment_get_coordinate(alignment, query, target, FALSE, FALSE), Sequence_get_strand_as_char(target), alignment->score); return; } static gchar Alignment_get_cigar_type(AlignmentOperation *ao, gint *move){ if(!ao->transition->advance_query){ (*move) = ao->transition->advance_target * ao->length; return 'D'; } if(!ao->transition->advance_target){ (*move) = ao->transition->advance_query * ao->length; return 'I'; } (*move) = MAX(ao->transition->advance_query, ao->transition->advance_target) * ao->length; return 'M'; } static void 
Alignment_print_cigar_block(Alignment *alignment, Sequence *query, Sequence *target, FILE *fp){ register gint i = 0; register gchar *gap = ""; register AlignmentOperation *ao; register gchar type, next_type; gint move, next_move; ao = alignment->operation_list->pdata[i]; type = Alignment_get_cigar_type(ao, &move); for(i = 1; i < alignment->operation_list->len; i++){ ao = alignment->operation_list->pdata[i]; next_type = Alignment_get_cigar_type(ao, &next_move); if(type == next_type){ move += next_move; } else { if(move) fprintf(fp, "%s%c %d", gap, type, move); move = next_move; type = next_type; gap = " "; } } if(move) fprintf(fp, "%s%c %d", gap, type, move); return; } static void Alignment_print_vulgar_block(Alignment *alignment, Sequence *query, Sequence *target, FILE *fp){ register gint i; register gchar *gap = ""; register C4_Label curr_label; register gint curr_advance_query, curr_advance_target; register AlignmentOperation *ao; register gboolean curr_is_codon = FALSE; ao = alignment->operation_list->pdata[0]; curr_label = ao->transition->label; curr_advance_query = (ao->transition->advance_query * ao->length); curr_advance_target = (ao->transition->advance_target * ao->length); for(i = 1; i < alignment->operation_list->len; i++){ ao = alignment->operation_list->pdata[i]; if((ao->transition->label == curr_label) && (curr_advance_query || (!ao->transition->advance_query)) && (curr_advance_target || (!ao->transition->advance_target)) && (curr_is_codon == ((ao->transition->advance_query == 3) && (ao->transition->advance_target == 3)))){ curr_advance_query += (ao->transition->advance_query * ao->length); curr_advance_target += (ao->transition->advance_target * ao->length); } else { switch(curr_label){ case C4_Label_NONE: break; case C4_Label_MATCH: g_assert(curr_advance_query && curr_advance_target); fprintf(fp, "%s%c %d %d", gap, curr_is_codon?'C':'M', curr_advance_query, curr_advance_target); gap = " "; break; case C4_Label_GAP: g_assert(curr_advance_query || 
curr_advance_target); g_assert(! (curr_advance_query && curr_advance_target)); fprintf(fp, "%sG %d %d", gap, curr_advance_query, curr_advance_target); gap = " "; break; case C4_Label_NER: fprintf(fp, "%sN %d %d", gap, curr_advance_query, curr_advance_target); gap = " "; break; case C4_Label_5SS: fprintf(fp, "%s5 %d %d", gap, curr_advance_query, curr_advance_target); gap = " "; break; case C4_Label_3SS: fprintf(fp, "%s3 %d %d", gap, curr_advance_query, curr_advance_target); gap = " "; break; case C4_Label_INTRON: fprintf(fp, "%sI %d %d", gap, curr_advance_query, curr_advance_target); gap = " "; break; case C4_Label_SPLIT_CODON: fprintf(fp, "%sS %d %d", gap, curr_advance_query, curr_advance_target); gap = " "; break; case C4_Label_FRAMESHIFT: fprintf(fp, "%sF %d %d", gap, curr_advance_query, curr_advance_target); gap = " "; break; default: g_error("Unknown C4_Label [%d]", curr_label); break; } curr_label = ao->transition->label; curr_is_codon = ((ao->transition->advance_query == 3) && (ao->transition->advance_target == 3)); curr_advance_query = (ao->transition->advance_query * ao->length); curr_advance_target = (ao->transition->advance_target * ao->length); } } return; } typedef enum { Alignment_RYO_TOKEN_STRING, /**/ Alignment_RYO_TOKEN_ID, /* [QT] */ Alignment_RYO_TOKEN_DEF, /* [QT] */ Alignment_RYO_TOKEN_LEN, /* [QT] */ Alignment_RYO_TOKEN_STRAND, /* [QT] */ Alignment_RYO_TOKEN_SEQ, /* [QT] */ Alignment_RYO_TOKEN_TYPE, /* [QT] */ /**/ Alignment_RYO_TOKEN_ALIGN_BEGIN, /* [QT] */ Alignment_RYO_TOKEN_ALIGN_END, /* [QT] */ Alignment_RYO_TOKEN_ALIGN_LEN, /* [QT] */ Alignment_RYO_TOKEN_ALIGN_SEQ, /* [QT] */ /**/ Alignment_RYO_TOKEN_CODING_BEGIN, /* [QT] */ Alignment_RYO_TOKEN_CODING_END, /* [QT] */ Alignment_RYO_TOKEN_CODING_LEN, /* [QT] */ Alignment_RYO_TOKEN_CODING_SEQ, /* [QT] */ /**/ Alignment_RYO_TOKEN_SCORE, Alignment_RYO_TOKEN_MODEL_NAME, Alignment_RYO_TOKEN_RANK, Alignment_RYO_TOKEN_PERCENT_ID, Alignment_RYO_TOKEN_PERCENT_SIMILARITY, 
Alignment_RYO_TOKEN_PERCENT_SELF, Alignment_RYO_TOKEN_GENE_ORIENTATION, Alignment_RYO_TOKEN_EQUIVALENCED_TOTAL, Alignment_RYO_TOKEN_EQUIVALENCED_IDENTCAL, Alignment_RYO_TOKEN_EQUIVALENCED_SIMILAR, Alignment_RYO_TOKEN_EQUIVALENCED_MISMATCHES, /**/ Alignment_RYO_TOKEN_SUGAR_BLOCK, Alignment_RYO_TOKEN_CIGAR_BLOCK, Alignment_RYO_TOKEN_VULGAR_BLOCK, /**/ Alignment_RYO_TOKEN_PTO_OPEN, Alignment_RYO_TOKEN_PTO_CLOSE, /**/ Alignment_RYO_TOKEN_PTO_SEQ, /* [QT] */ Alignment_RYO_TOKEN_PTO_ADVANCE, /* [QT] */ Alignment_RYO_TOKEN_PTO_BEGIN, /* [QT] */ Alignment_RYO_TOKEN_PTO_END, /* [QT] */ /**/ Alignment_RYO_TOKEN_PTO_NAME, Alignment_RYO_TOKEN_PTO_SCORE, Alignment_RYO_TOKEN_PTO_LABEL } Alignment_RYO_Token; typedef struct { Alignment_RYO_Token token; GString *str; /* For TOKEN_STRING */ gboolean on_query; /* For [QT] */ } Alignment_RYO_ComplexToken; static Alignment_RYO_ComplexToken *Alignment_RYO_ComplexToken_create( Alignment_RYO_Token token, gchar *str, gboolean on_query){ register Alignment_RYO_ComplexToken *rct = g_new(Alignment_RYO_ComplexToken, 1); rct->token = token; if(str) rct->str = g_string_new(str); else rct->str = NULL; rct->on_query = on_query; return rct; } static void Alignment_RYO_ComplexToken_destroy( Alignment_RYO_ComplexToken *rct){ if(rct->str) g_string_free(rct->str, TRUE); g_free(rct); return; } static void Alignment_RYO_add_string(GPtrArray *token_list, gchar *str){ register Alignment_RYO_ComplexToken *rct = NULL; if(token_list->len){ rct = token_list->pdata[token_list->len-1]; if(rct->token != Alignment_RYO_TOKEN_STRING) rct = NULL; } if(rct){ g_assert(rct->str); g_string_append(rct->str, str); } else { rct = Alignment_RYO_ComplexToken_create( Alignment_RYO_TOKEN_STRING, str, FALSE); g_ptr_array_add(token_list, rct); } return; } static void Alignment_RYO_add_token(GPtrArray *token_list, Alignment_RYO_Token token, gboolean on_query){ register Alignment_RYO_ComplexToken *rct = Alignment_RYO_ComplexToken_create(token, NULL, on_query); 
g_ptr_array_add(token_list, rct); return; } static gint Alignment_RYO_tokenise_strand(GPtrArray *token_list, gchar *format, gint pos, gboolean on_query){ register gint move = 1; switch(format[pos]){ case 'i': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_ID, on_query); break; case 'd': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_DEF, on_query); break; case 'l': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_LEN, on_query); break; case 's': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_SEQ, on_query); break; case 'S': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_STRAND, on_query); break; case 't': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_TYPE, on_query); break; case 'a': switch(format[pos+1]){ case 'b': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_ALIGN_BEGIN, on_query); break; case 'e': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_ALIGN_END, on_query); break; case 'l': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_ALIGN_LEN, on_query); break; case 's': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_ALIGN_SEQ, on_query); break; default: g_error("Unknown [%%%c%c%c] in format string [%s]", format[pos-1], format[pos], format[pos+1], format); break; } move++; break; case 'c': switch(format[pos+1]){ case 'b': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_CODING_BEGIN, on_query); break; case 'e': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_CODING_END, on_query); break; case 'l': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_CODING_LEN, on_query); break; case 's': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_CODING_SEQ, on_query); break; default: g_error("Unknown [%%%c%c%c] in format string [%s]", format[pos-1], format[pos], format[pos+1], format); break; } move++; break; default: g_error("Unknown [%%%c%c] in format string [%s]", format[pos-1], format[pos], format); break; } return move; } static gint 
Alignment_RYO_tokenise_PTO_strand(GPtrArray *token_list, gchar *format, gint pos, gboolean on_query){ register gint move = 1; switch(format[pos]){ case 's': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_PTO_SEQ, on_query); break; case 'a': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_PTO_ADVANCE, on_query); break; case 'b': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_PTO_BEGIN, on_query); break; case 'e': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_PTO_END, on_query); break; default: g_error("Unknown [%%%c%c%c] in format string [%s]", format[pos-2], format[pos-1], format[pos], format); break; } return move; } static gint Alignment_RYO_tokenise_PTO(GPtrArray *token_list, gchar *format, gint pos){ register gint move = 1; switch(format[pos]){ case 'q': move += Alignment_RYO_tokenise_PTO_strand(token_list, format, pos+1, TRUE); break; case 't': move += Alignment_RYO_tokenise_PTO_strand(token_list, format, pos+1, FALSE); break; case 'n': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_PTO_NAME, FALSE); break; case 's': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_PTO_SCORE, FALSE); break; case 'l': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_PTO_LABEL, FALSE); break; default: g_error("Unknown [%%%c%c] in format string [%s]", format[pos-1], format[pos], format); break; } return move; } static GPtrArray *Alignment_RYO_tokenise(gchar *format){ register gint i; register GPtrArray *token_list = g_ptr_array_new(); gchar mini_str[2]; mini_str[1] = '\0'; for(i = 0; format[i]; i++){ switch(format[i]){ case '\\': switch(format[i+1]){ case '\\': Alignment_RYO_add_string(token_list, "\\"); break; case 'n': Alignment_RYO_add_string(token_list, "\n"); break; case 't': Alignment_RYO_add_string(token_list, "\t"); break; case '{': Alignment_RYO_add_string(token_list, "{"); break; case '}': Alignment_RYO_add_string(token_list, "}"); break; default: g_error("Unknown [\\%c] in ryo string [%s]", format[i+1], 
format); break; } i++; break; case '%': switch(format[i+1]){ case '%': Alignment_RYO_add_string(token_list, "%"); break; case 'q': i += Alignment_RYO_tokenise_strand(token_list, format, i+2, TRUE); break; case 't': i += Alignment_RYO_tokenise_strand(token_list, format, i+2, FALSE); break; case 's': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_SCORE, FALSE); break; case 'm': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_MODEL_NAME, FALSE); break; case 'r': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_RANK, FALSE); break; case 'p': switch(format[i+2]){ case 'i': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_PERCENT_ID, FALSE); break; case 's': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_PERCENT_SIMILARITY, FALSE); break; case 'S': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_PERCENT_SELF, FALSE); break; default: g_error( "Unknown [%%%c%c] in format string", format[i+1], format[i+2]); break; } i++; break; case 'e': switch(format[i+2]){ case 't': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_EQUIVALENCED_TOTAL, FALSE); break; case 'i': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_EQUIVALENCED_IDENTCAL, FALSE); break; case 's': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_EQUIVALENCED_SIMILAR, FALSE); break; case 'm': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_EQUIVALENCED_MISMATCHES, FALSE); break; default: g_error( "Unknown [%%%c%c] in format string", format[i+1], format[i+2]); break; } i++; break; case 'g': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_GENE_ORIENTATION, FALSE); break; case 'S': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_SUGAR_BLOCK, FALSE); break; case 'C': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_CIGAR_BLOCK, FALSE); break; case 'V': Alignment_RYO_add_token(token_list, Alignment_RYO_TOKEN_VULGAR_BLOCK, FALSE); break; case 'P': i += Alignment_RYO_tokenise_PTO(token_list, format, i+2); break; default: 
                        g_error("Unknown [%%%c] in format string [%s]",
                                format[i+1], format);
                        break;
                }
                i++;
                break;
            case '{':
                Alignment_RYO_add_token(token_list,
                        Alignment_RYO_TOKEN_PTO_OPEN, FALSE);
                break;
            case '}':
                Alignment_RYO_add_token(token_list,
                        Alignment_RYO_TOKEN_PTO_CLOSE, FALSE);
                break;
            default:
                mini_str[0] = format[i];
                Alignment_RYO_add_string(token_list, mini_str);
                break;
        }
    }
    return token_list;
}
/* %[qt][idlsSt]  {query,target}{id,def,len,seq,Strand,type}
 * %[qt]a[bels]   {query,target}align{begin,end,len,seq}
 * %[qt]c[bels]   {query,target}coding{begin,end,len,seq}
 * %s             score
 * %m             model_name
 * %r             rank
 * %p[isS]        percent {id,similarity,self}
 * %e[tism]       equivalenced {total,identical,similar,mismatches}
 * %g             gene_orientation
 * %S             sugar block
 * %C             cigar block
 * %V             vulgar block
 * %%             %
 * \n             newline
 * \t             tab
 * \\             backslash
 * \{             {
 * \}             }
 * {              start per-transition section
 * }              end per-transition section
 * %P[qt][sabe]   per-transition-output{query,target}
 *                                     {seq,advance,begin,end}
 * %P[nsl]        per-transition-output{name,score,label}
 */

static void Alignment_RYO_token_list_destroy(GPtrArray *token_list){
    register gint i;
    register Alignment_RYO_ComplexToken *rct;
    for(i = 0; i < token_list->len; i++){
        rct = token_list->pdata[i];
        Alignment_RYO_ComplexToken_destroy(rct);
    }
    g_ptr_array_free(token_list, TRUE);
    return;
}

/**/

typedef struct {
             Alignment *alignment;
    AlignmentOperation *ao;
                  gint  operation_id;
                  gint  operation_pos;
                  gint  query_pos;
                  gint  target_pos;
              C4_Score *cell;
              C4_Score *start_cell;
              C4_Score  score;
              gpointer  user_data; /* No ref stored */
} Alignment_Position;

static Alignment_Position *Alignment_Position_create(
        Alignment *alignment, gpointer user_data){
    register Alignment_Position *ap = g_new(Alignment_Position, 1);
    register gint i;
    register gint shadow_total
        = 1 + alignment->model->total_shadow_designations;
    register C4_Calc *calc;
    ap->alignment = Alignment_share(alignment);
    ap->ao = alignment->operation_list->pdata[0];
    ap->operation_id = 0;
    ap->operation_pos = 0;
    ap->query_pos = alignment->region->query_start;
    ap->target_pos = alignment->region->target_start;
    ap->cell = g_new0(C4_Score, shadow_total);
    ap->user_data = user_data;
    /* g_new() does not zero the struct, so set the remaining fields */
    ap->start_cell = NULL;
    ap->score = 0;
    if(alignment->model->start_state->cell_start_func){
        ap->start_cell = alignment->model->start_state->cell_start_func
            (ap->query_pos, ap->target_pos, user_data);
        for(i = 0; i < shadow_total; i++)
            ap->cell[i] = ap->start_cell[i];
    }
    if(alignment->model->init_func)
        alignment->model->init_func(alignment->region, user_data);
    for(i = 0; i < alignment->model->calc_list->len; i++){
        calc = alignment->model->calc_list->pdata[i];
        if(calc && calc->init_func)
            calc->init_func(alignment->region, user_data);
    }
    return ap;
}

static void Alignment_Position_destroy(Alignment_Position *ap){
    register gint i;
    register C4_Calc *calc;
    for(i = 0; i < ap->alignment->model->calc_list->len; i++){
        calc = ap->alignment->model->calc_list->pdata[i];
        if(calc && calc->exit_func)
            calc->exit_func(ap->alignment->region, ap->user_data);
    }
    if(ap->alignment->model->exit_func)
        ap->alignment->model->exit_func(ap->alignment->region,
                                        ap->user_data);
    Alignment_destroy(ap->alignment);
    /* Free the cell array allocated in Alignment_Position_create() */
    g_free(ap->cell);
    g_free(ap);
    return;
}

static void Alignment_Position_set_shadows(Alignment_Position *ap){
    register gint i;
    register C4_State *state = ap->ao->transition->input;
    register C4_Shadow *shadow;
    for(i = 0; i < state->src_shadow_list->len; i++){
        shadow = state->src_shadow_list->pdata[i];
        ap->cell[1 + shadow->designation] = shadow->start_func(
                ap->query_pos, ap->target_pos, ap->user_data);
    }
    for(i = 0; i < ap->ao->transition->dst_shadow_list->len; i++){
        shadow = ap->ao->transition->dst_shadow_list->pdata[i];
        shadow->end_func(ap->cell[1 + shadow->designation],
                ap->query_pos + ap->ao->transition->advance_query,
                ap->target_pos + ap->ao->transition->advance_target,
                ap->user_data);
    }
    return;
}

static gboolean Alignment_Position_next(Alignment_Position *ap){
    ap->query_pos += ap->ao->transition->advance_query;
    ap->target_pos += ap->ao->transition->advance_target;
    /* Use next transition */
    if(++ap->operation_pos < ap->ao->length){
        Alignment_Position_set_shadows(ap);
        return TRUE;
} /* Use next operation */ if(++ap->operation_id < ap->alignment->operation_list->len){ ap->operation_pos = 0; ap->ao = ap->alignment->operation_list->pdata[ap->operation_id]; Alignment_Position_set_shadows(ap); return TRUE; } return FALSE; } /**/ typedef struct { gint begin; gint end; Sequence *seq; } Alignment_Coding; static Alignment_Coding *Alignment_Coding_create(Alignment *alignment, Sequence *query, Sequence *target, gpointer user_data, gboolean on_query){ register Alignment_Coding *ac = g_new(Alignment_Coding, 1); register Alignment_Position *ap = Alignment_Position_create(alignment, user_data); register Sequence *src = NULL; register GString *seq = g_string_sized_new(1024); register gint i, advance, pos; if(on_query){ g_assert(query->alphabet->type == Alphabet_Type_DNA); src = query; } else { g_assert(target->alphabet->type == Alphabet_Type_DNA); src = target; } while(Alignment_Position_next(ap)){ if(on_query){ advance = ap->ao->transition->advance_query; pos = ap->query_pos; } else { advance = ap->ao->transition->advance_target; pos = ap->target_pos; } switch(ap->ao->transition->label){ case C4_Label_MATCH: if(advance == 3){ if(!seq->len) ac->begin = pos; g_string_append_c(seq, Sequence_get_symbol(src, pos)); g_string_append_c(seq, Sequence_get_symbol(src, pos+1)); g_string_append_c(seq, Sequence_get_symbol(src, pos+2)); ac->end = pos; } break; case C4_Label_SPLIT_CODON: g_assert(advance); g_assert(seq); g_assert(seq->len); for(i = 0; i < advance; i++) g_string_append_c(seq, Sequence_get_symbol(src, pos+i)); break; case C4_Label_GAP: if(advance == 3){ g_assert(seq); g_assert(seq->len); g_string_append_c(seq, Sequence_get_symbol(src, pos)); g_string_append_c(seq, Sequence_get_symbol(src, pos+1)); g_string_append_c(seq, Sequence_get_symbol(src, pos+2)); } break; default: /* ignored */ break; } } Alignment_Position_destroy(ap); ac->seq = Sequence_create("coding", NULL, seq->str, seq->len, src->strand, src->alphabet); g_string_free(seq, TRUE); return ac; } 
static void Alignment_Coding_destroy(Alignment_Coding *ac){ Sequence_destroy(ac->seq); g_free(ac); return; } /**/ static void Alignment_RYO_token_list_print(GPtrArray *token_list, Alignment *alignment, Sequence *query, Sequence *target, Translate *translate, gint rank, gpointer user_data, gpointer self_data, FILE *fp){ register gint i, j, pto_start = -1; register Alignment_RYO_ComplexToken *rct; register Sequence *seq, *subseq; register Alignment_Position *ap = NULL; register Alignment_Coding *qy_ac = NULL, *tg_ac = NULL, *ac; for(i = 0; i < token_list->len; i++){ rct = token_list->pdata[i]; seq = rct->on_query?query:target; switch(rct->token){ case Alignment_RYO_TOKEN_STRING: fprintf(fp, "%s", rct->str->str); break; /**/ case Alignment_RYO_TOKEN_ID: fprintf(fp, "%s", seq->id); break; case Alignment_RYO_TOKEN_DEF: if(seq->def) fprintf(fp, "%s", seq->def); break; case Alignment_RYO_TOKEN_LEN: fprintf(fp, "%d", seq->len); break; case Alignment_RYO_TOKEN_STRAND: fprintf(fp, "%c", Sequence_get_strand_as_char(seq)); break; case Alignment_RYO_TOKEN_SEQ: Sequence_print_fasta_block(seq, fp); break; case Alignment_RYO_TOKEN_TYPE: fprintf(fp, "%s", Alphabet_Type_get_name(seq->alphabet->type)); break; /**/ case Alignment_RYO_TOKEN_ALIGN_BEGIN: fprintf(fp, "%d", Alignment_get_coordinate(alignment, query, target, rct->on_query, TRUE)); break; case Alignment_RYO_TOKEN_ALIGN_END: fprintf(fp, "%d", Alignment_get_coordinate(alignment, query, target, rct->on_query, FALSE)); break; case Alignment_RYO_TOKEN_ALIGN_LEN: fprintf(fp, "%d", rct->on_query?alignment->region->query_length :alignment->region->target_length); break; case Alignment_RYO_TOKEN_ALIGN_SEQ: if(rct->on_query){ subseq = Sequence_subseq(seq, alignment->region->query_start, alignment->region->query_length); } else { subseq = Sequence_subseq(seq, alignment->region->target_start, alignment->region->target_length); } Sequence_print_fasta_block(subseq, fp); Sequence_destroy(subseq); break; /**/ case 
                 Alignment_RYO_TOKEN_CODING_BEGIN:
                ac = rct->on_query?qy_ac:tg_ac;
                if(!ac){
                    /* Build once and cache, so it is freed on exit */
                    ac = Alignment_Coding_create(alignment, query, target,
                            user_data, rct->on_query);
                    if(rct->on_query) qy_ac = ac; else tg_ac = ac;
                }
                fprintf(fp, "%d", Alignment_convert_coordinate(alignment,
                        query, target, ac->begin, ac->begin,
                        rct->on_query));
                break;
            case Alignment_RYO_TOKEN_CODING_END:
                ac = rct->on_query?qy_ac:tg_ac;
                if(!ac){
                    ac = Alignment_Coding_create(alignment, query, target,
                            user_data, rct->on_query);
                    if(rct->on_query) qy_ac = ac; else tg_ac = ac;
                }
                fprintf(fp, "%d", Alignment_convert_coordinate(alignment,
                        query, target, ac->end, ac->end,
                        rct->on_query));
                break;
            case Alignment_RYO_TOKEN_CODING_LEN:
                ac = rct->on_query?qy_ac:tg_ac;
                if(!ac){
                    ac = Alignment_Coding_create(alignment, query, target,
                            user_data, rct->on_query);
                    if(rct->on_query) qy_ac = ac; else tg_ac = ac;
                }
                fprintf(fp, "%d", ac->seq->len);
                break;
            case Alignment_RYO_TOKEN_CODING_SEQ:
                ac = rct->on_query?qy_ac:tg_ac;
                if(!ac){
                    ac = Alignment_Coding_create(alignment, query, target,
                            user_data, rct->on_query);
                    if(rct->on_query) qy_ac = ac; else tg_ac = ac;
                }
                Sequence_print_fasta_block(ac->seq, fp);
                break;
            /**/
            case Alignment_RYO_TOKEN_SCORE:
                fprintf(fp, "%d", alignment->score);
                break;
            case Alignment_RYO_TOKEN_MODEL_NAME:
                fprintf(fp, "%s", alignment->model->name);
                break;
            case Alignment_RYO_TOKEN_RANK:
                if(rank == -1)
                    fprintf(fp, "%%_EXONERATE_BESTN_RANK_%%");
                else
                    fprintf(fp, "%d", rank);
                break;
            /**/
            case Alignment_RYO_TOKEN_PERCENT_ID:
                fprintf(fp, "%2.2f", Alignment_get_percent_score(
                        alignment, query, target, translate, TRUE,
                        user_data));
                break;
            case Alignment_RYO_TOKEN_PERCENT_SIMILARITY:
                fprintf(fp, "%2.2f", Alignment_get_percent_score(
                        alignment, query, target, translate, FALSE,
                        user_data));
                break;
            case Alignment_RYO_TOKEN_PERCENT_SELF:
                fprintf(fp, "%2.2f", Alignment_get_percent_self(
                        alignment, query, target, translate,
                        user_data, self_data));
                break;
            case Alignment_RYO_TOKEN_GENE_ORIENTATION:
                fprintf(fp, "%c",
                        Alignment_get_gene_orientation(alignment));
                break;
            /**/
            case Alignment_RYO_TOKEN_EQUIVALENCED_TOTAL:
                fprintf(fp, "%d", Alignment_get_equivalenced_total(
                        alignment));
                break;
            case
Alignment_RYO_TOKEN_EQUIVALENCED_IDENTCAL: fprintf(fp, "%d", Alignment_get_equivalenced_matching( alignment, query, target, translate, TRUE, user_data)); break; case Alignment_RYO_TOKEN_EQUIVALENCED_SIMILAR: fprintf(fp, "%d", Alignment_get_equivalenced_matching( alignment, query, target, translate, FALSE, user_data)); break; case Alignment_RYO_TOKEN_EQUIVALENCED_MISMATCHES: fprintf(fp, "%d", Alignment_get_equivalenced_total(alignment) - Alignment_get_equivalenced_matching( alignment, query, target, translate, TRUE, user_data)); break; /**/ case Alignment_RYO_TOKEN_SUGAR_BLOCK: Alignment_print_sugar_block(alignment, query, target, fp); break; case Alignment_RYO_TOKEN_CIGAR_BLOCK: Alignment_print_cigar_block(alignment, query, target, fp); break; case Alignment_RYO_TOKEN_VULGAR_BLOCK: Alignment_print_vulgar_block(alignment, query, target, fp); break; /**/ case Alignment_RYO_TOKEN_PTO_OPEN: if(pto_start != -1) g_error("Cannot nest PTO brackets"); pto_start = i; ap = Alignment_Position_create(alignment, user_data); break; case Alignment_RYO_TOKEN_PTO_CLOSE: if(pto_start == -1) g_error("No opening PTO bracket in --ryo string"); if(Alignment_Position_next(ap)){ i = pto_start; } else { pto_start = -1; Alignment_Position_destroy(ap); ap = NULL; } break; /**/ case Alignment_RYO_TOKEN_PTO_SEQ: g_assert(pto_start != -1); if(rct->on_query){ if(ap->ao->transition->advance_query){ for(j = 0; j < ap->ao->transition->advance_query; j++) fprintf(fp, "%c", Sequence_get_symbol(query, ap->query_pos+j)); } else { fprintf(fp, "-"); } } else { if(ap->ao->transition->advance_target){ for(j = 0; j < ap->ao->transition->advance_target; j++) fprintf(fp, "%c", Sequence_get_symbol(target, ap->target_pos+j)); } else { fprintf(fp, "-"); } } break; case Alignment_RYO_TOKEN_PTO_ADVANCE: fprintf(fp, "%d", rct->on_query?ap->ao->transition->advance_query :ap->ao->transition->advance_target); break; case Alignment_RYO_TOKEN_PTO_BEGIN: fprintf(fp, "%d", Alignment_convert_coordinate(alignment, query, 
target, ap->query_pos, ap->target_pos, rct->on_query)); break; case Alignment_RYO_TOKEN_PTO_END: fprintf(fp, "%d", Alignment_convert_coordinate(alignment, query, target, ap->query_pos+ap->ao->transition->advance_query, ap->target_pos+ap->ao->transition->advance_target, rct->on_query)); break; /**/ case Alignment_RYO_TOKEN_PTO_NAME: fprintf(fp, "%s", ap->ao->transition->name); break; case Alignment_RYO_TOKEN_PTO_SCORE: fprintf(fp, "%d", C4_Calc_score(ap->ao->transition->calc, ap->query_pos, ap->target_pos, user_data)); break; case Alignment_RYO_TOKEN_PTO_LABEL: fprintf(fp, "%s", C4_Label_get_name(ap->ao->transition->label)); break; default: g_error("Unknown token [%d]", rct->token); break; } } if(pto_start != -1) g_error("No closing PTO bracket in --ryo string"); if(qy_ac) Alignment_Coding_destroy(qy_ac); if(tg_ac) Alignment_Coding_destroy(tg_ac); return; } void Alignment_display_ryo(Alignment *alignment, Sequence *query, Sequence *target, gchar *format, Translate *translate, gint rank, gpointer user_data, gpointer self_data, FILE *fp){ register GPtrArray *token_list = Alignment_RYO_tokenise(format); Alignment_RYO_token_list_print(token_list, alignment, query, target, translate, rank, user_data, self_data, fp); Alignment_RYO_token_list_destroy(token_list); return; } void Alignment_display_sugar(Alignment *alignment, Sequence *query, Sequence *target, FILE *fp){ fprintf(fp, "sugar: "); Alignment_print_sugar_block(alignment, query, target, fp); fprintf(fp, "\n"); return; } /* sugar: simple ungapped alignment report */ void Alignment_display_cigar(Alignment *alignment, Sequence *query, Sequence *target, FILE *fp){ fprintf(fp, "cigar: "); Alignment_print_sugar_block(alignment, query, target, fp); fprintf(fp, " "); Alignment_print_cigar_block(alignment, query, target, fp); fprintf(fp, "\n"); return; } /* cigar: concise idiosyncratic gapped alignment report * query_id query_start query_end query_strand \ * target_id target_start target_end target_strand \ * score \ * 
<[MID] length> (m=Match i=Insert d=Delete)
 *
 */

void Alignment_display_vulgar(Alignment *alignment,
              Sequence *query, Sequence *target, FILE *fp){
    fprintf(fp, "vulgar: ");
    Alignment_print_sugar_block(alignment, query, target, fp);
    fprintf(fp, " ");
    Alignment_print_vulgar_block(alignment, query, target, fp);
    fprintf(fp, "\n");
    return;
    }

/**/

static void Alignment_display_gff_header(Alignment *alignment,
              Sequence *query, Sequence *target,
              gboolean report_on_query, FILE *fp){
    time_t timenow = time(NULL);
    register struct tm *tm_ptr = localtime(&timenow);
    gchar curr_time_str[12];
    /* Pass the real buffer size: "YYYY-MM-DD" needs 11 bytes with the NUL */
    strftime(curr_time_str, sizeof(curr_time_str), "%Y-%m-%d", tm_ptr);
    fprintf(fp, "#\n"
                "##gff-version 2\n"
                "##source-version %s:%s %s\n"
                "##date %s\n"
                "##type %s\n"
                "#\n",
            PACKAGE, alignment->model->name, VERSION,
            curr_time_str,
            Alphabet_Type_get_name(report_on_query
                                  ? query->alphabet->type
                                  : target->alphabet->type));
    fprintf(fp, "#\n# seqname source feature start end"
                " score strand frame attributes\n#\n");
    return;
    }

static void Alignment_display_gff_line(Alignment *alignment,
              Sequence *query, Sequence *target,
              gboolean report_on_query, gchar *feature,
              gint query_start, gint target_start,
              gint query_end, gint target_end,
              gboolean show_score, gint score,
              gboolean show_frame, gint frame,
              GPtrArray *attribute_list, FILE *fp){
    /* <seqname> <source> <feature> <start> <end> <score> <strand> <frame>
     * [attributes] [comments]
     */
    register Sequence *seq;
    register gint i, start, end, t;
    register gchar *attribute;
    if(report_on_query){
        seq = query;
        start = query_start;
        end = query_end;
    } else {
        seq = target;
        start = target_start;
        end = target_end;
        }
    if(seq->strand == Sequence_Strand_REVCOMP){
        start = seq->len - start;
        end = seq->len - end;
        /* swap coords for gff output */
        t = start;
        start = end;
        end = t;
        }
    /* Sanity checks for valid GFF output */
    g_assert(start >= 0);
    g_assert(end <= seq->len);
    g_assert(start < end);
    g_assert((!show_frame) || ((frame >= 0) && (frame <= 2)));
    fprintf(fp, "%s\t%s:%s\t%s\t%d\t%d\t",
            seq->id, PACKAGE, alignment->model->name,
            feature, start+1, end);
    if(show_score)
fprintf(fp, "%d", score); else fprintf(fp, "."); fprintf(fp, "\t%c\t", Sequence_get_strand_as_char(seq)); if(show_frame) fprintf(fp, "%d", frame); else fprintf(fp, "."); fprintf(fp, "\t"); if(attribute_list){ for(i = 0; i < attribute_list->len; i++){ attribute = attribute_list->pdata[i]; fprintf(fp, "%s", attribute); if((i+1) < attribute_list->len) fprintf(fp, " ; "); } } fprintf(fp, "\n"); return; } static void Alignment_free_attribute_list(GPtrArray *attribute_list){ register gint i; for(i = 0; i < attribute_list->len; i++) g_free(attribute_list->pdata[i]); g_ptr_array_free(attribute_list, TRUE); return; } static void Alignment_display_gff_exon(Alignment *alignment, Sequence *query, Sequence *target, Translate *translate, gboolean report_on_query, gint query_pos, gint target_pos, gint exon_query_start, gint exon_target_start, gint exon_query_gap, gint exon_target_gap, gint exon_query_frameshift, gint exon_target_frameshift, gpointer user_data, FILE *fp){ register GPtrArray *attribute_list = g_ptr_array_new(); g_ptr_array_add(attribute_list, g_strdup_printf("insertions %d", report_on_query?exon_query_gap :exon_target_gap)); g_ptr_array_add(attribute_list, g_strdup_printf("deletions %d", report_on_query?exon_target_gap :exon_query_gap)); g_ptr_array_add(attribute_list, g_strdup_printf("identity %2.2f", Alignment_get_percent_score_region(alignment, query, target, translate, TRUE, user_data, exon_query_start, query_pos))); g_ptr_array_add(attribute_list, g_strdup_printf("similarity %2.2f", Alignment_get_percent_score_region(alignment, query, target, translate, FALSE, user_data, exon_query_start, query_pos))); if(report_on_query){ if(exon_query_frameshift) g_ptr_array_add(attribute_list, g_strdup_printf("frameshifts %d", exon_query_frameshift)); } else { if(exon_target_frameshift) g_ptr_array_add(attribute_list, g_strdup_printf("frameshifts %d", exon_target_frameshift)); } Alignment_display_gff_line(alignment, query, target, report_on_query, "exon", exon_query_start, 
exon_target_start, query_pos, target_pos, FALSE, 0, FALSE, 0, attribute_list, fp); Alignment_free_attribute_list(attribute_list); return; } static void Alignment_display_gff_utr(Alignment *alignment, Sequence *query, Sequence *target, gboolean report_on_query, gboolean post_cds, gint cds_query_start, gint cds_target_start, gint cds_query_end, gint cds_target_end, gint exon_query_start, gint exon_target_start, gint query_pos, gint target_pos, FILE *fp){ register gint curr_cds_query_start, curr_cds_target_start, curr_utr_query_start, curr_utr_target_start; if(post_cds){ curr_utr_query_start = MAX(exon_query_start, cds_query_end); curr_utr_target_start = MAX(exon_target_start, cds_target_end); Alignment_display_gff_line(alignment, query, target, report_on_query, "utr3", curr_utr_query_start, curr_utr_target_start, query_pos, target_pos, FALSE, 0, FALSE, 0, NULL, fp); } else if(cds_query_start == -1){ Alignment_display_gff_line(alignment, query, target, report_on_query, "utr5", exon_query_start, exon_target_start, query_pos, target_pos, FALSE, 0, FALSE, 0, NULL, fp); } else { curr_cds_query_start = MAX(cds_query_start, exon_query_start); curr_cds_target_start = MAX(cds_target_start, exon_target_start); Alignment_display_gff_line(alignment, query, target, report_on_query, "cds", curr_cds_query_start, curr_cds_target_start, query_pos, target_pos, FALSE, 0, FALSE, 0, NULL, fp); } return; } static void Alignment_display_gff_gene(Alignment *alignment, Sequence *query, Sequence *target, Translate *translate, gboolean report_on_query, gint result_id, gpointer user_data, FILE *fp){ register gint i; register gint query_pos = alignment->region->query_start, target_pos = alignment->region->target_start; register gint intron_id = 0, intron_length = 0; register gint exon_query_start = 0, exon_target_start = 0; register gint exon_query_gap = 0, exon_target_gap = 0; register gint exon_query_frameshift = 0, exon_target_frameshift = 0; register gint cds_query_start = -1, 
cds_target_start = -1, cds_query_end = -1, cds_target_end = -1; register gboolean in_exon = FALSE, post_cds = FALSE; register AlignmentOperation *ao; register gchar gene_orientation = Alignment_get_gene_orientation(alignment); register GPtrArray *attribute_list = g_ptr_array_new(); register gint curr_utr_query_start, curr_utr_target_start; /**/ g_ptr_array_add(attribute_list, g_strdup_printf("gene_id %d", result_id)); g_ptr_array_add(attribute_list, g_strdup_printf("sequence %s", report_on_query?target->id:query->id)); g_ptr_array_add(attribute_list, g_strdup_printf("gene_orientation %c", gene_orientation)); g_ptr_array_add(attribute_list, g_strdup_printf("identity %2.2f", Alignment_get_percent_score(alignment, query, target, translate, TRUE, user_data))); g_ptr_array_add(attribute_list, g_strdup_printf("similarity %2.2f", Alignment_get_percent_score(alignment, query, target, translate, FALSE, user_data))); Alignment_display_gff_line(alignment, query, target, report_on_query, "gene", alignment->region->query_start, alignment->region->target_start, Region_query_end(alignment->region), Region_target_end(alignment->region), TRUE, alignment->score, FALSE, 0, attribute_list, fp); Alignment_free_attribute_list(attribute_list); for(i = 1; i < alignment->operation_list->len; i++){ ao = alignment->operation_list->pdata[i]; switch(ao->transition->label){ case C4_Label_MATCH: if((ao->transition->advance_query == 1) && (ao->transition->advance_target == 1)){ if((cds_query_start != -1) && (!post_cds)){ Alignment_display_gff_line(alignment, query, target, report_on_query, "cds", exon_query_start, exon_target_start, query_pos, target_pos, FALSE, 0, FALSE, 0, NULL, fp); post_cds = TRUE; } } else { if(cds_query_start == -1){ /* First coding exon */ if(in_exon){ /* Have UTR5 */ Alignment_display_gff_line(alignment, query, target, report_on_query, "utr5", exon_query_start, exon_target_start, query_pos, target_pos, FALSE, 0, FALSE, 0, NULL, fp); } cds_query_start = query_pos; 
cds_target_start = target_pos; } cds_query_end = query_pos + (ao->transition->advance_query * ao->length); cds_target_end = target_pos + (ao->transition->advance_target * ao->length); } /*fallthrough*/ case C4_Label_SPLIT_CODON: if(!in_exon){ exon_query_start = query_pos; exon_target_start = target_pos; exon_query_gap = 0; exon_target_gap = 0; exon_query_frameshift = 0; exon_target_frameshift = 0; in_exon = TRUE; } break; case C4_Label_NONE: break; case C4_Label_GAP: exon_query_gap += (ao->transition->advance_query * ao->length); exon_target_gap += (ao->transition->advance_target * ao->length); break; case C4_Label_5SS: if(report_on_query){ g_assert(ao->transition->advance_query == 2); g_assert(ao->transition->advance_target == 0); } else { g_assert(ao->transition->advance_query == 0); g_assert(ao->transition->advance_target == 2); } if(in_exon){ Alignment_display_gff_utr(alignment, query, target, report_on_query, post_cds, cds_query_start, cds_target_start, cds_query_end, cds_target_end, exon_query_start, exon_target_start, query_pos, target_pos, fp); Alignment_display_gff_exon(alignment, query, target, translate, report_on_query, query_pos, target_pos, exon_query_start, exon_target_start, exon_query_gap, exon_target_gap, exon_query_frameshift, exon_target_frameshift, user_data, fp); in_exon = FALSE; } attribute_list = g_ptr_array_new(); g_ptr_array_add(attribute_list, g_strdup_printf("intron_id %d", intron_id+1)); g_ptr_array_add(attribute_list, g_strdup_printf("splice_site \"%c%c\"", report_on_query?Sequence_get_symbol(query, query_pos) :Sequence_get_symbol(target, target_pos), report_on_query?Sequence_get_symbol(query, query_pos+1) :Sequence_get_symbol(target, target_pos+1))); Alignment_display_gff_line(alignment, query, target, report_on_query, "splice5", query_pos, target_pos, query_pos+2, target_pos+2, FALSE, 0, FALSE, 0, attribute_list, fp); Alignment_free_attribute_list(attribute_list); intron_length = 0; break; case C4_Label_3SS: if(report_on_query){ 
g_assert(ao->transition->advance_query == 2); g_assert(ao->transition->advance_target == 0); } else { g_assert(ao->transition->advance_query == 0); g_assert(ao->transition->advance_target == 2); } if(in_exon){ Alignment_display_gff_utr(alignment, query, target, report_on_query, post_cds, cds_query_start, cds_target_start, cds_query_end, cds_target_end, exon_query_start, exon_target_start, query_pos, target_pos, fp); Alignment_display_gff_exon(alignment, query, target, translate, report_on_query, query_pos, target_pos, exon_query_start, exon_target_start, exon_query_gap, exon_target_gap, exon_query_frameshift, exon_target_frameshift, user_data, fp); in_exon = FALSE; } if(gene_orientation == '+'){ attribute_list = g_ptr_array_new(); g_ptr_array_add(attribute_list, g_strdup_printf("intron_id %d", ++intron_id)); Alignment_display_gff_line(alignment, query, target, report_on_query, "intron", query_pos-intron_length-2, target_pos-intron_length-2, query_pos+2, target_pos+2, FALSE, 0, FALSE, 0, attribute_list, fp); Alignment_free_attribute_list(attribute_list); } attribute_list = g_ptr_array_new(); g_ptr_array_add(attribute_list, g_strdup_printf("intron_id %d", intron_id-1)); g_ptr_array_add(attribute_list, g_strdup_printf("splice_site \"%c%c\"", report_on_query?Sequence_get_symbol(query, query_pos) :Sequence_get_symbol(target, target_pos), report_on_query?Sequence_get_symbol(query, query_pos+1) :Sequence_get_symbol(target, target_pos+1))); Alignment_display_gff_line(alignment, query, target, report_on_query, "splice3", query_pos, target_pos, query_pos+2, target_pos+2, FALSE, 0, FALSE, 0, attribute_list, fp); Alignment_free_attribute_list(attribute_list); intron_length = 0; break; case C4_Label_INTRON: if(report_on_query){ g_assert(ao->transition->advance_query == 1); g_assert(ao->transition->advance_target == 0); } else { g_assert(ao->transition->advance_query == 0); g_assert(ao->transition->advance_target == 1); } intron_length += ao->length; break; case 
C4_Label_FRAMESHIFT: exon_query_frameshift += (ao->transition->advance_query * ao->length); exon_target_frameshift += (ao->transition->advance_target * ao->length); break; case C4_Label_NER: g_error("Unexpected NER for gff gene output"); break; default: g_error("Unknown C4_Label [%s]", C4_Label_get_name(ao->transition->label)); break; } query_pos += (ao->transition->advance_query * ao->length); target_pos += (ao->transition->advance_target * ao->length); } if(in_exon){ if(cds_query_end != -1){ if(cds_query_end != query_pos){ /* Have 3' UTR */ curr_utr_query_start = MAX(exon_query_start, cds_query_end); curr_utr_target_start = MAX(exon_target_start, cds_target_end); Alignment_display_gff_line(alignment, query, target, report_on_query, "utr3b", curr_utr_query_start, curr_utr_target_start, query_pos, target_pos, FALSE, 0, FALSE, 0, NULL, fp); } else { Alignment_display_gff_line(alignment, query, target, report_on_query, "cds", exon_query_start, exon_target_start, query_pos, target_pos, FALSE, 0, FALSE, 0, NULL, fp); } } Alignment_display_gff_exon(alignment, query, target, translate, report_on_query, query_pos, target_pos, exon_query_start, exon_target_start, exon_query_gap, exon_target_gap, exon_query_frameshift, exon_target_frameshift, user_data, fp); } return; } static void Alignment_display_gff_similarity(Alignment *alignment, Sequence *query, Sequence *target, gboolean report_on_query, gint result_id, FILE *fp){ register gint i, qp, tp; register gint query_pos = alignment->region->query_start, target_pos = alignment->region->target_start; register AlignmentOperation *ao; register GPtrArray *attribute_list = g_ptr_array_new(); g_ptr_array_add(attribute_list, g_strdup_printf("alignment_id %d", result_id)); if(report_on_query) g_ptr_array_add(attribute_list, g_strdup_printf("Target %s", target->id)); else g_ptr_array_add(attribute_list, g_strdup_printf("Query %s", query->id)); /* Align [] ; */ for(i = 1; i < alignment->operation_list->len; i++){ ao = 
alignment->operation_list->pdata[i]; switch(ao->transition->label){ case C4_Label_MATCH: qp = query_pos; tp = target_pos; if(query->strand == Sequence_Strand_REVCOMP) qp = query->len - qp; if(target->strand == Sequence_Strand_REVCOMP) tp = target->len - tp; if(report_on_query){ g_ptr_array_add(attribute_list, g_strdup_printf("Align %d %d %d", qp+1, tp+1, ao->length*ao->transition->advance_query)); } else { g_ptr_array_add(attribute_list, g_strdup_printf("Align %d %d %d", tp+1, qp+1, ao->length*ao->transition->advance_target)); } break; case C4_Label_NONE: case C4_Label_GAP: case C4_Label_NER: case C4_Label_5SS: case C4_Label_3SS: case C4_Label_INTRON: case C4_Label_SPLIT_CODON: case C4_Label_FRAMESHIFT: break; default: g_error("Unknown C4_Label [%s]", C4_Label_get_name(ao->transition->label)); break; } query_pos += (ao->transition->advance_query * ao->length); target_pos += (ao->transition->advance_target * ao->length); } Alignment_display_gff_line(alignment, query, target, report_on_query, "similarity", alignment->region->query_start, alignment->region->target_start, Region_query_end(alignment->region), Region_target_end(alignment->region), TRUE, alignment->score, FALSE, 0, attribute_list, fp); Alignment_free_attribute_list(attribute_list); return; } /**/ void Alignment_display_gff(Alignment *alignment, Sequence *query, Sequence *target, Translate *translate, gboolean report_on_query, gboolean report_on_genomic, gint result_id, gpointer user_data, FILE *fp){ fprintf(fp, "# --- START OF GFF DUMP ---\n#\n"); Alignment_display_gff_header(alignment, query, target, report_on_query, fp); if(report_on_genomic){ Alignment_display_gff_gene(alignment, query, target, translate, report_on_query, result_id, user_data, fp); } Alignment_display_gff_similarity(alignment, query, target, report_on_query, result_id, fp); fprintf(fp, "# --- END OF GFF DUMP ---\n#\n"); return; } /**/ /* features: gene,intron,exon,splice[35],similarity */ /**/ #ifndef G_DISABLE_ASSERT static gboolean 
Alignment_has_valid_alignment(Alignment *alignment, gpointer user_data){ register gint query_pos = alignment->region->query_start, target_pos = alignment->region->target_start; register C4_Score score; register C4_Calc *calc; register AlignmentOperation *alignment_operation; register C4_State *state; register C4_Transition *transition; register C4_Shadow *shadow; register gint i, j, k, t; register C4_Score *start_cell, *cell = g_new0(C4_Score, 1+alignment->model->total_shadow_designations); /* FIXME: temp */ /* #define DEBUG */ #ifdef DEBUG g_message("Check model [%s] align score [%d]", alignment->model->name, alignment->score); for(i = 0; i < alignment->model->transition_list->len; i++){ transition = alignment->model->transition_list->pdata[i]; g_message("Check transition (%d) [%s] has dst_shad[%d] (%s:%s)", i, transition->name, transition->dst_shadow_list->len, transition->input->name, transition->output->name); } Region_print(alignment->region, "Alignment_has_valid_alignment"); #endif /* DEBUG */ if(alignment->model->init_func) alignment->model->init_func(alignment->region, user_data); for(i = 0; i < alignment->model->calc_list->len; i++){ calc = alignment->model->calc_list->pdata[i]; if(calc && calc->init_func) calc->init_func(alignment->region, user_data); } if(alignment->model->start_state->cell_start_func){ start_cell = alignment->model->start_state->cell_start_func (query_pos, target_pos, user_data); #ifdef DEBUG g_print("USING start cell from (%d,%d): ", query_pos, target_pos); #endif /* DEBUG */ for(i = 0; i < (1 + alignment->model->total_shadow_designations); i++){ cell[i] = start_cell[i]; #ifdef DEBUG g_print("[%d]", cell[i]); #endif /* DEBUG */ } #ifdef DEBUG g_print("\n"); #endif /* DEBUG */ } score = cell[0]; #ifdef DEBUG g_message("start with [%d] (%s)", score, alignment->model->name); #endif /* DEBUG */ for(i = 0; i < alignment->operation_list->len; i++){ alignment_operation = alignment->operation_list->pdata[i]; for(j = 0; j < 
alignment_operation->length; j++){ state = alignment_operation->transition->input; for(k = 0; k < state->src_shadow_list->len; k++){ shadow = state->src_shadow_list->pdata[k]; /* #define DEBUG_SHADOWS */ #ifdef DEBUG_SHADOWS g_message("start_func on [%s:%s:%s] (%d,%d) shadow(%d)", state->name, alignment_operation->transition->name, shadow->name, query_pos, target_pos, k); #endif /* DEBUG_SHADOWS */ cell[1 + shadow->designation] = shadow->start_func( query_pos, target_pos, user_data); } transition = alignment_operation->transition; for(k = 0; k < transition->dst_shadow_list->len; k++){ shadow = transition->dst_shadow_list->pdata[k]; #ifdef DEBUG_SHADOWS g_message("end_func on [%s:%s] (%d,%d) shadow(%d)=(%d)", transition->name, shadow->name, query_pos, target_pos, k, cell[1 + shadow->designation]); #endif /* DEBUG_SHADOWS */ shadow->end_func(cell[1 + shadow->designation], query_pos, target_pos, user_data); } /**/ t = C4_Calc_score(alignment_operation->transition->calc, query_pos, target_pos, user_data); score += t; #ifdef DEBUG g_message("add on [%d] (%s) => [%d] at (%d,%d) {%d,%d}", t, alignment_operation->transition->name, score, query_pos, target_pos, i, j); #endif /* DEBUG */ query_pos += alignment_operation->transition->advance_query; target_pos += alignment_operation->transition->advance_target; } } for(i = 0; i < alignment->model->calc_list->len; i++){ calc = alignment->model->calc_list->pdata[i]; if(calc && calc->exit_func) calc->exit_func(alignment->region, user_data); } if(alignment->model->exit_func) alignment->model->exit_func(alignment->region, user_data); #ifdef DEBUG g_message("end with [%d]", score); #endif /* DEBUG */ #ifdef DEBUG g_message("CHECKIT: [%d-%d=%d],[%d] [%d-%d=%d],[%d]", query_pos, alignment->region->query_start, (query_pos-alignment->region->query_start), alignment->region->query_length, target_pos, alignment->region->target_start, (target_pos-alignment->region->target_start), alignment->region->target_length); #endif /* DEBUG */ 
if(score != alignment->score) /* FIXME: temp */ g_warning("Score difference DP[%d] TB[%d]", alignment->score, score); g_assert((query_pos-alignment->region->query_start) == alignment->region->query_length); g_assert((target_pos-alignment->region->target_start) == alignment->region->target_length); g_assert(score == alignment->score); g_free(cell); return TRUE; } /* FIXME: Should change equality test in case the score is a float */ #endif /* G_DISABLE_ASSERT */ static gboolean Alignment_has_valid_path(Alignment *alignment){ register C4_State *first_state, *last_state; register AlignmentOperation *alignment_operation, *prev; register gint i; g_assert(alignment); g_assert(alignment->operation_list); g_assert(alignment->operation_list->len > 0); alignment_operation = alignment->operation_list->pdata[0]; first_state = alignment_operation->transition->input; g_assert(first_state == alignment->model->start_state->state); alignment_operation = alignment->operation_list->pdata [alignment->operation_list->len-1]; last_state = alignment_operation->transition->output; g_assert(last_state == alignment->model->end_state->state); for(i = 1; i < alignment->operation_list->len; i++){ alignment_operation = alignment->operation_list->pdata[i]; prev = alignment->operation_list->pdata[i-1]; g_assert(alignment_operation->transition->input == prev->transition->output); } return TRUE; } static gboolean Alignment_is_within_scope(Alignment *alignment, Region *seq_region){ register gboolean start_at_query_edge, start_at_target_edge, end_at_query_edge, end_at_target_edge; start_at_query_edge = (seq_region->query_start == alignment->region->query_start); start_at_target_edge = (seq_region->target_start == alignment->region->target_start); end_at_query_edge = (Region_query_end(alignment->region) == Region_query_end(seq_region)); end_at_target_edge = (Region_target_end(alignment->region) == Region_target_end(seq_region)); switch(alignment->model->start_state->scope){ case C4_Scope_ANYWHERE: 
break; case C4_Scope_EDGE: g_assert(start_at_query_edge || start_at_target_edge); break; case C4_Scope_QUERY: g_assert(start_at_query_edge); break; case C4_Scope_TARGET: g_assert(start_at_target_edge); break; case C4_Scope_CORNER: g_assert(start_at_query_edge && start_at_target_edge); break; } switch(alignment->model->end_state->scope){ case C4_Scope_ANYWHERE: break; case C4_Scope_EDGE: g_assert(end_at_query_edge || end_at_target_edge); break; case C4_Scope_QUERY: g_assert(end_at_query_edge); break; case C4_Scope_TARGET: g_assert(end_at_target_edge); break; case C4_Scope_CORNER: g_assert(end_at_query_edge && end_at_target_edge); break; } return TRUE; } gboolean Alignment_is_valid(Alignment *alignment, Region *seq_region, gpointer user_data){ g_assert(alignment); g_assert(seq_region); g_assert(Region_is_within(seq_region, alignment->region)); g_assert(Alignment_is_within_scope(alignment, seq_region)); g_assert(Alignment_has_valid_path(alignment)); g_assert(Alignment_has_valid_alignment(alignment, user_data)); return TRUE; } /**/ void Alignment_import_derived(Alignment *alignment, Alignment *to_add, C4_DerivedModel *derived_model){ register gint i; register AlignmentOperation *alignment_operation; register C4_Transition *transition; g_assert(alignment->model == derived_model->original); g_assert(to_add->model == derived_model->derived); for(i = 0; i < to_add->operation_list->len; i++){ alignment_operation = to_add->operation_list->pdata[i]; transition = derived_model->transition_map [alignment_operation->transition->id]; Alignment_add(alignment, transition, alignment_operation->length); } return; } exonerate-2.4.0/src/c4/codegen.test.c0000644000175000017500000000277311162714265014271 00000000000000/****************************************************************\ * * * Code generation module for C4 * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. 
* * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/

#include
#include "codegen.h"

gint Argument_main(Argument *arg){
    register gchar *module = "testmodule";
    register Codegen *c = Codegen_create(NULL, module);
    /* The generated module calls printf(), so it needs <stdio.h> */
    Codegen_printf(c, "%s\n\n%s\n",
        "#include <stdio.h>",
        "int test_func(){");
    Codegen_indent(c, 1);
    Codegen_printf(c, "%s\n%s\n%s\n%s",
        "register int total = 1+2+3+4;",
        "printf(\"testing C4 plugin [%d]\\n\", total);",
        "return total;",
        "}");
    Codegen_indent(c, -1);
    Codegen_compile(c, NULL, NULL);
    Codegen_destroy(c);
    return 0;
    }

exonerate-2.4.0/src/c4/codegen.c0000644000175000017500000001765311162714263013314 00000000000000
/****************************************************************\ * * * Code generation module for C4 * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact.
* * * \****************************************************************/

#include <stdlib.h>    /* For system() */
#include <string.h>    /* For strlen() */
#include <ctype.h>     /* For isalnum() */
#include <sys/types.h> /* For stat(), mkdir() */
#include <sys/stat.h>  /* For stat(), mkdir() */

#include "codegen.h"

Codegen_ArgumentSet *Codegen_ArgumentSet_create(Argument *arg){
    register ArgumentSet *as;
    static Codegen_ArgumentSet cas = {TRUE};
    if(arg){
        as = ArgumentSet_create("Code generation options");
        ArgumentSet_add_option(as, 'C', "compiled", NULL,
            "Use compiled viterbi implementations", "TRUE",
            Argument_parse_boolean, &cas.use_compiled);
        Argument_absorb_ArgumentSet(arg, as);
        }
    return &cas;
    }

/**/

gchar *Codegen_clean_path_component(gchar *name){
    register gint i;
    register GString *file_name_str = g_string_sized_new(strlen(name));
    register gchar *file_name, *tmp;
    g_assert(name);
    for(i = 0; name[i]; i++){
        if(isalnum(name[i]) || (name[i] == '_')){
            g_string_append_c(file_name_str, name[i]);
        } else {
            /* Encode any other character as its decimal code */
            tmp = g_strdup_printf("_%d_", name[i]);
            g_string_append(file_name_str, tmp);
            g_free(tmp);
            }
        }
    file_name = file_name_str->str;
    g_string_free(file_name_str, FALSE);
    return file_name;
    }

gboolean Codegen_directory_exists(gchar *path){
    struct stat result;
    if(!stat(path, &result))
        if(S_ISDIR(result.st_mode))
            return TRUE;
    return FALSE;
    }

gboolean Codegen_file_exists(gchar *path){
    struct stat result;
    if(!stat(path, &result))
        if(S_ISREG(result.st_mode))
            return TRUE;
    return FALSE;
    }

static void Codegen_directory_create(gchar *path){
    register mode_t mode = S_IRUSR|S_IWUSR|S_IXUSR
                         | S_IRGRP|S_IXGRP
                         | S_IROTH|S_IXOTH;
    if(mkdir(path, mode))
        g_error("Could not create directory [%s]", path);
    else
        g_warning("Creating codegen directory [%s]", path);
    return;
    }

static gchar *Codegen_get_code_dir(gchar *directory){
    register gchar *platform_directory,
                   *clean_hosttype
                   = Codegen_clean_path_component(HOSTTYPE);
    register gchar *default_directory = NULL;
    if(!directory){
        default_directory =
            g_strdup(g_getenv("C4_CODEGEN_DIRECTORY"));
        /* Copied above: memory returned by g_getenv() must not be freed */
        if(!default_directory)
            default_directory = g_strconcat(SOURCE_ROOT_DIR,
                              G_DIR_SEPARATOR_S, "codegen", NULL);
        directory = default_directory;
        }
    if(!Codegen_directory_exists(directory))
        Codegen_directory_create(directory);
    platform_directory = g_strconcat(directory,
                              G_DIR_SEPARATOR_S, clean_hosttype, NULL);
    if(!Codegen_directory_exists(platform_directory))
        Codegen_directory_create(platform_directory);
    /* Free only the string allocated here, never the caller's directory */
    if(default_directory)
        g_free(default_directory);
    g_free(clean_hosttype);
    return platform_directory;
    }

Codegen *Codegen_create(gchar *directory, gchar *name){
    register Codegen *c = g_new(Codegen, 1);
    register gchar *code_dir = Codegen_get_code_dir(directory);
    c->name = Codegen_clean_path_component(name);
    c->code_path = g_strconcat(code_dir, G_DIR_SEPARATOR_S,
                               c->name, ".c", NULL);
    c->object_path = g_strconcat(code_dir, G_DIR_SEPARATOR_S,
                                 c->name, ".o", NULL);
    c->fp = fopen(c->code_path, "w");
    if(!c->fp){
        perror("Writing codegen file");
        g_error("Could not write codegen code to [%s]", c->code_path);
        }
    c->indent = 0;
    g_free(code_dir);
    return c;
    }

void Codegen_destroy(Codegen *c){
    g_assert(c);
    if(c->fp)
        fclose(c->fp);
    /* remove(c->code_path); */
    g_free(c->code_path);
    g_free(c->object_path); /* Allocated in Codegen_create() */
    g_free(c->name);
    g_free(c);
    return;
    }

static gchar *Codegen_expand_format(Codegen *c, gchar *format){
    register GString *str = g_string_sized_new(strlen(format));
    register gint i, j;
    register gchar *expanded_format;
    for(i = 0; format[i]; i++){
        g_string_append_c(str, format[i]);
        if((format[i] == '\n')         /* If there is a newline */
        && format[i+1]                 /* not at the end */
        && (format[i+1] != '\n')){     /* or preceding another newline */
            for(j = 0; j < ((c->indent)<<2); j++)
                g_string_append_c(str, ' ');
            }
        }
    expanded_format = str->str;
    g_string_free(str, FALSE);
    return expanded_format;
    }
/* Allows tidy printing of multi-line statements */

void Codegen_printf(Codegen *c, gchar *format, ...){
    va_list args;
    register gchar *code;
    format = Codegen_expand_format(c, format);
    /* Reuse expanded format to stop
       compiler complaining */
    va_start(args, format);
    code = g_strdup_vprintf(format, args);
    va_end(args);
    fprintf(c->fp, "%*s%s", (c->indent<<2), "", code);
    g_free(format); /* Free expanded format */
    g_free(code);
    return;
    }

void Codegen_indent(Codegen *c, gint indent_change){
    g_assert((c->indent += indent_change) || TRUE);
    return;
    }
/* Code indentation is only active when assertion checking is on. */

/**/

void Codegen_compile(Codegen *c,
                     gchar *add_ccflags, gchar *add_ldflags){
    register gchar *cc_command = "gcc";
/* Optimisation is turned off when assertions
 * are on for faster compilation
 */
#ifdef G_DISABLE_ASSERT
    register gchar *cc_flags = "-O2 -Wall";
#else /* G_DISABLE_ASSERT */
    register gchar *cc_flags = "-g -Wall";
#endif /* G_DISABLE_ASSERT */
/* FIXME: optimisation
 *        try some more aggressive compiler optimisations
 *        -fomit-frame-pointer -O3 etc
 *        or --arch ev6 / --arch ev67 with native cc on OSF1
 *        -O3 -tpp6 -xK
 */
    gchar *compile_command;
    register gchar *tmp;
    /* Allow customisation of compilation with environment variables */
    g_assert(c->indent == 0);
    fclose(c->fp);
    c->fp = NULL;
    /* If output already present, do not compile */
    if(Codegen_file_exists(c->object_path)){
        /* FIXME: unless object_path is older than model.o */
        g_warning("Reusing codegen object [%s]", c->object_path);
        return;
        }
    tmp = (gchar*)g_getenv("CC");
    if(tmp)
        cc_command = tmp;
    tmp = (gchar*)g_getenv("CFLAGS");
    if(tmp)
        cc_flags = tmp;
    /* FIXME: these environment variables should be merged
     *        with the command line options for this module.
*/ /**/ if(add_ccflags) cc_flags = g_strconcat(cc_flags, " ", add_ccflags, NULL); /**/ compile_command = g_strconcat(cc_command, " ", cc_flags, " -o ", c->object_path, " -c ", c->code_path, NULL); g_message("Compiling codegen:\n%s", compile_command); if(system(compile_command)) g_error("Problem compiling with\n%s", compile_command); g_free(compile_command); if(add_ccflags) g_free(cc_flags); return; } exonerate-2.4.0/src/c4/optimal.test.c0000644000175000017500000000314111162714266014321 00000000000000/****************************************************************\ * * * C4 dynamic programming library - DP layout code * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include #include "optimal.h" #include "match.h" int Argument_main(Argument *arg){ register C4_Model *model = C4_Model_create("test"); register Optimal *optimal; register Match *match; Match_ArgumentSet_create(arg); Argument_process(arg, "optimal.test", NULL, NULL); match = Match_find(Match_Type_DNA2DNA); C4_Model_add_transition(model, "start to end", NULL, NULL, 1, 1, NULL, C4_Label_MATCH, match); C4_Model_close(model); optimal = Optimal_create(model, "optimal test", Optimal_Type_SCORE, FALSE); /**/ Optimal_destroy(optimal); C4_Model_destroy(model); return 0; } exonerate-2.4.0/src/c4/optimal.c0000644000175000017500000004111611162714266013347 00000000000000/****************************************************************\ * * * C4 dynamic programming library - optimal alignment code * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. 
* * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include <string.h> /* For strlen() */ #include "slist.h" #include "optimal.h" Optimal *Optimal_create(C4_Model *model, gchar *name, Optimal_Type type, gboolean use_codegen){ register Optimal *optimal = g_new0(Optimal, 1); register gchar *viterbi_name; g_assert(model); g_assert(!model->is_open); optimal->ref_count = 1; if(name) optimal->name = g_strdup(name); else optimal->name = g_strdup_printf("optimal:%s", model->name); optimal->type = type; if(type & Optimal_Type_SCORE){ viterbi_name = g_strconcat(optimal->name, " find score", NULL); optimal->find_score = Viterbi_create(model, viterbi_name, Viterbi_Mode_FIND_SCORE, FALSE, use_codegen); g_free(viterbi_name); } if(type & Optimal_Type_PATH){ viterbi_name = g_strconcat(optimal->name, " find path", NULL); optimal->find_path = Viterbi_create(model, viterbi_name, Viterbi_Mode_FIND_PATH, FALSE, use_codegen); g_free(viterbi_name); } if(type & Optimal_Type_REDUCED_SPACE){ if(!C4_Model_is_global(model)){ viterbi_name = g_strconcat(optimal->name, " find region", NULL); optimal->find_region = Viterbi_create(model, viterbi_name, Viterbi_Mode_FIND_REGION, FALSE, use_codegen); g_free(viterbi_name); } viterbi_name = g_strconcat(optimal->name, " find checkpoint", NULL); optimal->find_checkpoint_continuation = Viterbi_create(model, viterbi_name, Viterbi_Mode_FIND_CHECKPOINTS, TRUE, use_codegen); g_free(viterbi_name); viterbi_name = g_strconcat(optimal->name, " find path continuation", NULL); optimal->find_path_continuation = Viterbi_create(model, viterbi_name, Viterbi_Mode_FIND_PATH, TRUE, use_codegen); g_free(viterbi_name); /**/ } return optimal; } /* FIXME: name should not be model->name + suffix, * but optimal->name + suffix
*/ void Optimal_destroy(Optimal *optimal){ g_assert(optimal); if(--optimal->ref_count) return; if(optimal->find_score) Viterbi_destroy(optimal->find_score); if(optimal->find_path) Viterbi_destroy(optimal->find_path); if(optimal->find_region) Viterbi_destroy(optimal->find_region); if(optimal->find_checkpoint_continuation) Viterbi_destroy(optimal->find_checkpoint_continuation); if(optimal->find_path_continuation) Viterbi_destroy(optimal->find_path_continuation); g_free(optimal->name); g_free(optimal); return; } Optimal *Optimal_share(Optimal *optimal){ g_assert(optimal); optimal->ref_count++; return optimal; } static void Optimal_add_codegen(Viterbi *viterbi, GPtrArray *codegen_list){ register Codegen *codegen; if(viterbi){ codegen = Viterbi_make_Codegen(viterbi); g_ptr_array_add(codegen_list, codegen); } return; } GPtrArray *Optimal_make_Codegen_list(Optimal *optimal){ register GPtrArray *codegen_list = g_ptr_array_new(); Optimal_add_codegen(optimal->find_score, codegen_list); Optimal_add_codegen(optimal->find_path, codegen_list); Optimal_add_codegen(optimal->find_region, codegen_list); Optimal_add_codegen(optimal->find_checkpoint_continuation, codegen_list); Optimal_add_codegen(optimal->find_path_continuation, codegen_list); g_assert(codegen_list->len); return codegen_list; } C4_Score Optimal_find_score(Optimal *optimal, Region *region, gpointer user_data, SubOpt *subopt){ register C4_Score score; register Viterbi_Data *vd; g_assert(optimal->find_score); vd = Viterbi_Data_create(optimal->find_score, region); score = Viterbi_calculate(optimal->find_score, region, vd, user_data, subopt); Viterbi_Data_destroy(vd); return score; } static Region *Optimal_find_region(Optimal *optimal, Region *region, gpointer user_data, C4_Score threshold, SubOpt *subopt, C4_Score *region_score){ register Region *alignment_region; register Viterbi_Data *vd; register C4_Score score; g_assert(optimal->type & Optimal_Type_REDUCED_SPACE); if(!optimal->find_region) /* Must be a global model 
*/ return Region_share(region); /* Already know region */ g_assert(optimal->find_region); vd = Viterbi_Data_create(optimal->find_region, region); score = Viterbi_calculate(optimal->find_region, region, vd, user_data, subopt); if(score < threshold){ Viterbi_Data_destroy(vd); /* Do not leak vd on early exit */ return NULL; } alignment_region = Region_share(vd->alignment_region); (*region_score) = score; g_assert(Region_is_valid(alignment_region)); g_assert(Region_is_within(region, alignment_region)); Viterbi_Data_destroy(vd); return alignment_region; } /**/ static C4_Score Optimal_find_checkpoints_continuation(Optimal *optimal, Region *region, gpointer user_data, SubOpt *subopt, GPtrArray **sub_vsa_list, C4_State *first_state, C4_Score *first_cell, C4_State *final_state, C4_Score *final_cell){ register C4_Score score = C4_IMPOSSIBLY_LOW_SCORE; register Viterbi_Data *vd; vd = Viterbi_Data_create(optimal->find_checkpoint_continuation, region); g_assert(vd->checkpoint); g_assert(!vd->continuation); Viterbi_Data_set_continuation(vd, first_state, first_cell, final_state, final_cell); g_assert(vd->continuation); score = Viterbi_calculate(optimal->find_checkpoint_continuation, region, vd, user_data, subopt); (*sub_vsa_list) = Viterbi_Checkpoint_traceback( optimal->find_checkpoint_continuation, vd, region, first_state, final_cell); Viterbi_Data_destroy(vd); return score; } static C4_Score Optimal_find_checkpoints_recur(Optimal *optimal, Region *region, gpointer user_data, SubOpt *subopt, SList *vsa_list, C4_State *first_state, C4_Score *first_cell, C4_State *final_state, C4_Score *final_cell){ register Viterbi_SubAlignment *vsa, *prev_vsa = NULL, *next_vsa; register C4_Score *sub_first_cell; register C4_State *sub_final_state; register C4_Score score; register gint i; register gboolean used_vsa = TRUE, prev_used_vsa = TRUE; GPtrArray *sub_vsa_list = NULL; score = Optimal_find_checkpoints_continuation( optimal, region, user_data, subopt, &sub_vsa_list, first_state, first_cell, final_state, final_cell); g_assert(sub_vsa_list); for(i =
sub_vsa_list->len-1; i >= 0; i--){ vsa = sub_vsa_list->pdata[i]; g_assert(vsa); if(Viterbi_use_reduced_space(optimal->find_path, vsa->region)){ sub_first_cell = prev_vsa ? prev_vsa->final_cell : first_cell; if(i){ next_vsa = sub_vsa_list->pdata[i-1]; g_assert(next_vsa); sub_final_state = next_vsa->first_state; } else { sub_final_state = final_state; } Optimal_find_checkpoints_recur(optimal, vsa->region, user_data, subopt, vsa_list, vsa->first_state, sub_first_cell, sub_final_state, vsa->final_cell); used_vsa = FALSE; } else { SList_queue(vsa_list, vsa); used_vsa = TRUE; } if(prev_vsa && (!prev_used_vsa)) Viterbi_SubAlignment_destroy(prev_vsa); prev_vsa = vsa; prev_used_vsa = used_vsa; } g_ptr_array_free(sub_vsa_list, TRUE); return score; } static Alignment *Optimal_find_path_quadratic_space_continuation( Optimal *optimal, Region *region, gpointer user_data, SubOpt *subopt, C4_State *first_state, C4_Score *first_cell, C4_State *final_state, C4_Score *final_cell){ register Alignment *alignment; register C4_Score score; register Viterbi_Data *vd = Viterbi_Data_create(optimal->find_path_continuation, region); register AlignmentOperation *ao; g_assert(!Viterbi_use_reduced_space(optimal->find_path, region)); Viterbi_Data_set_continuation(vd, first_state, first_cell, final_state, final_cell); score = Viterbi_calculate(optimal->find_path_continuation, region, vd, user_data, subopt); g_assert(vd->curr_query_end == region->query_length); g_assert(vd->curr_target_end == region->target_length); g_assert(vd->continuation->final_cell[0] == score); alignment = Viterbi_Data_create_Alignment(vd, optimal->find_path_continuation->model, score, region); ao = alignment->operation_list->pdata[0]; g_assert(ao); g_assert(first_state->id == ao->transition->input->id); ao = alignment->operation_list->pdata [alignment->operation_list->len-1]; g_assert(ao); g_assert(final_state->id == ao->transition->output->id); Viterbi_Data_clear_continuation(vd); g_assert(score == alignment->score); 
Viterbi_Data_destroy(vd); return alignment; } static Alignment *Optimal_compute_subalignments(Optimal *optimal, Region *region, gpointer user_data, SubOpt *subopt, SList *vsa_list, C4_Score score, gint cell_size){ register gint i; register Viterbi_SubAlignment *vsa = NULL; register Alignment *sub_alignment, *alignment; register AlignmentOperation *ao; register C4_State *sub_final_state; register C4_Score *sub_first_cell; register SListNode *node; register C4_Transition *transition; register C4_Score *dummy_cell = g_new0(C4_Score, cell_size); alignment = Alignment_create(optimal->find_path->model, region, score); SList_for_each(vsa_list, node){ if(vsa){ sub_first_cell = vsa->final_cell; } else { sub_first_cell = dummy_cell; } if(node->next){ vsa = node->next->data; g_assert(vsa); sub_final_state = vsa->first_state; } else { sub_final_state = optimal->find_checkpoint_continuation ->model->end_state->state; } vsa = node->data; g_assert(vsa); g_assert(vsa->region); sub_alignment = Optimal_find_path_quadratic_space_continuation( optimal, vsa->region, user_data, subopt, vsa->first_state, sub_first_cell, sub_final_state, vsa->final_cell); for(i = 0; i < sub_alignment->operation_list->len; i++){ ao = sub_alignment->operation_list->pdata[i]; /* Map transitions back to find_path model */ transition = optimal->find_path->model->transition_list->pdata [ao->transition->id]; Alignment_add(alignment, transition, ao->length); } Alignment_destroy(sub_alignment); } g_free(dummy_cell); return alignment; } static Alignment *Optimal_find_path_reduced_space(Optimal *optimal, Region *region, gpointer user_data, SubOpt *subopt){ register Alignment *alignment; register SListSet *slist_set = SListSet_create(); register SList *vsa_list = SList_create(slist_set); register SListNode *node; register gint cell_size = optimal->find_checkpoint_continuation->cell_size; register C4_Score score, *dummy_first_cell = g_new0(C4_Score, cell_size), *dummy_final_cell = g_new0(C4_Score, cell_size); register 
C4_Model *model; register Viterbi_SubAlignment *vsa; g_assert(Viterbi_use_reduced_space(optimal->find_path, region)); model = optimal->find_checkpoint_continuation->model; score = Optimal_find_checkpoints_recur(optimal, region, user_data, subopt, vsa_list, model->start_state->state, dummy_first_cell, model->end_state->state, dummy_final_cell); alignment = Optimal_compute_subalignments(optimal, region, user_data, subopt, vsa_list, score, cell_size); SList_for_each(vsa_list, node){ vsa = node->data; Viterbi_SubAlignment_destroy(vsa); } SList_destroy(vsa_list); SListSet_destroy(slist_set); g_free(dummy_first_cell); g_free(dummy_final_cell); return alignment; } /**/ static Alignment *Optimal_find_path_quadratic_space( Optimal *optimal, Region *region, gpointer user_data, SubOpt *subopt){ register Alignment *alignment; register C4_Score score; register Viterbi_Data *vd = Viterbi_Data_create(optimal->find_path, region); g_assert(!Viterbi_use_reduced_space(optimal->find_path, region)); score = Viterbi_calculate(optimal->find_path, region, vd, user_data, subopt); alignment = Viterbi_Data_create_Alignment(vd, optimal->find_path->model, score, region); g_assert(score == alignment->score); Viterbi_Data_destroy(vd); return alignment; } /**/ Alignment *Optimal_find_path(Optimal *optimal, Region *region, gpointer user_data, C4_Score threshold, SubOpt *subopt){ register Alignment *alignment = NULL; register Region *alignment_region; C4_Score region_score = C4_IMPOSSIBLY_LOW_SCORE; /**/ g_assert(!optimal->find_path->model->is_open); if(Viterbi_use_reduced_space(optimal->find_path, region)){ alignment_region = Optimal_find_region(optimal, region, user_data, threshold, subopt, &region_score); if(!alignment_region) /* No region when score below threshold */ return NULL; if(Viterbi_use_reduced_space(optimal->find_path, alignment_region)){ alignment = Optimal_find_path_reduced_space(optimal, alignment_region, user_data, subopt); } else { alignment = Optimal_find_path_quadratic_space(
optimal, alignment_region, user_data, subopt); } g_assert(alignment); g_assert(Region_is_same(alignment_region, alignment->region)); #ifndef G_DISABLE_ASSERT if(alignment->score != region_score){ g_warning("Region score DIFF have:[%d] expect:[%d]", alignment->score, region_score); } #endif /* G_DISABLE_ASSERT */ g_assert(alignment->score == region_score); Region_destroy(alignment_region); } else { alignment = Optimal_find_path_quadratic_space( optimal, region, user_data, subopt); } g_assert(optimal->find_path); g_assert(alignment->model == optimal->find_path->model); g_assert(Alignment_is_valid(alignment, region, user_data)); if(alignment->score < threshold){ Alignment_destroy(alignment); return NULL; } return alignment; } exonerate-2.4.0/src/c4/layout.c0000644000175000017500000004035011162714265013215 00000000000000/****************************************************************\ * * * C4 dynamic programming library - DP layout code * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "layout.h" #include "matrix.h" /**/ static gboolean Layout_model_has_state_active(C4_Model *model, C4_State *state, gint query_pos, gint target_pos, gint query_length, gint target_length){ if(query_pos < 0) return FALSE; if(target_pos < 0) return FALSE; if(query_pos > query_length) return FALSE; if(target_pos > target_length) return FALSE; /**/ if(state == model->start_state->state){ switch(model->start_state->scope){ case C4_Scope_ANYWHERE: break; case C4_Scope_EDGE: if((query_pos != 0) && (target_pos != 0)) return FALSE; break; case C4_Scope_QUERY: if(query_pos != 0) return FALSE; break; case C4_Scope_TARGET: if(target_pos != 0) return FALSE; break; case C4_Scope_CORNER: if((query_pos != 0) || (target_pos != 0)) return FALSE; break; default: g_assert(!"Bad start scope type"); break; } } /**/ if(state == model->end_state->state){ switch(model->end_state->scope){ case C4_Scope_ANYWHERE: break; case C4_Scope_EDGE: if((query_pos != query_length) && (target_pos != target_length)) return FALSE; break; case C4_Scope_QUERY: if(query_pos != query_length) return FALSE; break; case C4_Scope_TARGET: if(target_pos != target_length) return FALSE; break; case C4_Scope_CORNER: if((query_pos != query_length) || (target_pos != target_length)) return FALSE; break; default: g_assert(!"Bad end scope type"); break; } } return TRUE; } static gboolean Layout_model_state_is_valid_input(C4_Model *model, C4_State *state, gint query_pos, gint target_pos, gint query_length, gint target_length){ register gint i; register C4_Transition *transition; if(state == model->start_state->state) return TRUE; for(i = 0; i < state->input_transition_list->len; i++){ transition = state->input_transition_list->pdata[i]; if((transition->advance_query == 0) && (transition->advance_target == 0)){ /* If silent */ if(Layout_model_state_is_valid_input(model, transition->input, query_pos, target_pos, query_length, target_length)) 
return TRUE; } else { /* Emits */ if(Layout_model_has_state_active(model, transition->input, query_pos-transition->advance_query, target_pos-transition->advance_target, query_length, target_length)) return TRUE; } } return FALSE; } /* The input state must be in scope, * and be traceable to a non-silent transition * from a state which is also in scope. */ static gboolean Layout_transition_is_valid(C4_Model *model, C4_Transition *transition, gint dst_query_pos, gint dst_target_pos, gint query_length, gint target_length){ /* Check transition input is valid */ if(!Layout_model_has_state_active(model, transition->input, dst_query_pos-transition->advance_query, dst_target_pos-transition->advance_target, query_length, target_length)){ return FALSE; } /* Check transition output is valid */ if(!Layout_model_has_state_active(model, transition->output, dst_query_pos, dst_target_pos, query_length, target_length)){ return FALSE; } /* Check input score will be present. * * This must be disabled for continuation DP to work properly. * also generated code is smaller without it, * and the DP seems to be just as fast.
*/ if(FALSE){ /* DISABLED */ if(!Layout_model_state_is_valid_input(model, transition->input, dst_query_pos-transition->advance_query, dst_target_pos-transition->advance_target, query_length, target_length)){ return FALSE; } } return TRUE; } gboolean Layout_is_transition_valid(Layout *layout, C4_Model *model, C4_Transition *transition, gint dst_query_pos, gint dst_target_pos, gint query_length, gint target_length){ register Layout_Row *row; register Layout_Cell *cell; register Layout_Mask *mask; g_assert(layout); row = layout->row_list->pdata[MIN(dst_target_pos, (layout->row_list->len-1))]; g_assert(row); cell = row->cell_list->pdata[MIN(dst_query_pos, (row->cell_list->len-1))]; g_assert(cell); if(dst_query_pos == query_length){ if(dst_target_pos == target_length){ /* QT */ mask = cell->corner; if(!mask) mask = cell->end_target; if(!mask) mask = cell->normal; } else { /* Q */ mask = cell->end_query; if(!mask) mask = cell->normal; } } else { if(dst_target_pos == target_length){ /* T */ mask = cell->end_target; if(!mask) mask = cell->normal; } else { /* - */ mask = cell->normal; } } g_assert(mask); g_assert(Layout_transition_is_valid(model, transition, dst_query_pos, dst_target_pos, query_length, target_length) == g_array_index(mask->transition_mask, gboolean, transition->id)); return g_array_index(mask->transition_mask, gboolean, transition->id); } /* We could just call Layout_transition_is_valid(), but this way is slightly faster. FIXME: should change Layout to allow for faster lookup here. (currently makes code generation fast, but interp slow ...) 
*/ /**/ static Layout_Mask *Layout_Mask_create(C4_Model *model, gint query_pos, gint target_pos, gint query_length, gint target_length){ register Layout_Mask *mask = g_new(Layout_Mask, 1); register gint i; register C4_Transition *transition; gboolean is_valid; mask->transition_mask = g_array_new(FALSE, TRUE, sizeof(gboolean)); for(i = 0; i < model->transition_list->len; i++){ transition = model->transition_list->pdata[i]; is_valid = Layout_transition_is_valid(model, transition, query_pos, target_pos, query_length, target_length); g_array_append_val(mask->transition_mask, is_valid); } return mask; } static void Layout_Mask_destroy(Layout_Mask *mask){ g_array_free(mask->transition_mask, TRUE); g_free(mask); return; } static gboolean Layout_Mask_is_same(Layout_Mask *a, Layout_Mask *b){ register gint i; register gboolean value_a, value_b; if(!(a || b)) return TRUE; if(!(a && b)) return FALSE; if(a->transition_mask->len != b->transition_mask->len) return FALSE; for(i = 0; i < a->transition_mask->len; i++){ value_a = g_array_index(a->transition_mask, gboolean, i); value_b = g_array_index(b->transition_mask, gboolean, i); if(value_a != value_b) return FALSE; } return TRUE; } static void Layout_Mask_info(Layout_Mask *mask, C4_Model *model, gchar *name){ register gint i, count = 0; register gboolean is_valid; if(!mask) return; g_assert(mask->transition_mask->len == model->transition_list->len); g_print("["); for(i = 0; i < mask->transition_mask->len; i++){ is_valid = g_array_index(mask->transition_mask, gboolean, i); if(is_valid){ count++; g_print("#"); } else { g_print(" "); } } g_print("] [%s] [%d]\n", name, count); return; } /**/ static Layout_Cell *Layout_Cell_create(C4_Model *model, gint query_pos, gint target_pos, gint query_length, gint target_length){ register Layout_Cell *cell = g_new(Layout_Cell, 1); cell->normal = Layout_Mask_create(model, query_pos, target_pos, query_length, target_length); cell->end_query = Layout_Mask_create(model, query_pos, target_pos, 
query_pos, target_length); cell->end_target = Layout_Mask_create(model, query_pos, target_pos, query_length, target_pos); cell->corner = Layout_Mask_create(model, query_pos, target_pos, query_pos, target_pos); /* Skip EndQuery if same as Normal */ if(Layout_Mask_is_same(cell->normal, cell->end_query)){ Layout_Mask_destroy(cell->end_query); cell->end_query = NULL; } /* Skip Corner if same as EndTarget */ if(Layout_Mask_is_same(cell->corner, cell->end_target)){ Layout_Mask_destroy(cell->corner); cell->corner = NULL; /* Also skip EndTarget if same as Normal */ if(Layout_Mask_is_same(cell->normal, cell->end_target)){ Layout_Mask_destroy(cell->end_target); cell->end_target = NULL; } } /* FIXME: disabled as this was incorrect for some non-local models */ #if 0 /* Skip EndTarget (and Corner if present) if same as Normal * and Corner?Corner:Normal is same as EndQuery?EndQuery:Normal */ if(Layout_Mask_is_same(cell->normal, cell->end_target) && Layout_Mask_is_same(cell->corner ? cell->corner : cell->end_target, cell->end_query ? 
cell->end_query : cell->normal)){ Layout_Mask_destroy(cell->end_target); cell->end_target = NULL; if(cell->corner){ Layout_Mask_destroy(cell->corner); cell->corner = NULL; } } #endif /* 0 */ return cell; } static void Layout_Cell_destroy(Layout_Cell *cell){ g_assert(cell); g_assert(cell->normal); Layout_Mask_destroy(cell->normal); if(cell->end_query) Layout_Mask_destroy(cell->end_query); if(cell->end_target) Layout_Mask_destroy(cell->end_target); if(cell->corner) Layout_Mask_destroy(cell->corner); g_free(cell); return; } static gboolean Layout_Cell_is_same(Layout_Cell *a, Layout_Cell *b){ if(!Layout_Mask_is_same(a->normal, b->normal)) return FALSE; if(!Layout_Mask_is_same(a->end_query, b->end_query)) return FALSE; if(!Layout_Mask_is_same(a->end_target, b->end_target)) return FALSE; if(!Layout_Mask_is_same(a->corner, b->corner)) return FALSE; return TRUE; } static void Layout_Cell_info(Layout_Cell *cell, C4_Model *model){ Layout_Mask_info(cell->normal, model, "normal"); Layout_Mask_info(cell->end_query, model, "end_query"); Layout_Mask_info(cell->end_target, model, "end_target"); Layout_Mask_info(cell->corner, model, "corner"); return; } /**/ static Layout_Row *Layout_Row_create(C4_Model *model, gint row_number, gint query_length, gint target_length){ register Layout_Row *row = g_new(Layout_Row, 1); register Layout_Cell *cell; row->cell_list = g_ptr_array_new(); do { cell = Layout_Cell_create(model, row->cell_list->len, row_number, query_length, target_length); if(row->cell_list->len >= model->max_query_advance) if(Layout_Cell_is_same(cell, row->cell_list->pdata[row->cell_list->len-1])){ Layout_Cell_destroy(cell); break; } g_ptr_array_add(row->cell_list, cell); g_assert(row->cell_list->len < 1024); /* safety check */ } while(TRUE); return row; } static void Layout_Row_destroy(Layout_Row *row){ register gint i; for(i = 0; i < row->cell_list->len; i++) Layout_Cell_destroy(row->cell_list->pdata[i]); g_ptr_array_free(row->cell_list, TRUE); g_free(row); return; } static 
gboolean Layout_Row_is_same(Layout_Row *a, Layout_Row *b){ register gint i; register Layout_Cell *cell_a, *cell_b; g_assert(a); g_assert(b); if(a->cell_list->len != b->cell_list->len) return FALSE; for(i = 0; i < a->cell_list->len; i++){ cell_a = a->cell_list->pdata[i]; cell_b = b->cell_list->pdata[i]; if(!Layout_Cell_is_same(cell_a, cell_b)) return FALSE; } return TRUE; } static void Layout_Row_info(Layout_Row *row, C4_Model *model){ register gint i; g_print("Layout_Cell count: [%d]\n", row->cell_list->len); for(i = 0; i < row->cell_list->len; i++){ g_print("Layout_Cell [%d]:\n", i); Layout_Cell_info(row->cell_list->pdata[i], model); } return; } /**/ Layout *Layout_create(C4_Model *model){ register Layout *layout = g_new(Layout, 1); register Layout_Row *row; g_assert(model); g_assert(!model->is_open); layout->row_list = g_ptr_array_new(); do { row = Layout_Row_create(model, layout->row_list->len, 1024, 1024); if(layout->row_list->len >= model->max_target_advance) if(Layout_Row_is_same(row, layout->row_list->pdata[layout->row_list->len-1])){ Layout_Row_destroy(row); break; } g_ptr_array_add(layout->row_list, row); g_assert(layout->row_list->len < 1024); /* safety check */ } while(TRUE); return layout; } /* FIXME: optimisation * may be possible to trim the pattern for some models * where dimensions are <= layout->max_{query,target}_advance */ void Layout_destroy(Layout *layout){ register gint i; for(i = 0; i < layout->row_list->len; i++) Layout_Row_destroy(layout->row_list->pdata[i]); g_ptr_array_free(layout->row_list, TRUE); g_free(layout); return; } void Layout_info(Layout *layout, C4_Model *model){ register gint i; register C4_Transition *transition; g_print("Layout for model [%s]\n", model->name); for(i = 0; i < model->transition_list->len; i++){ transition = model->transition_list->pdata[i]; g_message("transition [%d] [%s]", i, transition->name); } g_print("Layout_Row count: [%d]\n", layout->row_list->len); for(i = 0; i < layout->row_list->len; i++){ 
g_print("Layout_Row [%d]:\n", i); Layout_Row_info(layout->row_list->pdata[i], model); } return; } /**/ exonerate-2.4.0/src/c4/viterbi.c0000644000175000017500000021434511162714266013354 00000000000000/****************************************************************\ * * * C4 dynamic programming library - the Viterbi implementation * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include <string.h> /* For strlen() */ #include "viterbi.h" #include "matrix.h" #include "cgutil.h" #ifdef USE_COMPILED_MODELS #include "c4_model_archive.h" #endif /* USE_COMPILED_MODELS */ /**/ Viterbi_ArgumentSet *Viterbi_ArgumentSet_create(Argument *arg){ register ArgumentSet *as; static Viterbi_ArgumentSet vas = {32}; if(arg){ as = ArgumentSet_create("Viterbi algorithm options"); ArgumentSet_add_option(as, 'D', "dpmemory", "Mb", "Maximum memory to use for DP tracebacks (Mb)", "32", Argument_parse_int, &vas.traceback_memory_limit); Argument_absorb_ArgumentSet(arg, as); } return &vas; } /**/ static gsize Viterbi_get_cell_size(Viterbi *viterbi){ register gint cell_size = 1 + viterbi->model->total_shadow_designations; if(viterbi->mode == Viterbi_Mode_FIND_REGION){ if(viterbi->model->start_state->scope != C4_Scope_CORNER){ if(viterbi->model->start_state->scope != C4_Scope_QUERY) cell_size++; /* For query_start shadow */ if(viterbi->model->start_state->scope != C4_Scope_TARGET) cell_size++; /* For target_start shadow */ } } if(viterbi->mode == Viterbi_Mode_FIND_CHECKPOINTS) cell_size++; /* For checkpoint shadow */ return cell_size; } Viterbi *Viterbi_create(C4_Model *model, gchar *name, Viterbi_Mode mode, gboolean use_continuation, gboolean
use_codegen){ register Viterbi *viterbi = g_new(Viterbi, 1); register Codegen_ArgumentSet *cas = Codegen_ArgumentSet_create(NULL); g_assert(model); g_assert(name); viterbi->vas = Viterbi_ArgumentSet_create(NULL); viterbi->name = Codegen_clean_path_component(name); if(use_continuation){ viterbi->model = C4_Model_copy(model); /* Make model global for continuation */ C4_Model_configure_start_state(viterbi->model, C4_Scope_CORNER, viterbi->model->start_state->cell_start_func, viterbi->model->start_state->cell_start_macro); C4_Model_configure_end_state(viterbi->model, C4_Scope_CORNER, viterbi->model->end_state->cell_end_func, viterbi->model->end_state->cell_end_macro); } else { viterbi->model = C4_Model_share(model); } viterbi->func = NULL; if(use_codegen && cas->use_compiled){ #ifdef USE_COMPILED_MODELS viterbi->func = Bootstrapper_lookup(viterbi->name); if(!viterbi->func) g_warning("Could not find compiled implementation of [%s]", viterbi->name); #else /* USE_COMPILED_MODELS */ g_warning("No compiled models - will be very slow"); #endif /* USE_COMPILED_MODELS */ } viterbi->mode = mode; viterbi->use_continuation = use_continuation; viterbi->cell_size = Viterbi_get_cell_size(viterbi); viterbi->layout = Layout_create(viterbi->model); return viterbi; } void Viterbi_destroy(Viterbi *viterbi){ C4_Model_destroy(viterbi->model); Layout_destroy(viterbi->layout); g_free(viterbi->name); g_free(viterbi); return; } /**/ static gsize Viterbi_get_row_size(Viterbi *viterbi, Region *region){ register gint mat_size; mat_size = Matrix4d_size(viterbi->model->max_target_advance+1, region->query_length+1, viterbi->model->state_list->len, viterbi->cell_size, sizeof(C4_Score)); if(!mat_size) return 0; return sizeof(Viterbi_Row) + mat_size; } static gsize Viterbi_traceback_memory_size(Viterbi *viterbi, Region *region){ return Matrix3d_size(region->query_length+1, region->target_length+1, viterbi->model->state_list->len, sizeof(C4_Transition*)); } gboolean Viterbi_use_reduced_space(Viterbi 
*viterbi, Region *region){ register gsize row_memory = Viterbi_get_row_size(viterbi, region); register gsize traceback_memory = Viterbi_traceback_memory_size(viterbi, region); register gsize memory_limit = viterbi->vas->traceback_memory_limit << 20; g_assert(Region_is_valid(region)); if(region->query_length <= (viterbi->model->max_query_advance * 6)) return FALSE; if(region->target_length <= (viterbi->model->max_target_advance * 6)) return FALSE; if((!row_memory) || (!traceback_memory)){ /* overflow */ /* Region_print(region, "using redspace (overflow)"); */ return TRUE; } if((row_memory + traceback_memory ) > memory_limit){ /* Region_print(region, "using redspace (memory limit)"); */ return TRUE; } return FALSE; } /**/ static Viterbi_Row *Viterbi_Row_create(Viterbi *viterbi, Region *region){ register Viterbi_Row *vr = g_new0(Viterbi_Row, 1); register gint i, j, k; g_assert(region); vr->region_start_query_id = -1; vr->region_start_target_id = -1; vr->checkpoint_id = -1; vr->cell_size = 1 + viterbi->model->total_shadow_designations; if(viterbi->mode == Viterbi_Mode_FIND_REGION){ if(viterbi->model->start_state->scope != C4_Scope_CORNER){ if(viterbi->model->start_state->scope != C4_Scope_QUERY) vr->region_start_query_id = vr->cell_size++; if(viterbi->model->start_state->scope != C4_Scope_TARGET) vr->region_start_target_id = vr->cell_size++; } } if(viterbi->mode == Viterbi_Mode_FIND_CHECKPOINTS) vr->checkpoint_id = vr->cell_size++; g_assert(vr->cell_size == viterbi->cell_size); vr->score_matrix = (C4_Score****) Matrix4d_create(viterbi->model->max_target_advance+1, region->query_length+1, viterbi->model->state_list->len, vr->cell_size, sizeof(C4_Score)); for(i = 0; i < viterbi->model->max_target_advance+1; i++) for(j = 0; j < region->query_length+1; j++) for(k = 0; k < viterbi->model->state_list->len; k++) vr->score_matrix[i][j][k][0] = C4_IMPOSSIBLY_LOW_SCORE; return vr; } static void Viterbi_Row_destroy(Viterbi_Row *vr){ g_free(vr->score_matrix); g_free(vr); return; } 
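/* Illustrative sketch only, not part of exonerate: a standalone
 * restatement of the decision made by Viterbi_use_reduced_space() above.
 * All sketch_* names and the plain size_t parameters are hypothetical
 * stand-ins for the Viterbi, C4_Model and Region fields. It assumes,
 * as the callers above do, that a size of zero signals overflow,
 * and it ignores the small sizeof(Viterbi_Row) term.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Multiply two sizes, returning 0 if the product would overflow.
 * Callers only pass non-zero operands, so 0 is unambiguous. */
static size_t sketch_mul(size_t a, size_t b){
    if(a && (b > ((size_t)-1) / a))
        return 0;
    return a * b;
    }

/* Returns 1 when reduced-space (checkpointed) DP should be used,
 * 0 when full quadratic-space traceback fits in limit_mb megabytes. */
static int sketch_use_reduced_space(size_t query_length,
                                    size_t target_length,
                                    size_t max_query_advance,
                                    size_t max_target_advance,
                                    size_t state_count,
                                    size_t cell_size,
                                    size_t score_size,
                                    size_t pointer_size,
                                    size_t limit_mb){
    size_t limit = limit_mb << 20; /* megabytes to bytes */
    size_t row, traceback;
    assert(state_count && cell_size && score_size && pointer_size);
    /* Regions too small to checkpoint always use quadratic space */
    if(query_length <= (max_query_advance * 6))
        return 0;
    if(target_length <= (max_target_advance * 6))
        return 0;
    /* Rolling score rows: rows x columns x states x cell x score */
    row = sketch_mul(max_target_advance + 1, query_length + 1);
    row = sketch_mul(row, state_count);
    row = sketch_mul(row, cell_size);
    row = sketch_mul(row, score_size);
    /* Full traceback: one transition pointer per cell and state */
    traceback = sketch_mul(query_length + 1, target_length + 1);
    traceback = sketch_mul(traceback, state_count);
    traceback = sketch_mul(traceback, pointer_size);
    if((!row) || (!traceback)) /* overflow */
        return 1;
    if(traceback > limit)
        return 1;
    if(row > (limit - traceback)) /* row + traceback > limit */
        return 1;
    return 0;
    }
```

/* With the default 32 Mb limit, five states, 4-byte scores and 8-byte
 * pointers, a 1000 x 1000 region needs about 40 Mb of traceback and so
 * would take the reduced-space path, while a 500 x 500 region would not.
 */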
/**/ static gsize Viterbi_Row_get_size(Viterbi *viterbi, Region *region){ register gint mat_size; mat_size = Matrix4d_size(viterbi->model->max_target_advance+1, region->query_length+1, viterbi->model->state_list->len, viterbi->cell_size, sizeof(C4_Score)); if(!mat_size) return 0; return sizeof(Viterbi_Row) + mat_size; } static gint Viterbi_checkpoint_rows(Viterbi *viterbi, Region *region){ register gsize row_memory = Viterbi_Row_get_size(viterbi, region); register gint avail_rows = ((viterbi->vas->traceback_memory_limit << 20)/row_memory) - 1; register gint max_rows = (region->target_length / (viterbi->model->max_target_advance << 1)) - 2; g_assert(max_rows > 0); if(avail_rows < 1) return 1; return MIN(avail_rows, max_rows); } static C4_Transition ****Viterbi_traceback_memory_create( Viterbi *viterbi, Region *region){ return (C4_Transition****)Matrix3d_create( region->query_length+1, region->target_length+1, viterbi->model->state_list->len, sizeof(C4_Transition*)); } /**/ static Viterbi_Checkpoint *Viterbi_Checkpoint_create(Viterbi *viterbi, Region *region, Viterbi_Row *vr){ register Viterbi_Checkpoint *vc = g_new0(Viterbi_Checkpoint, 1); register gint cp_count = Viterbi_checkpoint_rows(viterbi, region); register gint i; register C4_Score ****cp; g_assert(cp_count > 0); vc->checkpoint_list = g_ptr_array_new(); vc->cell_size = vr->cell_size; for(i = 0; i < cp_count; i++){ cp = (C4_Score****)Matrix4d_create( viterbi->model->max_target_advance+1, region->query_length+1, viterbi->model->state_list->len, vc->cell_size, sizeof(C4_Score)); g_ptr_array_add(vc->checkpoint_list, cp); } vc->section_length = region->target_length / (vc->checkpoint_list->len+1); vc->counter = 0; return vc; } static void Viterbi_Checkpoint_destroy(Viterbi_Checkpoint *vc){ register gint i; for(i = 0; i < vc->checkpoint_list->len; i++) g_free(vc->checkpoint_list->pdata[i]); g_ptr_array_free(vc->checkpoint_list, TRUE); g_free(vc); return; } /**/ static Viterbi_Continuation 
*Viterbi_Continuation_create( C4_State *first_state, C4_Score *first_cell, C4_State *final_state, C4_Score *final_cell){ register Viterbi_Continuation *vc = g_new(Viterbi_Continuation, 1); g_assert(first_state); g_assert(first_cell); g_assert(final_state); g_assert(final_cell); vc->first_state = first_state; vc->first_cell = first_cell; vc->final_state = final_state; vc->final_cell = final_cell; return vc; } static void Viterbi_Continuation_destroy(Viterbi_Continuation *vc){ g_assert(vc); g_free(vc); return; } /**/ void Viterbi_Data_set_continuation(Viterbi_Data *vd, C4_State *first_state, C4_Score *first_cell, C4_State *final_state, C4_Score *final_cell){ g_assert(!vd->continuation); vd->continuation = Viterbi_Continuation_create(first_state, first_cell, final_state, final_cell); return; } void Viterbi_Data_clear_continuation(Viterbi_Data *vd){ if(vd->continuation){ Viterbi_Continuation_destroy(vd->continuation); vd->continuation = NULL; } return; } /**/ Viterbi_Data *Viterbi_Data_create(Viterbi *viterbi, Region *region){ register Viterbi_Data *vd = g_new0(Viterbi_Data, 1); g_assert(viterbi); g_assert(region); vd->vr = Viterbi_Row_create(viterbi, region); if(viterbi->mode == Viterbi_Mode_FIND_REGION) vd->alignment_region = Region_copy(region); if(viterbi->mode == Viterbi_Mode_FIND_PATH) vd->traceback = Viterbi_traceback_memory_create(viterbi, region); if(viterbi->mode == Viterbi_Mode_FIND_CHECKPOINTS) vd->checkpoint = Viterbi_Checkpoint_create(viterbi, region, vd->vr); return vd; } void Viterbi_Data_destroy(Viterbi_Data *vd){ g_assert(vd); if(vd->alignment_region) Region_destroy(vd->alignment_region); if(vd->traceback) g_free(vd->traceback); if(vd->checkpoint) Viterbi_Checkpoint_destroy(vd->checkpoint); Viterbi_Data_clear_continuation(vd); Viterbi_Row_destroy(vd->vr); g_free(vd); return; } /**/ Alignment *Viterbi_Data_create_Alignment(Viterbi_Data *vd, C4_Model *model, C4_Score score, Region *region){ register Alignment *alignment; register GPtrArray 
*transition_path = g_ptr_array_new(); register gint i = vd->curr_query_end, j = vd->curr_target_end; register Region *alignment_region; register C4_Transition *transition; g_assert(vd->traceback); if(vd->continuation) transition = vd->traceback[i][j] [vd->continuation->final_state->id]; else transition = vd->traceback[i][j] [model->end_state->state->id]; g_assert(transition); do { g_ptr_array_add(transition_path, transition); i -= transition->advance_query; j -= transition->advance_target; transition = vd->traceback[i][j][transition->input->id]; if(!transition) break; if(transition->input == model->start_state->state){ g_ptr_array_add(transition_path, transition); i -= transition->advance_query; j -= transition->advance_target; break; } if(vd->continuation){ if(!(i|j)){ if(transition->output->id == vd->continuation->first_state->id){ break; /* continuation DP exit */ } } } g_assert(transition); } while(TRUE); alignment_region = Region_create(region->query_start + i, region->target_start + j, vd->curr_query_end - i, vd->curr_target_end - j); alignment = Alignment_create(model, alignment_region, score); for(i = transition_path->len-1; i >= 0; i--){ /* Reverse order */ transition = transition_path->pdata[i]; Alignment_add(alignment, transition, 1); } g_ptr_array_free(transition_path, TRUE); Region_destroy(alignment_region); return alignment; } /**/ static void Viterbi_Row_shadow_start(Viterbi_Row *vr, C4_Model *model, C4_Transition *transition, Region *region, C4_Score *src, gint query_pos, gint target_pos, gpointer user_data){ register gint i; register C4_Shadow *shadow; if(transition->input == model->start_state->state){ if(vr->region_start_query_id != -1){ src[vr->region_start_query_id] = query_pos - transition->advance_query; } if(vr->region_start_target_id != -1){ src[vr->region_start_target_id] = target_pos - transition->advance_target; } } /* Call start funcs from src */ for(i = 0; i < transition->input->src_shadow_list->len; i++){ shadow = 
transition->input->src_shadow_list->pdata[i]; src[shadow->designation+1] = shadow->start_func( region->query_start + query_pos - transition->advance_query, region->target_start + target_pos - transition->advance_target, user_data); } return; } static void Viterbi_Row_shadow_end(Viterbi_Row *vr, C4_Model *model, C4_Transition *transition, Region *region, C4_Score *src, gint query_pos, gint target_pos, gpointer user_data){ register gint i; register C4_Shadow *shadow; for(i = 0; i < transition->dst_shadow_list->len; i++){ shadow = transition->dst_shadow_list->pdata[i]; shadow->end_func(src[shadow->designation+1], region->query_start + query_pos - transition->advance_query, region->target_start + target_pos - transition->advance_target, user_data); } return; } static void Viterbi_Data_assign(Viterbi *viterbi, Viterbi_Data *vd, C4_Score *src, C4_Score *dst, C4_Score score, gint query_pos, gint target_pos, C4_Transition *transition, Region *region, gpointer user_data){ register gint i; dst[0] = score; /* Call shadow start on src */ Viterbi_Row_shadow_start(vd->vr, viterbi->model, transition, region, src, query_pos, target_pos, user_data); for(i = 1; i < vd->vr->cell_size; i++) /* Shadow transport */ dst[i] = src[i]; if(viterbi->mode == Viterbi_Mode_FIND_PATH) vd->traceback[query_pos][target_pos] [transition->output->id] = transition; return; } static void Viterbi_Data_register_end(Viterbi_Data *vd, C4_Score *cell, gint query_pos, gint target_pos){ vd->curr_query_end = query_pos; vd->curr_target_end = target_pos; if(vd->vr->region_start_query_id != -1){ vd->curr_query_start = cell[vd->vr->region_start_query_id]; } if(vd->vr->region_start_target_id != -1){ vd->curr_target_start = cell[vd->vr->region_start_target_id]; } return; } /**/ static Viterbi_SubAlignment *Viterbi_SubAlignment_create( gint query_start, gint target_start, gint query_length, gint target_length, C4_State *first_state, C4_Score *final_cell, gint cell_length){ register Viterbi_SubAlignment *vsa = 
g_new(Viterbi_SubAlignment, 1); register gint i; vsa->region = Region_create(query_start, target_start, query_length, target_length); vsa->first_state = first_state; vsa->final_cell = g_new(C4_Score, cell_length); for(i = 0; i < cell_length; i++) vsa->final_cell[i] = final_cell[i]; return vsa; } void Viterbi_SubAlignment_destroy(Viterbi_SubAlignment *vsa){ Region_destroy(vsa->region); g_free(vsa->final_cell); g_free(vsa); return; } /**/ typedef struct { C4_State *state; gint row; gint pos; } Viterbi_Checkpoint_SRP; /* This encoding is safe because we can address the checkpoint matrix. */ static gint Viterbi_Checkpoint_SRP_encode(C4_Model *model, gint state_id, gint row_id, gint query_pos){ return ( ( (query_pos * model->state_list->len) + state_id) * model->max_target_advance) + row_id; } static void Viterbi_Checkpoint_SRP_decode(C4_Model *model, gint srp, Viterbi_Checkpoint_SRP *result){ register gint remainder; result->row = srp % model->max_target_advance; remainder = srp / model->max_target_advance; result->state = model->state_list->pdata [remainder % model->state_list->len]; result->pos = remainder / model->state_list->len; g_assert(srp == Viterbi_Checkpoint_SRP_encode(model, result->state->id, result->row, result->pos)); return; } GPtrArray *Viterbi_Checkpoint_traceback(Viterbi *viterbi, Viterbi_Data *vd, Region *region, C4_State *first_state, C4_Score *final_cell){ register gint prev_row; register Viterbi_SubAlignment *vsa; register gint query_start, target_start; register C4_Score cp_srp; register C4_Score ****checkpoint, *cell = final_cell; register gint i; register GPtrArray *vsa_list = g_ptr_array_new(); Viterbi_Checkpoint_SRP srp; g_assert(vd->checkpoint); Viterbi_Checkpoint_SRP_decode(viterbi->model, vd->checkpoint->last_srp, &srp); query_start = region->query_start + srp.pos; target_start = region->target_start + (vd->checkpoint->section_length * vd->checkpoint->checkpoint_list->len) - srp.row; vsa = Viterbi_SubAlignment_create( query_start, 
target_start, Region_query_end(region)-query_start, Region_target_end(region)-target_start, srp.state, final_cell, vd->checkpoint->cell_size); g_assert(Region_is_within(region, vsa->region)); g_ptr_array_add(vsa_list, vsa); prev_row = srp.row; for(i = vd->checkpoint->checkpoint_list->len-1; i >= 1; i--){ checkpoint = vd->checkpoint->checkpoint_list->pdata[i]; prev_row = srp.row; cp_srp = checkpoint[prev_row] [vsa->region->query_start -region->query_start] [vsa->first_state->id] [vd->checkpoint->cell_size-1]; Viterbi_Checkpoint_SRP_decode(viterbi->model, cp_srp, &srp); query_start = region->query_start + srp.pos; target_start = vsa->region->target_start - vd->checkpoint->section_length - srp.row + prev_row; cell = checkpoint[prev_row] [vsa->region->query_start-region->query_start] [vsa->first_state->id]; vsa = Viterbi_SubAlignment_create(query_start, target_start, vsa->region->query_start-query_start, vsa->region->target_start-target_start, srp.state, cell, vd->checkpoint->cell_size); g_assert(Region_is_within(region, vsa->region)); g_ptr_array_add(vsa_list, vsa); } /* Set checkpoint */ checkpoint = vd->checkpoint->checkpoint_list->pdata[0]; /* Set final cell */ cell = checkpoint[srp.row] [vsa->region->query_start-region->query_start] [vsa->first_state->id]; vsa = Viterbi_SubAlignment_create(region->query_start, region->target_start, query_start-region->query_start, target_start-region->target_start, first_state, cell, vd->checkpoint->cell_size); g_ptr_array_add(vsa_list, vsa); return vsa_list; } /**/ static void Viterbi_Checkpoint_process(Viterbi_Checkpoint *vc, C4_Model *model, Region *region, gint target_pos, C4_Score ****prev_row){ register gint i, j, k, l; register C4_Score ****checkpoint; if((!(target_pos % vc->section_length)) && target_pos){ if(vc->counter < vc->checkpoint_list->len){ checkpoint = vc->checkpoint_list->pdata[vc->counter++]; for(i = 0; i < model->max_target_advance; i++){ for(j = 0; j <= region->query_length; j++){ for(k = 0; k < 
model->state_list->len; k++){ /* Copy cell to checkpoint */ for(l = 0; l < vc->cell_size; l++){ checkpoint[i][j][k][l] = prev_row[i][j][k][l]; } /* Load state/pos to last shadow */ prev_row[i][j][k][vc->cell_size-1] = Viterbi_Checkpoint_SRP_encode(model, k, i, j); } } } } } return; } static void Viterbi_Data_finalise(Viterbi_Data *vd, Region *region){ if(vd->alignment_region){ if(vd->vr->region_start_query_id != -1) vd->alignment_region->query_start = vd->curr_query_start + region->query_start; if(vd->vr->region_start_target_id != -1) vd->alignment_region->target_start = vd->curr_target_start + region->target_start; vd->alignment_region->query_length = (vd->curr_query_end - vd->curr_query_start); vd->alignment_region->target_length = (vd->curr_target_end - vd->curr_target_start); /* vd->alignment_region is in the coordinates of the sequences, * not of the current region */ g_assert(Region_is_valid(vd->alignment_region)); } return; } static C4_Score Viterbi_interpreted(Viterbi *viterbi, Region *region, Viterbi_Data *vd, SubOpt_Index *soi, gpointer user_data){ register C4_Score t, score = C4_IMPOSSIBLY_LOW_SCORE; register C4_Score ***swap_row, *src, *dst; register C4_Score ****prev_row = g_new(C4_Score***, viterbi->model->max_target_advance+1); register gint i, j, k, l; register C4_Transition *transition; register C4_Calc *calc; register gboolean *state_is_set = g_new(gboolean, viterbi->model->state_list->len); register gboolean end_is_set = FALSE; register C4_State *final_state = NULL; register C4_Score *dummy_start = g_new(C4_Score, vd->vr->cell_size), *tmp_start; /**/ g_assert(!viterbi->model->is_open); if(viterbi->model->init_func) viterbi->model->init_func(region, user_data); if(vd->continuation) final_state = vd->continuation->final_state; else final_state = viterbi->model->end_state->state; for(i = 0; i < viterbi->model->calc_list->len; i++){ calc = viterbi->model->calc_list->pdata[i]; if(calc && calc->init_func) calc->init_func(region, user_data); } /**/ 
for(i = 0; i <= viterbi->model->max_target_advance; i++) prev_row[i] = vd->vr->score_matrix[i]; for(j = 0; j <= region->target_length; j++){ SubOpt_Index_set_row(soi, j); for(i = 0; i <= region->query_length; i++){ for(k = 0; k < viterbi->model->state_list->len; k++){ state_is_set[k] = FALSE; prev_row[0][i][k][0] = C4_IMPOSSIBLY_LOW_SCORE; } for(k = 0; k < viterbi->model->transition_list->len; k++){ transition = viterbi->model->transition_list->pdata[k]; if(!Layout_is_transition_valid(viterbi->layout, viterbi->model, transition, i, j, region->query_length, region->target_length)) continue; if(C4_Transition_is_match(transition) && soi && SubOpt_Index_is_blocked_fast(soi, i)) continue; if(vd->continuation && (transition->input == viterbi->model->start_state->state)){ src = vd->continuation->first_cell; dst = prev_row[0][0] [vd->continuation->first_state->id]; for(l = 0; l < vd->vr->cell_size; l++) dst[l] = src[l]; state_is_set[vd->continuation->first_state->id] = TRUE; } src = prev_row[transition->advance_target] [i-transition->advance_query] [transition->input->id]; dst = prev_row[0][i][transition->output->id]; /**/ t = 0; if(transition->input == viterbi->model->start_state->state){ if(vd->continuation){ src = prev_row[0][0] [viterbi->model->start_state->state->id]; t = src[0]; } else { if(viterbi->model ->start_state->cell_start_func){ tmp_start = viterbi->model ->start_state->cell_start_func( region->query_start + i - transition->advance_query, region->target_start + j - transition->advance_target, user_data); for(l = 0; l < vd->vr->cell_size; l++) dummy_start[l] = tmp_start[l]; src = dummy_start; t = src[0]; } } } else { t = src[0]; } /* Call shadow end from src */ Viterbi_Row_shadow_end(vd->vr, viterbi->model, transition, region, src, i, j, user_data); t += C4_Calc_score(transition->calc, region->query_start+i-transition->advance_query, region->target_start+j-transition->advance_target, user_data); /**/ if(transition->calc){ if(transition->calc->protect & 
C4_Protect_UNDERFLOW){ if(t < C4_IMPOSSIBLY_LOW_SCORE) t = C4_IMPOSSIBLY_LOW_SCORE; } if(transition->calc->protect & C4_Protect_OVERFLOW){ if(t > C4_IMPOSSIBLY_HIGH_SCORE) t = C4_IMPOSSIBLY_HIGH_SCORE; } } if(state_is_set[transition->output->id]){ if(dst[0] < t){ Viterbi_Data_assign(viterbi, vd, src, dst, t, i, j, transition, region, user_data); } } else { state_is_set[transition->output->id] = TRUE; Viterbi_Data_assign(viterbi, vd, src, dst, t, i, j, transition, region, user_data); } } /* corner */ if(state_is_set[viterbi->model->end_state->state->id]){ t = prev_row[0][i][final_state->id][0]; if(end_is_set){ if(score < t){ score = t; Viterbi_Data_register_end(vd, prev_row[0][i][final_state->id], i, j); } } else { score = t; end_is_set = TRUE; Viterbi_Data_register_end(vd, prev_row[0][i][final_state->id], i, j); } if(viterbi->model->end_state->cell_end_func){ viterbi->model->end_state->cell_end_func( prev_row[0][i][viterbi->model->end_state->state->id], vd->vr->cell_size, region->query_start+i, region->target_start+j, user_data); } } } if(viterbi->mode == Viterbi_Mode_FIND_CHECKPOINTS) Viterbi_Checkpoint_process(vd->checkpoint, viterbi->model, region, j, prev_row); /* Rotate rows backwards */ swap_row = prev_row[viterbi->model->max_target_advance]; for(i = viterbi->model->max_target_advance; i > 0; i--) prev_row[i] = prev_row[i-1]; prev_row[0] = swap_row; } /**/ g_assert(end_is_set); Viterbi_Data_finalise(vd, region); if(viterbi->mode == Viterbi_Mode_FIND_CHECKPOINTS){ vd->checkpoint->last_srp = prev_row[1] [region->query_length] [final_state->id] [vd->vr->cell_size-1]; } /**/ for(i = 0; i < viterbi->model->calc_list->len; i++){ calc = viterbi->model->calc_list->pdata[i]; if(calc && calc->exit_func) calc->exit_func(region, user_data); } if(viterbi->model->exit_func) viterbi->model->exit_func(region, user_data); if(vd->continuation){ /* Record copy of final_cell */ src = prev_row[1][region->query_length][final_state->id]; for(i = 0; i < vd->vr->cell_size; i++) 
vd->continuation->final_cell[i] = src[i]; } g_free(prev_row); g_free(state_is_set); g_free(dummy_start); return score; } /* This is supposed to be slow, but robust. */ /* FIXME: tidy / split to call Viterbi_interpreted_transition() */ /* FIXME: optimisation: avoid doing any calc when input score is -INF * (can do the same with codegen ?) */ C4_Score Viterbi_calculate(Viterbi *viterbi, Region *region, Viterbi_Data *vd, gpointer user_data, SubOpt *subopt){ register C4_Score score; register SubOpt_Index *soi = NULL; g_assert(viterbi); g_assert(Region_is_valid(region)); if(subopt) soi = SubOpt_Index_create(subopt, region); if(viterbi->func){ /* Use compiled version */ score = viterbi->func(viterbi->model, region, vd, soi, user_data); } else { score = Viterbi_interpreted(viterbi, region, vd, soi, user_data); } if(soi) SubOpt_Index_destroy(soi); return score; } /**/ static gchar *Viterbi_expand_macro(gchar *macro, gint advance_query, gint advance_target, C4_Shadow *shadow, gchar *cell){ register GString *str = g_string_sized_new(strlen(macro)); register gchar *expanded_macro, *tmp; register gint i; for(i = 0; macro[i]; i++){ if((macro[i] == '%')){ i++; switch(macro[i]){ case 'Q': i++; switch(macro[i]){ case 'S': g_string_append(str, "region->query_start"); break; case 'P': g_string_append(str, "region->query_start+i-"); g_string_append_c(str, '0' + advance_query); break; case 'R': g_string_append(str, "i-"); g_string_append_c(str, '0' + advance_query); break; case 'E': g_string_append(str, "Region_query_end(region)"); break; case 'L': g_string_append(str, "region->query_length"); break; default: g_error("Bad query macro [%%Q%c] in [%s]", macro[i], macro); break; } break; case 'T': i++; switch(macro[i]){ case 'S': g_string_append(str, "region->target_start"); break; case 'P': g_string_append(str, "region->target_start+j-"); g_string_append_c(str, '0' + advance_target); break; case 'R': g_string_append(str, "j-"); g_string_append_c(str, '0' + advance_target); break; case 
'E': g_string_append(str, "Region_target_end(region)"); break; case 'L': g_string_append(str, "region->target_length"); break; default: g_error("Bad target macro [%%T%c] in [%s]", macro[i], macro); break; } break; case 'S': i++; switch(macro[i]){ case 'S': g_assert(shadow); tmp = g_strdup_printf("src[%d]", shadow->designation+1); g_string_append(str, tmp); g_free(tmp); break; default: g_error("Bad shadow macro [%%S%c] in [%s]", macro[i], macro); break; } break; case 'C': g_assert(cell); g_string_append(str, cell); break; case '%': g_string_append(str, "%"); break; default: g_error("Bad macro [%%%c] in [%s]", macro[i], macro); break; } } else { g_string_append_c(str, macro[i]); } } expanded_macro = str->str; g_string_free(str, FALSE); return expanded_macro; } /* Macro expansion rules: * --------------- * %% -> % * %QS -> query_start * %QP -> query_position * %QR -> query_position in region * %QE -> query_end * %QL -> query_length * %TS -> target_start * %TP -> target_position * %TR -> target_position in region * %TE -> target_end * %TL -> target_length * %SS -> shadow_score * %C -> cell */ /**/ static gboolean Viterbi_implement_require_shadow(Viterbi *viterbi){ register gint i; register C4_Shadow *shadow; for(i = 0; i < viterbi->model->shadow_list->len; i++){ shadow = viterbi->model->shadow_list->pdata[i]; g_assert(shadow->start_func); if(!shadow->start_macro) return TRUE; g_assert(shadow->end_func); if(!shadow->end_macro) return TRUE; } return FALSE; } static gboolean Viterbi_implement_require_transition(Viterbi *viterbi){ register gint i; register C4_Calc *calc; if(viterbi->use_continuation || (viterbi->mode == Viterbi_Mode_FIND_PATH)) return TRUE; for(i = 0; i < viterbi->model->calc_list->len; i++){ calc = viterbi->model->calc_list->pdata[i]; if(calc->calc_func && (!calc->calc_macro)) return TRUE; } return FALSE; } static gboolean Viterbi_implement_require_calc(Viterbi *viterbi){ register gint i; register C4_Calc *calc; for(i = 0; i < 
viterbi->model->calc_list->len; i++){ calc = viterbi->model->calc_list->pdata[i]; if(calc->init_func && (!calc->init_macro)) return TRUE; if(calc->exit_func && (!calc->exit_macro)) return TRUE; } return FALSE; } static void Viterbi_implement_shadow_start(Viterbi *viterbi, Codegen *codegen, C4_Transition *transition){ register gint i; register C4_Shadow *shadow; register gchar *expanded_macro; if(transition->input == viterbi->model->start_state->state){ if(viterbi->mode == Viterbi_Mode_FIND_REGION){ if(viterbi->model->start_state->scope != C4_Scope_CORNER){ if(viterbi->model->start_state->scope != C4_Scope_QUERY) Codegen_printf(codegen, "src[vd->vr->region_start_query_id] = i-%d;\n", transition->advance_query); if(viterbi->model->start_state->scope != C4_Scope_TARGET) Codegen_printf(codegen, "src[vd->vr->region_start_target_id] = j-%d;\n", transition->advance_target); } } } /* Call start funcs from src */ for(i = 0; i < transition->input->src_shadow_list->len; i++){ shadow = transition->input->src_shadow_list->pdata[i]; if(shadow->start_macro){ expanded_macro = Viterbi_expand_macro(shadow->start_macro, transition->advance_query, transition->advance_target, NULL, NULL); Codegen_printf(codegen, "src[%d] = %s;\n", shadow->designation+1, expanded_macro); g_free(expanded_macro); } else { g_assert(shadow->start_func); Codegen_printf(codegen, "shadow = " "transition->input->output_shadow_list->pdata[%d];\n", i); Codegen_printf(codegen, "src[%d] = shadow->start_func(" "region->query_start+i-%d, region->target_start+j-%d," " user_data);\n", shadow->designation+1, transition->advance_query, transition->advance_target); } } return; } static void Viterbi_implement_shadow_end(Viterbi *viterbi, Codegen *codegen, C4_Transition *transition){ register gint i; register C4_Shadow *shadow; register gchar *expanded_macro; for(i = 0; i < transition->dst_shadow_list->len; i++){ shadow = transition->dst_shadow_list->pdata[i]; if(shadow->end_macro){ expanded_macro = 
Viterbi_expand_macro(shadow->end_macro, transition->advance_query, transition->advance_target, shadow, NULL); Codegen_printf(codegen, "%s;\n", expanded_macro); g_free(expanded_macro); } else { g_assert(shadow->end_func); Codegen_printf(codegen, "shadow = " "transition->input->input_shadow_list->pdata[%d];\n", i); Codegen_printf(codegen, "shadow->end_func(src[%d]," " region->query_start+i-%d, region->target_start+j-%d," " user_data);\n", shadow->designation+1, transition->advance_query, transition->advance_target); } } return; } static void Viterbi_implement_dp_set_cell(Viterbi *viterbi, Codegen *codegen, C4_Transition *transition, gint cell_size){ register gint i; /* Call shadow start on src */ Viterbi_implement_shadow_start(viterbi, codegen, transition); Codegen_printf(codegen, "dst[0] = t;\n"); for(i = 1; i < cell_size; i++) Codegen_printf(codegen, "dst[%d] = src[%d];\n", i, i); /* FIXME: optimisation: * do not need to do copy for transitions * when the shadow is out of scope. */ if(viterbi->mode == Viterbi_Mode_FIND_PATH){ Codegen_printf(codegen, "vd->traceback[i][j][%d] = transition;\n", transition->output->id); } return; } static void Viterbi_implement_register_end(Viterbi *viterbi, Codegen *codegen, gchar *end_cell){ if((viterbi->mode == Viterbi_Mode_FIND_REGION) || (viterbi->mode == Viterbi_Mode_FIND_PATH)){ Codegen_printf(codegen, "vd->curr_query_end = i;\n"); Codegen_printf(codegen, "vd->curr_target_end = j;\n"); } if(viterbi->mode == Viterbi_Mode_FIND_REGION){ if(viterbi->model->start_state->scope != C4_Scope_CORNER){ if(viterbi->model->start_state->scope != C4_Scope_QUERY){ Codegen_printf(codegen, "vd->curr_query_start = " "%s[vd->vr->region_start_query_id];\n", end_cell); } if(viterbi->model->start_state->scope != C4_Scope_TARGET){ Codegen_printf(codegen, "vd->curr_target_start = " "%s[vd->vr->region_start_target_id];\n", end_cell); } } } return; } static void Viterbi_implement_dp_cell(Viterbi *viterbi, Codegen *codegen, Layout_Mask *mask, gint 
cell_size){ register gint i, j; register C4_Transition *transition; register gchar *end_cell, *expanded_macro; register gboolean is_valid, t_is_set; g_assert(mask); /**/ /* Zero dst cell */ for(i = 0; i < viterbi->model->state_list->len; i++) Codegen_printf(codegen, "prev_row_0[i][%d][0] = %d;\n", i, C4_IMPOSSIBLY_LOW_SCORE); for(i = 0; i < viterbi->model->state_list->len; i++) Codegen_printf(codegen, "state_is_set[%d] = FALSE;\n", i); for(i = 0; i < viterbi->model->transition_list->len; i++){ is_valid = g_array_index(mask->transition_mask, gboolean, i); if(!is_valid) continue; transition = viterbi->model->transition_list->pdata[i]; g_assert(transition->id == i); /* Open SubOpt conditional */ if(C4_Transition_is_match(transition)){ Codegen_printf(codegen, "if(!(soi && SubOpt_Index_is_blocked_fast(soi, i))){\n"); Codegen_indent(codegen, 1); } if(transition->input == viterbi->model->start_state->state){ if(viterbi->use_continuation){ Codegen_printf(codegen, "src = vd->continuation->first_cell;\n"); Codegen_printf(codegen, "dst = prev_row_0[0]" "[vd->continuation->first_state->id];\n"); for(j = 0; j < cell_size; j++) Codegen_printf(codegen, "dst[%d] = src[%d];\n", j, j); Codegen_printf(codegen, "state_is_set[vd->continuation->first_state->id] = TRUE;\n"); } } Codegen_printf(codegen, "/* transition [%s] */\n", transition->name); Codegen_printf(codegen, "src = prev_row_%d[i-%d][%d];\n", transition->advance_target, transition->advance_query, transition->input->id); Codegen_printf(codegen, "dst = prev_row_0[i][%d];\n", transition->output->id); t_is_set = FALSE; if(Viterbi_implement_require_transition(viterbi)) Codegen_printf(codegen, "transition = model->transition_list->pdata[%d];\n", transition->id); /* Call shadow end from dst */ Viterbi_implement_shadow_end(viterbi, codegen, transition); if(transition->calc){ if(transition->calc->calc_func){ if(transition->calc->calc_macro){ expanded_macro = Viterbi_expand_macro( transition->calc->calc_macro, 
transition->advance_query, transition->advance_target, NULL, NULL); Codegen_printf(codegen, "t %s= %s;\n", t_is_set?"+":"",expanded_macro); t_is_set = TRUE; g_free(expanded_macro); } else { Codegen_printf(codegen, "t %s= transition->calc->calc_func(\n" "region->query_start+i-%d,\n" "region->target_start+j-%d,\n" "user_data);\n", t_is_set?"+":"", transition->advance_query, transition->advance_target); t_is_set = TRUE; } } else { if(transition->calc->max_score){ /* %s, not %c: the argument is a string */ Codegen_printf(codegen, "t %s= %d;\n", t_is_set?"+":"", transition->calc->max_score); t_is_set = TRUE; } } } if(transition->input == viterbi->model->start_state->state){ if(viterbi->use_continuation){ Codegen_printf(codegen, "t %s= prev_row_0[0][%d][0];\n", t_is_set?"+":"", viterbi->model->start_state->state->id); t_is_set = TRUE; } else { if(viterbi->model->start_state->cell_start_func){ if(viterbi->model->start_state->cell_start_macro){ expanded_macro = Viterbi_expand_macro( viterbi->model->start_state->cell_start_macro, transition->advance_query, transition->advance_target, NULL, NULL); Codegen_printf(codegen, "src = %s;\n", expanded_macro); g_free(expanded_macro); } else { Codegen_printf(codegen, "src = model->start_state->cell_start_func(\n" "region->query_start+i-%d,\n" "region->target_start+j-%d,\n" "user_data);\n", transition->advance_query, transition->advance_target); } Codegen_printf(codegen, "t %s= src[0];\n", t_is_set?"+":""); t_is_set = TRUE; } } } else { Codegen_printf(codegen, "t %s= src[0];\n", t_is_set?"+":""); t_is_set = TRUE; } /* Check against overflow and underflow */ if(t_is_set){ if(transition->calc){ if(transition->calc->protect & C4_Protect_UNDERFLOW){ Codegen_printf(codegen, "if(t < C4_IMPOSSIBLY_LOW_SCORE)\n"); Codegen_indent(codegen, 1); Codegen_printf(codegen, "t = C4_IMPOSSIBLY_LOW_SCORE;\n"); Codegen_indent(codegen, -1); } if(transition->calc->protect & C4_Protect_OVERFLOW){ Codegen_printf(codegen, "if(t > C4_IMPOSSIBLY_HIGH_SCORE)\n"); Codegen_indent(codegen, 1); 
Codegen_printf(codegen, "t = C4_IMPOSSIBLY_HIGH_SCORE;\n"); Codegen_indent(codegen, -1); } } } else { Codegen_printf(codegen, "t = 0;\n"); } /* Challenge with t */ Codegen_printf(codegen, "if(state_is_set[%d]){\n", transition->output->id); Codegen_indent(codegen, 1); Codegen_printf(codegen, "if(dst[0] < t){\n"); Codegen_indent(codegen, 1); Viterbi_implement_dp_set_cell(viterbi, codegen, transition, cell_size); Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -2); Codegen_printf(codegen, "} else {\n"); Codegen_indent(codegen, 1); Codegen_printf(codegen, "state_is_set[%d] = TRUE;\n", transition->output->id); Viterbi_implement_dp_set_cell(viterbi, codegen, transition, cell_size); Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); /* Close SubOpt conditional */ if(C4_Transition_is_match(transition)){ Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); } } /* Take score if end set */ Codegen_printf(codegen, "if(state_is_set[%d]){\n", viterbi->model->end_state->state->id); Codegen_indent(codegen, 1); if(viterbi->use_continuation){ end_cell = g_strdup_printf("prev_row_0[i]" "[vd->continuation->final_state->id]"); } else { end_cell = g_strdup_printf("prev_row_0[i][%d]", viterbi->model->end_state->state->id); } Codegen_printf(codegen, "if(end_is_set){\n"); Codegen_indent(codegen, 1); Codegen_printf(codegen, "if(score < %s[0]){\n", end_cell); Codegen_indent(codegen, 1); Codegen_printf(codegen, "score = %s[0];\n", end_cell); Viterbi_implement_register_end(viterbi, codegen, end_cell); Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -2); Codegen_printf(codegen, "} else {\n"); Codegen_indent(codegen, 1); Codegen_printf(codegen, "score = %s[0];\n", end_cell); Codegen_printf(codegen, "end_is_set = TRUE;\n"); Viterbi_implement_register_end(viterbi, codegen, end_cell); Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); if(viterbi->model->end_state->cell_end_func){ if(viterbi->model->end_state->cell_end_macro){ expanded_macro = 
Viterbi_expand_macro( viterbi->model->end_state->cell_end_macro, 0, 0, NULL, end_cell); Codegen_printf(codegen, "%s;\n", expanded_macro); g_free(expanded_macro); } else { Codegen_printf(codegen, "model->end_state->cell_end_func(\n" "%s, %d, region->query_start+i,\n" "region->target_start+j,\n" "user_data);\n", end_cell, cell_size); } } g_free(end_cell); Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); return; } /* FIXME: tidy / remove redundancy */ /* FIXME: optimisation * - one-time calculation of {input,output}_{query,target}_pos * - avoid unnecessary assignments * - do not add onto zero scores * - only protect scores after addition/subtraction * - other things to simplify generated code */ /* FIXME: optimisation : need to find repeated rows and cells * in the Layout (before max_advance_{query,target}), * and remove resulting redundancy in codegen. */ /* FIXME: optimisation : allow start/end transitions for global * and semi-global models to be performed outside of main * alignment (e.g. for Find_Path_continuation() etc). * Do this by removing corner start/end transitions from model, * and performing them as part of the continuation DP. 
*/ static void Viterbi_implement_opc(Viterbi *viterbi, Codegen *codegen, gint row_id, gint cell_id, gint cell_size){ register Layout_Row *row = viterbi->layout->row_list->pdata[row_id]; register Layout_Cell *cell = row->cell_list->pdata[cell_id]; register gchar *next_row; /* The Keyword That Dare Not Speak Its Name: */ gchar tktdnsin[5] = {0x67, 0x6F, 0x74, 0x6F, 0x00}; Codegen_printf(codegen, "/* CELL [%d,%d] */\n", row_id, cell_id); Codegen_printf(codegen, "i++;\n"); if(row_id >= (viterbi->layout->row_list->len-2)) next_row = g_strdup("nth_row"); else next_row = g_strdup_printf("row_%d", row_id+1); if(cell->end_target){ Codegen_printf(codegen, "if(j == region->target_length){\n"); Codegen_indent(codegen, 1); if(cell->corner){ Codegen_printf(codegen, "if(i == region->query_length){\n"); Codegen_indent(codegen, 1); Viterbi_implement_dp_cell(viterbi, codegen, cell->corner, cell_size); Codegen_printf(codegen, "%s end_of_dp;\n", tktdnsin); Codegen_indent(codegen, -1); Codegen_printf(codegen, "} else {\n"); Codegen_indent(codegen, 1); Viterbi_implement_dp_cell(viterbi, codegen, cell->end_target, cell_size); Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); } else { Viterbi_implement_dp_cell(viterbi, codegen, cell->end_target, cell_size); Codegen_printf(codegen, "if(i == region->query_length){\n"); Codegen_indent(codegen, 1); Codegen_printf(codegen, "%s end_of_dp;\n", tktdnsin); Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); } Codegen_indent(codegen, -1); Codegen_printf(codegen, "} else {\n"); Codegen_indent(codegen, 1); if(cell->end_query){ Codegen_printf(codegen, "if(i == region->query_length){\n"); Codegen_indent(codegen, 1); Viterbi_implement_dp_cell(viterbi, codegen, cell->end_query, cell_size); Codegen_printf(codegen, "%s %s;\n", tktdnsin, next_row); Codegen_indent(codegen, -1); Codegen_printf(codegen, "} else {\n"); Codegen_indent(codegen, 1); Viterbi_implement_dp_cell(viterbi, codegen, cell->normal, cell_size); Codegen_printf(codegen, 
"}\n"); Codegen_indent(codegen, -1); } else { Viterbi_implement_dp_cell(viterbi, codegen, cell->normal, cell_size); Codegen_printf(codegen, "if(i == region->query_length){\n"); Codegen_indent(codegen, 1); Codegen_printf(codegen, "%s %s;\n", tktdnsin, next_row); Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); } Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); } else { /* cell->end_target missing */ if(cell->end_query){ Codegen_printf(codegen, "if(i == region->query_length){\n"); Codegen_indent(codegen, 1); Viterbi_implement_dp_cell(viterbi, codegen, cell->end_query, cell_size); Codegen_printf(codegen, "if(j == region->target_length){\n"); Codegen_indent(codegen, 1); Codegen_printf(codegen, "%s end_of_dp;\n", tktdnsin); Codegen_indent(codegen, -1); Codegen_printf(codegen, "} else {\n"); Codegen_indent(codegen, 1); Codegen_printf(codegen, "%s %s;\n", tktdnsin, next_row); Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); Codegen_indent(codegen, -1); Codegen_printf(codegen, "} else {\n"); Codegen_indent(codegen, 1); Viterbi_implement_dp_cell(viterbi, codegen, cell->normal, cell_size); Codegen_printf(codegen, "if(j == region->target_length){\n"); Codegen_indent(codegen, 1); Codegen_printf(codegen, "%s end_of_dp;\n", tktdnsin); Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); } else { Viterbi_implement_dp_cell(viterbi, codegen, cell->normal, cell_size); Codegen_printf(codegen, "if(j == region->target_length){\n"); Codegen_indent(codegen, 1); Codegen_printf(codegen, "if(i == region->query_length){\n"); Codegen_indent(codegen, 1); Codegen_printf(codegen, "%s end_of_dp;\n", tktdnsin); Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); Codegen_indent(codegen, -1); Codegen_printf(codegen, "} else {\n"); Codegen_indent(codegen, 1); Codegen_printf(codegen, "if(i == region->query_length){\n"); Codegen_indent(codegen, 1); Codegen_printf(codegen, "%s %s;\n", 
tktdnsin, next_row); Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); } } g_free(next_row); return; } static void Viterbi_implement_dp_row(Viterbi *viterbi, Codegen *codegen, gint row_id, gint cell_size){ register gint i, j, cp_row; register Layout_Row *row = viterbi->layout->row_list->pdata[row_id]; /**/ if(row_id){ if(row_id == (viterbi->layout->row_list->len-1)) Codegen_printf(codegen, "nth_row:\n"); else Codegen_printf(codegen, "row_%d:\n", row_id); if(viterbi->mode == Viterbi_Mode_FIND_CHECKPOINTS){ Codegen_printf(codegen, "if((!(j %% vd->checkpoint->section_length)) && j){\n"); Codegen_indent(codegen, 1); Codegen_printf(codegen, "if(vd->checkpoint->counter < " "vd->checkpoint->checkpoint_list->len){\n"); Codegen_indent(codegen, 1); Codegen_printf(codegen, "checkpoint = " "vd->checkpoint->checkpoint_list->pdata" "[vd->checkpoint->counter++];\n"); for(cp_row = 0; cp_row < viterbi->model->max_target_advance; cp_row++){ Codegen_printf(codegen, "for(i = 0; i <= region->query_length; i++){\n"); Codegen_indent(codegen, 1); Codegen_printf(codegen, "for(k = 0; k < %d; k++){\n", viterbi->model->state_list->len); Codegen_indent(codegen, 1); /* Copy cell to checkpoint */ for(j = 0; j < cell_size; j++) Codegen_printf(codegen, "checkpoint[%d][i][k][%d]" " = prev_row_%d[i][k][%d];\n", cp_row, j, cp_row, j); /* Load state/pos to last shadow */ Codegen_printf(codegen, "prev_row_%d[i][k][%d] = " "(((i*%d)+k)*%d)+%d;\n", cp_row, cell_size-1, viterbi->model->state_list->len, viterbi->model->max_target_advance, cp_row); Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); } Codegen_indent(codegen, -1); Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); Codegen_printf(codegen, "}\n"); } Codegen_printf(codegen, "j++;\n"); /* Set SubOpt row */ Codegen_printf(codegen, "SubOpt_Index_set_row(soi, j);\n"); /* Rotate prev_row_%d pointers 
backwards */ Codegen_printf(codegen, "swap_row = prev_row_%d;\n", viterbi->model->max_target_advance); for(i = viterbi->model->max_target_advance; i > 0; i--) Codegen_printf(codegen, "prev_row_%d = prev_row_%d;\n", i, i-1); Codegen_printf(codegen, "prev_row_0 = swap_row;\n"); } Codegen_printf(codegen, "/* start block */\n"); Codegen_printf(codegen, "i = -1;\n"); /* Start of cell */ for(i = 0; i < (row->cell_list->len-1); i++){ Codegen_printf(codegen, "/* explicit cell [%d] */\n", i); Viterbi_implement_opc(viterbi, codegen, row_id, i, cell_size); } /* Last cell */ Codegen_printf(codegen, "/* last cell [%d] */\n", i); Codegen_printf(codegen, "do {\n"); Codegen_indent(codegen, 1); Viterbi_implement_opc(viterbi, codegen, row_id, i, cell_size); Codegen_indent(codegen, -1); Codegen_printf(codegen, "} while(TRUE);\n"); /* End of cell */ return; } /* +--------+ +-----------+ * | NORMAL | <-- | QUERY END | * +--------+ +-----------+ * ^ * | * +------------+ +--------+ * | TARGET END | <-- | CORNER | * +------------+ +--------+ */ /**/ static void Viterbi_implement_finalise(Viterbi *viterbi, Codegen *codegen){ if(viterbi->mode == Viterbi_Mode_FIND_REGION){ if(viterbi->model->start_state->scope != C4_Scope_CORNER){ if(viterbi->model->start_state->scope != C4_Scope_QUERY){ Codegen_printf(codegen, "vd->alignment_region->query_start" " = vd->curr_query_start + region->query_start;\n"); } if(viterbi->model->start_state->scope != C4_Scope_TARGET) Codegen_printf(codegen, "vd->alignment_region->target_start" " = vd->curr_target_start + region->target_start;\n"); } Codegen_printf(codegen, "vd->alignment_region->query_length" " = vd->curr_query_end - vd->curr_query_start;\n"); Codegen_printf(codegen, "vd->alignment_region->target_length" " = vd->curr_target_end - vd->curr_target_start;\n"); /* vd->alignment_region is in the coordinates of the sequences, * not of the current region */ } return; } static void Viterbi_implement_dp(Viterbi *viterbi, Codegen *codegen){ register gint i; 
register gint cell_size = Viterbi_get_cell_size(viterbi); Codegen_printf(codegen, "/* Implementing [%d] explicit rows */\n", viterbi->layout->row_list->len); Codegen_printf(codegen, "C4_Score %s%s{\n", viterbi->name, Viterbi_DP_Func_ARGS_STR); Codegen_indent(codegen, 1); Codegen_printf(codegen, "register C4_Score score" " = C4_IMPOSSIBLY_LOW_SCORE;\n"); /* Preliminary code */ Codegen_printf(codegen, "register gint i, j = 0;\n"); for(i = 0; i <= viterbi->model->max_target_advance; i++){ Codegen_printf(codegen, "register C4_Score ***prev_row_%d" " = vd->vr->score_matrix[%d];\n", i, i); } Codegen_printf(codegen, "register C4_Score ***swap_row," " *src, *dst, t;\n"); if(Viterbi_implement_require_shadow(viterbi)){ Codegen_printf(codegen, "register C4_Transition *transition;\n"); Codegen_printf(codegen, "register C4_Shadow *shadow;\n"); } else { if(Viterbi_implement_require_transition(viterbi)) Codegen_printf(codegen, "register C4_Transition *transition;\n"); } if(Viterbi_implement_require_calc(viterbi)) Codegen_printf(codegen, "register C4_Calc *calc;\n"); if(viterbi->mode == Viterbi_Mode_FIND_CHECKPOINTS){ Codegen_printf(codegen, "register C4_Score ****checkpoint;\n"); Codegen_printf(codegen, "register gint k;\n"); } Codegen_printf(codegen, "register gboolean *state_is_set\n" " = g_new(gboolean, %d);\n", viterbi->model->state_list->len); Codegen_printf(codegen, "register gboolean end_is_set = FALSE;\n"); /* Local code must go between declarations and first block */ for(i = 0; i < viterbi->model->local_code_list->len; i++) Codegen_printf(codegen, viterbi->model->local_code_list->pdata[i]); CGUtil_prep(codegen, viterbi->model, TRUE); /* Set initial SubOpt row */ Codegen_printf(codegen, "SubOpt_Index_set_row(soi, 0);\n"); /* Start of viterbi */ for(i = 0; i < (viterbi->layout->row_list->len-1); i++){ Codegen_printf(codegen, "/* explicit row [%d] */\n", i); Viterbi_implement_dp_row(viterbi, codegen, i, cell_size); } /* Last row */ Codegen_printf(codegen, "/* last row [%d] 
*/\n", i); Codegen_printf(codegen, "do {\n"); Codegen_indent(codegen, 1); Viterbi_implement_dp_row(viterbi, codegen, i, cell_size); Codegen_indent(codegen, -1); Codegen_printf(codegen, "} while(TRUE);\n"); /* End code */ Codegen_indent(codegen, -1); Codegen_printf(codegen, "end_of_dp:\n"); Codegen_indent(codegen, 1); /* End of viterbi */ /**/ CGUtil_prep(codegen, viterbi->model, FALSE); Viterbi_implement_finalise(viterbi, codegen); /* Copy final cell */ if(viterbi->use_continuation){ Codegen_printf(codegen, "src = prev_row_0[region->query_length]" "[vd->continuation->final_state->id];\n"); for(i = 0; i < cell_size; i++) Codegen_printf(codegen, "vd->continuation->final_cell[%d] = src[%d];\n", i, i); } if(viterbi->mode == Viterbi_Mode_FIND_CHECKPOINTS){ g_assert(viterbi->use_continuation); Codegen_printf(codegen, "vd->checkpoint->last_srp = prev_row_0" "[region->query_length]" "[vd->continuation->final_state->id]" "[vd->vr->cell_size-1];\n"); } Codegen_printf(codegen, "g_free(state_is_set);\n"); Codegen_printf(codegen, "return score;\n"); Codegen_printf(codegen, "}\n\n"); Codegen_indent(codegen, -1); return; } Codegen *Viterbi_make_Codegen(Viterbi *viterbi){ register Codegen *codegen = Codegen_create(NULL, viterbi->name); CGUtil_print_header(codegen, viterbi->model); Codegen_printf(codegen, "/* Exhaustive Viterbi DP implementation. 
*/\n" "\n" "#include \"viterbi.h\"\n" "\n"); Codegen_printf(codegen, "/* Viterbi model:\n" " * [%s]\n" " *\n" " * max query advance: %d\n" " * max target advance: %d\n" " *\n" " * scope [%s]->[%s]\n" " */\n\n", viterbi->name, viterbi->model->max_query_advance, viterbi->model->max_target_advance, C4_Scope_get_name(viterbi->model->start_state->scope), C4_Scope_get_name(viterbi->model->end_state->scope)); Viterbi_implement_dp(viterbi, codegen); CGUtil_print_footer(codegen); CGUtil_compile(codegen, viterbi->model); return codegen; } /* FIXME: optimisation : could reuse the Layout * for different DP modes (although will still need separate * global Layout for continuation DP). */ exonerate-2.4.0/src/c4/subopt.c0000644000175000017500000003305511162714266013221 00000000000000/****************************************************************\ * * * C4 dynamic programming library - suboptimal alignments * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include <stdlib.h> /* For qsort() */ #include "subopt.h" SubOpt *SubOpt_create(gint query_length, gint target_length){ register SubOpt *subopt = g_new(SubOpt, 1); subopt->ref_count = 1; subopt->query_length = query_length; subopt->target_length = target_length; subopt->range_tree = RangeTree_create(); subopt->path_count = 0; return subopt; } SubOpt *SubOpt_share(SubOpt *subopt){ subopt->ref_count++; return subopt; } void SubOpt_destroy(SubOpt *subopt){ if(--subopt->ref_count) return; RangeTree_destroy(subopt->range_tree, NULL, NULL); g_free(subopt); return; } /* Greatest Common Divisor : Euclid's algorithm */ static gint SubOpt_get_gcd(gint a, gint b){ register gint t; g_assert(a > 0); g_assert(b > 0); while(a > 0){ if(a < b){ t = a; a = b; b = t; } a -= b; } return b; } static gboolean SubOpt_check_pos(SubOpt *subopt, gint query_pos, gint target_pos){ return RangeTree_check_pos(subopt->range_tree, query_pos, target_pos); } static void SubOpt_add_AlignmentOperation(SubOpt *subopt, AlignmentOperation *ao, gint query_pos, gint target_pos){ register gint i; register gint gcd = SubOpt_get_gcd(ao->transition->advance_query, ao->transition->advance_target); register gint q_move = ao->transition->advance_query / gcd, t_move = ao->transition->advance_target / gcd; register gint q_limit = query_pos, t_limit = target_pos; register gint qp, tp; /* Add operation */ for(i = 0; i < ao->length; i++){ /* Block positions in transition */ qp = q_limit; tp = t_limit; q_limit += ao->transition->advance_query; t_limit += ao->transition->advance_target; while(qp < q_limit){ /* We need to ensure that the position is unused, * otherwise lead-out positions could clash with lead-in * positions from a previous alignment.
*/ if(!SubOpt_check_pos(subopt, /* qp + ao->transition->advance_query, tp + ao->transition->advance_target)){ */ qp, tp)){ RangeTree_add(subopt->range_tree, /* qp + ao->transition->advance_query, tp + ao->transition->advance_target, */ qp, tp, GINT_TO_POINTER(subopt->path_count)); } qp += q_move; tp += t_move; } } /* Block leading in positions */ qp = query_pos - ao->transition->advance_query + q_move; tp = target_pos - ao->transition->advance_target + t_move; while(qp < query_pos){ /* Only insert if not already present */ if(!SubOpt_check_pos(subopt, /* qp + ao->transition->advance_query, tp + ao->transition->advance_target)){ */ qp, tp)){ /* FIXME: */ if((qp >= 0) && (tp >= 0)) RangeTree_add(subopt->range_tree, /* qp + ao->transition->advance_query, tp + ao->transition->advance_target, */ qp, tp, GINT_TO_POINTER(subopt->path_count)); } qp += q_move; tp += t_move; } return; } /* FIXME: tidy */ void SubOpt_add_alignment(SubOpt *subopt, Alignment *alignment){ register gint i; register gint query_pos, target_pos; register AlignmentOperation *ao; query_pos = alignment->region->query_start; target_pos = alignment->region->target_start; for(i = 0; i < alignment->operation_list->len; i++){ ao = alignment->operation_list->pdata[i]; if(C4_Transition_is_match(ao->transition)){ SubOpt_add_AlignmentOperation(subopt, ao, query_pos, target_pos); } query_pos += (ao->transition->advance_query * ao->length); target_pos += (ao->transition->advance_target * ao->length); } subopt->path_count++; return; } /**/ typedef struct { gpointer user_data; SubOpt_FindFunc find_func; } SubOpt_FindData; static gboolean SubOpt_RangeTree_find(gint x, gint y, gpointer info, gpointer user_data){ register SubOpt_FindData *sufd = user_data; register gint path_id = GPOINTER_TO_INT(info); if(sufd->find_func(x, y, path_id, sufd->user_data)) return TRUE; return FALSE; } static gboolean SubOpt_overlap_find_func(gint query_pos, gint target_pos, gint path_id, gpointer user_data){ return TRUE; } gboolean 
SubOpt_find(SubOpt *subopt, Region *region, SubOpt_FindFunc find_func, gpointer user_data){ SubOpt_FindData sufd; sufd.user_data = user_data; sufd.find_func = find_func; return RangeTree_find(subopt->range_tree, region->query_start, region->query_length, region->target_start, region->target_length, SubOpt_RangeTree_find, &sufd); } gboolean SubOpt_overlaps_alignment(SubOpt *subopt, Alignment *alignment){ register gint i, j; register AlignmentOperation *ao; Region region; register gint qp = alignment->region->query_start, tp = alignment->region->target_start; for(i = 0; i < alignment->operation_list->len; i++){ ao = alignment->operation_list->pdata[i]; if(C4_Transition_is_match(ao->transition)){ for(j = 0; j < ao->length; j++){ region.query_start = qp; region.target_start = tp; region.query_length = ao->transition->advance_query; region.target_length = ao->transition->advance_target; if(SubOpt_find(subopt, &region, SubOpt_overlap_find_func, NULL)) return TRUE; qp += ao->transition->advance_query; tp += ao->transition->advance_target; } } else { qp += (ao->transition->advance_query * ao->length); tp += (ao->transition->advance_target * ao->length); } } return FALSE; } /**/ static SubOpt_Index_Row *SubOpt_Index_Row_create(gint target_pos){ register SubOpt_Index_Row *soir = g_new(SubOpt_Index_Row, 1); soir->target_pos = target_pos; soir->total = 0; soir->query_pos = NULL; return soir; } static void SubOpt_Index_Row_destroy(SubOpt_Index_Row *soir){ if(soir->query_pos) g_free(soir->query_pos); g_free(soir); return; } /**/ typedef struct { gint query_pos; gint target_pos; } SubOpt_Point; static gboolean SubOpt_RangeTree_traverse(gint x, gint y, gpointer info, gpointer user_data){ register GPtrArray *point_list = user_data; register SubOpt_Point *sop = g_new(SubOpt_Point, 1); sop->query_pos = x; sop->target_pos = y; g_ptr_array_add(point_list, sop); return FALSE; } static int SubOpt_sort_by_target_then_query_pos(const void *a, const void *b){ register SubOpt_Point **point_a =
(SubOpt_Point**)a, **point_b = (SubOpt_Point**)b; register gint target_diff = (*point_a)->target_pos - (*point_b)->target_pos; if(!target_diff) return (*point_a)->query_pos - (*point_b)->query_pos; return target_diff; } SubOpt_Index *SubOpt_Index_create(SubOpt *subopt, Region *region){ register SubOpt_Index *soi; register GPtrArray *point_list = g_ptr_array_new(); register gint i, j, start; register SubOpt_Point *point, *prev_point = NULL; register SubOpt_Index_Row *soir = NULL; /* Copy all points in region into point_list */ RangeTree_find(subopt->range_tree, region->query_start, region->query_length+1, region->target_start, region->target_length+1, SubOpt_RangeTree_traverse, point_list); if(!point_list->len){ g_ptr_array_free(point_list, TRUE); return NULL; /* Found no points in region */ } /* Convert all points to region coordinates */ for(i = 0; i < point_list->len; i++){ point = point_list->pdata[i]; point->query_pos -= region->query_start; point->target_pos -= region->target_start; } /* Sort points to correct order for DP */ qsort(point_list->pdata, point_list->len, sizeof(gpointer), SubOpt_sort_by_target_then_query_pos); soi = g_new(SubOpt_Index, 1); soi->region = Region_share(region); soi->row_list = g_ptr_array_new(); soi->curr_row_index = 0; soi->curr_query_index = 0; /* Set the row sizes */ for(i = 0; i < point_list->len; i++){ point = point_list->pdata[i]; if((!soir) || (soir->target_pos != point->target_pos)){ soir = SubOpt_Index_Row_create(point->target_pos); soir->total = 1; g_ptr_array_add(soi->row_list, soir); soir->query_pos = GINT_TO_POINTER(i); /* << Hack */ } else { g_assert(soir->target_pos == point->target_pos); soir->total++; } } /* Fill the rows */ for(i = 0; i < soi->row_list->len; i++){ soir = soi->row_list->pdata[i]; g_assert(soir->total); start = GPOINTER_TO_INT(soir->query_pos); /* >> Hack */ soir->query_pos = g_new(gint, soir->total+1); prev_point = NULL; for(j = 0; j < soir->total; j++){ point = point_list->pdata[start+j]; 
soir->query_pos[j] = point->query_pos; g_assert(soir->target_pos == point->target_pos); g_assert((!prev_point) /* Ensure points are unique */ || (prev_point->query_pos != point->query_pos)); prev_point = point; } /* Tag on dummy end position (query_length+1) */ soir->query_pos[soir->total] = subopt->query_length + 1; } /**/ for(i = 0; i < point_list->len; i++){ point = point_list->pdata[i]; g_free(point); } g_ptr_array_free(point_list, TRUE); soi->blank_row = SubOpt_Index_Row_create(subopt->target_length+1); soi->blank_row->total = 1; soi->blank_row->query_pos = g_new(gint, 1); soi->blank_row->query_pos[0] = subopt->query_length+1; g_ptr_array_add(soi->row_list, soi->blank_row); /* Start on the blank row: it must be created before it is referenced */ soi->curr_row = soi->blank_row; return soi; } void SubOpt_Index_destroy(SubOpt_Index *soi){ register gint i; register SubOpt_Index_Row *soir; /* Free all rows (including final blank row) */ for(i = 0; i < soi->row_list->len; i++){ soir = soi->row_list->pdata[i]; SubOpt_Index_Row_destroy(soir); } g_ptr_array_free(soi->row_list, TRUE); Region_destroy(soi->region); g_free(soi); return; } void SubOpt_Index_set_row(SubOpt_Index *soi, gint target_pos){ register SubOpt_Index_Row *soir = NULL; g_assert(target_pos >= 0); if(!soi) return; /**/ soir = soi->row_list->pdata[soi->curr_row_index]; while(soir->target_pos < target_pos){ if(soi->curr_row_index < soi->row_list->len){ soi->curr_row_index++; soir = soi->row_list->pdata[soi->curr_row_index]; } else { soir = NULL; break; } } if(soir){ while(soir->target_pos > target_pos){ if(soi->curr_row_index > 0){ soi->curr_row_index--; soir = soi->row_list->pdata[soi->curr_row_index]; } else { soir = NULL; break; } } if(soir && (soir->target_pos == target_pos)){ soi->curr_row = soir; } else { soi->curr_row = soi->blank_row; } } /**/ soi->curr_query_index = 0; return; } gboolean SubOpt_Index_is_blocked(SubOpt_Index *soi, gint query_pos){ g_assert(query_pos >= 0); while(soi->curr_row->query_pos[soi->curr_query_index] < query_pos){ soi->curr_query_index++; }
while(soi->curr_row->query_pos[soi->curr_query_index] > query_pos){ if(!soi->curr_query_index) break; soi->curr_query_index--; } return (soi->curr_row->query_pos[soi->curr_query_index] == query_pos); } /* FIXME: optimisation: use fwd and rev only versions of this for sdp */ /**/ exonerate-2.4.0/src/c4/cgutil.c0000644000175000017500000000735111162714263013171 00000000000000/****************************************************************\ * * * Utilities to facilitate code generation DP implementations * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "cgutil.h" void CGUtil_print_header(Codegen *codegen, C4_Model *model){ register gint i; Codegen_printf(codegen, "/* Code automatically generated by C4 DP library\n" " * Do not attempt to edit this code: it is spaghetti.\n" " *\n" " * Model: %s\n" " */\n\n" " /* ---< START >--- */\n\n", model->name); for(i = 0; i < model->global_code_list->len; i++) Codegen_printf(codegen, model->global_code_list->pdata[i]); Codegen_printf(codegen, "\n"); return; } void CGUtil_print_footer(Codegen *codegen){ Codegen_printf(codegen, "/* ---< END >--- */\n\n"); return; } void CGUtil_compile(Codegen *codegen, C4_Model *model){ register gint i; register GString *cflags; cflags = g_string_new( " -I" SOURCE_ROOT_DIR "/src/c4"); g_string_append(cflags, " -I" SOURCE_ROOT_DIR "/src/struct"); g_string_append(cflags, " -I" SOURCE_ROOT_DIR "/src/general"); g_string_append(cflags, " -I" SOURCE_ROOT_DIR "/src/sequence"); g_string_append(cflags, " " GLIB_CFLAGS); for(i = 0; i < model->cflags_add_list->len; i++) g_string_append(cflags, model->cflags_add_list->pdata[i]);
Codegen_compile(codegen, cflags->str, NULL); g_string_free(cflags, TRUE); return; } void CGUtil_prep(Codegen *codegen, C4_Model *model, gboolean is_init){ register gint i; register C4_Calc *calc; register C4_PrepFunc prep_func; register gchar *macro, *type = is_init ? "init" : "exit"; /**/ if(is_init){ if(model->init_macro){ Codegen_printf(codegen, "%s;\n", model->init_macro); } else { if(model->init_func){ Codegen_printf(codegen, "model->init_func(region, user_data);\n"); } } } /**/ Codegen_printf(codegen, "/* Transition calc %s */\n", type); for(i = 0; i < model->calc_list->len; i++){ calc = model->calc_list->pdata[i]; macro = is_init ? calc->init_macro : calc->exit_macro; if(macro){ Codegen_printf(codegen, "%s;\n", macro); } else { prep_func = is_init ? calc->init_func : calc->exit_func; if(prep_func){ Codegen_printf(codegen, "calc = model->calc_list->pdata[%d];\n", i); Codegen_printf(codegen, "calc->%s_func(region, user_data);\n", type); } } } /**/ if(!is_init){ if(model->exit_macro){ Codegen_printf(codegen, "%s;\n", model->exit_macro); } else { if(model->exit_func){ Codegen_printf(codegen, "model->exit_func(region, user_data);\n"); } } } /**/ return; } /**/ exonerate-2.4.0/src/c4/layout.test.c0000644000175000017500000000206311162714265014172 00000000000000/****************************************************************\ * * * C4 dynamic programming library - optimal alignment code * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include #include "optimal.h" int main(void){ g_warning("Test not implemented for [%s]", __FILE__); return 0; } exonerate-2.4.0/src/c4/viterbi.test.c0000644000175000017500000000216511162714274014324 00000000000000/****************************************************************\ * * * C4 dynamic programming library - the Viterbi implementation * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include #include "viterbi.h" int Argument_main(Argument *arg){ g_message("Check [%s]", Viterbi_DP_Func_ARGS_STR); g_warning("Test [%s] does nothing", __FILE__); return 0; } exonerate-2.4.0/src/c4/opair.test.c0000644000175000017500000000214211162714266013766 00000000000000/****************************************************************\ * * * C4 dynamic programming library - code for optimal pairs * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include #include "opair.h" int Argument_main(Argument *arg){ g_warning("Test not implemented [%s]", __FILE__); return 0; } /* FIXME: implement this test */ exonerate-2.4.0/src/c4/opair.c0000644000175000017500000000436511162714265013020 00000000000000/****************************************************************\ * * * C4 dynamic programming library - code for optimal pairs * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "opair.h" /**/ OPair *OPair_create(Optimal *optimal, SubOpt *subopt, gint query_length, gint target_length, gpointer user_data){ register OPair *opair = g_new(OPair, 1); g_assert(optimal); g_assert(query_length >= 0); g_assert(target_length >= 0); opair->optimal = Optimal_share(optimal); opair->user_data = user_data; opair->subopt = SubOpt_share(subopt); opair->region = Region_create(0, 0, query_length, target_length); opair->prev_score = C4_IMPOSSIBLY_HIGH_SCORE; return opair; } void OPair_destroy(OPair *opair){ SubOpt_destroy(opair->subopt); Optimal_destroy(opair->optimal); Region_destroy(opair->region); g_free(opair); return; } Alignment *OPair_next_path(OPair *opair, C4_Score threshold){ register Alignment *alignment; alignment = Optimal_find_path(opair->optimal, opair->region, opair->user_data, threshold, opair->subopt); if(alignment){ if(alignment->score > opair->prev_score) g_warning("Score [%d] greater than previous [%d]", alignment->score, opair->prev_score); g_assert(alignment->score <= opair->prev_score); opair->prev_score = alignment->score; } return alignment; } /**/ 
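/* Usage sketch (not part of the original source). OPair_next_path() above
 * is meant to be called repeatedly: each call should return an alignment
 * scoring no higher than the previous one, and prev_score starts at
 * C4_IMPOSSIBLY_HIGH_SCORE so that the first call always passes the check.
 * The fragment below illustrates that calling pattern only. MockPair and
 * its functions are hypothetical stand-ins that replay a fixed score list
 * instead of calling Optimal_find_path(), and treating the threshold as an
 * inclusive lower bound is an assumption.
 */

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for OPair: replays a fixed list of scores. */
typedef struct {
    const int *score_list; /* candidate scores, best first */
    size_t total;          /* number of candidates */
    size_t next;           /* index of the next candidate to report */
    int prev_score;        /* score of the last reported path */
} MockPair;

static void MockPair_init(MockPair *mp, const int *score_list, size_t total){
    mp->score_list = score_list;
    mp->total = total;
    mp->next = 0;
    mp->prev_score = INT_MAX; /* plays the role of C4_IMPOSSIBLY_HIGH_SCORE */
}

/* Returns 1 and stores the score when another path scoring at least
 * threshold is available, otherwise returns 0. */
static int MockPair_next_path(MockPair *mp, int threshold, int *score){
    if(mp->next >= mp->total)
        return 0;
    if(mp->score_list[mp->next] < threshold)
        return 0;
    *score = mp->score_list[mp->next++];
    assert(*score <= mp->prev_score); /* scores must never increase */
    mp->prev_score = *score;
    return 1;
}

/* Drains the pair, counting the paths reported at or above threshold. */
static size_t MockPair_count_paths(MockPair *mp, int threshold){
    int score;
    size_t count = 0;
    while(MockPair_next_path(mp, threshold, &score))
        count++;
    return count;
}
```

/* With the real library the loop body would consume the returned Alignment
 * before requesting the next one; the mock only counts the paths.
 */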
exonerate-2.4.0/src/c4/subopt.test.c0000644000175000017500000000204611162714266014173 00000000000000/****************************************************************\ * * * C4 dynamic programming library - suboptimal alignments * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include #include "subopt.h" int main(void){ g_warning("[%s] does nothing", __FILE__); return 0; } exonerate-2.4.0/src/c4/cgutil.test.c0000644000175000017500000000207011162714263014140 00000000000000/****************************************************************\ * * * Utilities to facilitate code generation DP implementations * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include #include "cgutil.h" gint Argument_main(Argument *arg){ g_warning("[%s] does nothing", __FILE__); return 0; } exonerate-2.4.0/src/c4/region.h0000644000175000017500000000440211162714266013167 00000000000000/****************************************************************\ * * * C4 dynamic programming library - code for regions * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. 
See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_REGION_H #define INCLUDED_REGION_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include typedef struct { gint ref_count; /* -1 when static region */ gint query_start; gint target_start; gint query_length; gint target_length; } Region; gboolean Region_is_valid(Region *region); Region *Region_create_blank(void); Region *Region_create(gint query_start, gint target_start, gint query_length, gint target_length); void Region_destroy(Region *region); Region *Region_share(Region *region); Region *Region_copy(Region *region); void Region_set_static(Region *region); void Region_init_static(Region *region, gint query_start, gint target_start, gint query_length, gint target_length); gboolean Region_is_within(Region *outer, Region *inner); gboolean Region_is_same(Region *region_a, Region *region_b); void Region_print(Region *region, gchar *name); #define Region_area(region) \ ((region)->query_length * (region)->target_length) #define Region_query_end(region) \ ((region)->query_start+(region)->query_length) #define Region_target_end(region) \ ((region)->target_start+(region)->target_length) #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_REGION_H */ exonerate-2.4.0/src/c4/c4.h0000644000175000017500000003042711222243077012212 00000000000000/****************************************************************\ * * * C4 dynamic programming library - code for models * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #ifndef INCLUDED_C4_H #define INCLUDED_C4_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include #include "region.h" #include "threadref.h" typedef gint C4_Score; #define C4_IMPOSSIBLY_LOW_SCORE -987654321 #define C4_IMPOSSIBLY_HIGH_SCORE -(C4_IMPOSSIBLY_LOW_SCORE) /**/ typedef C4_Score (*C4_CalcFunc)(gint query_pos, gint target_pos, gpointer user_data); /* Receives the position at the transition source */ typedef void (*C4_PrepFunc)(Region *region, gpointer user_data); /* To prepare subject region before starting DP */ typedef C4_Score (*C4_StartFunc)(gint query_pos, gint target_pos, gpointer user_data); /* Receives the position at the transition source */ typedef void (*C4_EndFunc)(C4_Score score, gint query_pos, gint target_pos, gpointer user_data); /* Receives the position at the transition end */ typedef C4_Score *(*C4_CellStartFunc)(gint query_pos, gint target_pos, gpointer user_data); /* Receives the position at the transition source */ typedef void (*C4_CellEndFunc)(C4_Score *cell, gint cell_size, gint query_pos, gint target_pos, gpointer user_data); /* Receives the position at the transition end */ /**/ typedef struct { gchar *name; gint id; /* Only set once the model is closed */ GPtrArray *input_transition_list; GPtrArray *output_transition_list; GPtrArray *src_shadow_list; /* Shadows starting at this state */ } C4_State; typedef enum { C4_Protect_NONE = (0<<0), C4_Protect_OVERFLOW = (1<<0), C4_Protect_UNDERFLOW = (1<<1) } C4_Protect; typedef struct { gchar *name; gint id; /* Only set once the model is closed */ C4_Score max_score; C4_CalcFunc calc_func; /* If NULL, use max_score */ gchar *calc_macro; /* If NULL, use calc_func */ C4_PrepFunc init_func; /* If NULL, no initialisation */ gchar *init_macro; /* If NULL, use init_func */ C4_PrepFunc exit_func; /* If NULL, no initialisation */ gchar *exit_macro; /* If NULL, use exit_func */ C4_Protect protect; } C4_Calc; /* 
When {calc,init,exit}_macro is supplied, * {calc,init,exit}_func must be set. */ typedef enum { C4_Scope_ANYWHERE, /* Anywhere */ C4_Scope_EDGE, /* Query || Target */ C4_Scope_QUERY, /* Query only */ C4_Scope_TARGET, /* Target only */ C4_Scope_CORNER /* Query && Target */ } C4_Scope; typedef struct { C4_State *state; C4_Scope scope; C4_CellStartFunc cell_start_func; gchar *cell_start_macro; } C4_StartState; /* Start cell will be set to zero if cell_start_func is NULL */ typedef struct { C4_State *state; C4_Scope scope; C4_CellEndFunc cell_end_func; gchar *cell_end_macro; } C4_EndState; typedef enum { C4_Label_NONE, C4_Label_MATCH, C4_Label_GAP, C4_Label_NER, C4_Label_5SS, C4_Label_3SS, C4_Label_INTRON, C4_Label_SPLIT_CODON, C4_Label_FRAMESHIFT } C4_Label; typedef struct { gchar *name; gint id; /* Only set once the model is closed */ C4_State *input; C4_State *output; C4_Calc *calc; /* Zero scores are emitted for a NULL calc */ gint advance_query; gint advance_target; C4_Label label; gpointer label_data; GPtrArray *dst_shadow_list; /* Shadows ending on transition */ } C4_Transition; typedef struct { gint id; /* Only set once the model is closed */ gchar *name; GPtrArray *src_state_list; GPtrArray *dst_transition_list; C4_StartFunc start_func; gchar *start_macro; C4_EndFunc end_func; gchar *end_macro; gint designation; /* Only set once the model is closed */ } C4_Shadow; typedef struct { gchar *name; gint id; gint advance_query; gint advance_target; C4_Calc *calc; GPtrArray *transition_list; /* Transitions using this portal */ } C4_Portal; typedef struct { gchar *name; gint id; C4_State *span_state; gint min_query; gint max_query; gint min_target; gint max_target; C4_Transition *query_loop; /* May be NULL */ C4_Transition *target_loop; /* May be NULL */ } C4_Span; typedef struct { gchar *name; GPtrArray *global_code_list; GPtrArray *local_code_list; GPtrArray *cflags_add_list; C4_PrepFunc init_func; /* If NULL, no initialisation */ gchar *init_macro; /* If NULL, use 
init_func */ C4_PrepFunc exit_func; /* If NULL, no finalisation */ gchar *exit_macro; /* If NULL, use exit_func */ ThreadRef *thread_ref; gboolean is_open; GPtrArray *state_list; GPtrArray *transition_list; GPtrArray *shadow_list; GPtrArray *calc_list; GPtrArray *portal_list; GPtrArray *span_list; C4_StartState *start_state; C4_EndState *end_state; gint max_query_advance; /* For convenience */ gint max_target_advance; /* For convenience */ gint total_shadow_designations; } C4_Model; /**/ C4_Model *C4_Model_create(gchar *name); /* Model is created open */ void C4_Model_destroy(C4_Model *model); C4_Model *C4_Model_share(C4_Model *model); C4_State **C4_Model_build_state_map(C4_Model *src, C4_Model *dst); C4_Transition **C4_Model_build_transition_map(C4_Model *src, C4_Model *dst); /**/ void C4_Model_make_stereo(C4_Model *model, gchar *suffix_a, gchar *suffix_b); void C4_Model_insert(C4_Model *target, C4_Model *insert, C4_State *src, C4_State *dst); /* Inserts the insert model into the target model between states src and dst */ /**/ void C4_Model_rename(C4_Model *model, gchar *name); C4_State *C4_Model_add_state(C4_Model *model, gchar *name); C4_Calc *C4_Model_add_calc(C4_Model *model, gchar *name, C4_Score max_score, C4_CalcFunc calc_func, gchar *calc_macro, C4_PrepFunc init_func, gchar *init_macro, C4_PrepFunc exit_func, gchar *exit_macro, C4_Protect protect); /* calc_macro may contain %[QT][SPRE] * which are expanded to {query,target}_{start,position,region_pos,end} * in the generated code.
*/ C4_Transition *C4_Model_add_transition(C4_Model *model, gchar *name, C4_State *input, C4_State *output, gint advance_query, gint advance_target, C4_Calc *calc, C4_Label label, gpointer label_data); /* NULL input implies START state, NULL output implies END state */ GPtrArray *C4_Model_select_transitions(C4_Model *model, C4_Label label); C4_Transition *C4_Model_select_single_transition(C4_Model *model, C4_Label label); #define C4_Transition_is_match(transition) \ (transition->label == C4_Label_MATCH) #define C4_Transition_is_span(transition) \ ((transition->input == transition->output) \ && (!transition->calc)) /* Transition advance or target must be set, otherwise would be cyclic */ C4_Shadow *C4_Model_add_shadow(C4_Model *model, gchar *name, C4_State *src, C4_Transition *dst, C4_StartFunc start_func, gchar *start_macro, C4_EndFunc end_func, gchar *end_macro); /* NULL src state implies START state, * NULL output transition implies all transitions to the END state */ void C4_Shadow_add_src_state(C4_Shadow *shadow, C4_State *src); void C4_Shadow_add_dst_transition(C4_Shadow *shadow, C4_Transition *dst); C4_Portal *C4_Model_add_portal(C4_Model *model, gchar *name, C4_Calc *calc, gint advance_query, gint advance_target); /* If terminal_range or join_range are NULL, a default is used. */ C4_Span *C4_Model_add_span(C4_Model *model, gchar *name, C4_State *span_state, gint min_query, gint max_query, gint min_target, gint max_target); /* If range is NULL, a default is used. */ void C4_Model_append_codegen(C4_Model *model, gchar *global_code, gchar *local_code, gchar *cflags_add); void C4_Model_clear_codegen(C4_Model *model); void C4_Model_configure_extra(C4_Model *model, C4_PrepFunc init_func, gchar *init_macro, C4_PrepFunc exit_func, gchar *exit_macro); /* init_{func,macro} is called/included at start of DP functions, * (after variable declarations, before anything else) * exit_{func,macro} is called/included at the end of DP functions. 
* * {init,exit}_{func,macro} will be stripped from derived models, * but {global,local}_code will not. */ void C4_Model_configure_start_state(C4_Model *model, C4_Scope scope, C4_CellStartFunc cell_start_func, gchar *cell_start_macro); void C4_Model_configure_end_state(C4_Model *model, C4_Scope scope, C4_CellEndFunc cell_end_func, gchar *cell_end_macro); /* C4_Model_configure_{start,end}_state() * will work on open or closed models. */ void C4_Model_print(C4_Model *model); void C4_Model_dump_graphviz(C4_Model *model); void C4_Model_open(C4_Model *model); void C4_Model_close(C4_Model *model); /**/ /* Utility functions */ gchar *C4_Scope_get_name(C4_Scope scope); gchar *C4_Label_get_name(C4_Label label); gboolean C4_Model_path_is_possible(C4_Model *model, C4_State *src, C4_State *dst); void C4_Calc_init(C4_Calc *calc, Region *region, gpointer user_data); void C4_Calc_exit(C4_Calc *calc, Region *region, gpointer user_data); C4_Score C4_Calc_score(C4_Calc *calc, gint query_pos, gint target_pos, gpointer user_data); C4_Model *C4_Model_copy(C4_Model *model); /* Returns a closed copy of the model */ void C4_Model_remove_state(C4_Model *model, C4_State *state); void C4_Model_remove_transition(C4_Model *model, C4_Transition *transition); gboolean C4_Model_is_global(C4_Model *model); gboolean C4_Model_is_local(C4_Model *model); void C4_Model_remove_all_shadows(C4_Model *model); /**/ typedef struct { C4_Model *original; C4_Model *derived; C4_Transition **transition_map; } C4_DerivedModel; /* The transition_map maps derived_model transitions * back to original_model transitions. 
*/ C4_DerivedModel *C4_DerivedModel_create(C4_Model *original_model, C4_State *src, C4_State *dst, C4_Scope start_scope, C4_CellStartFunc cell_start_func, gchar *cell_start_macro, C4_Scope end_scope, C4_CellEndFunc cell_end_func, gchar *cell_end_macro); /* A C4_DerivedModel will be created closed */ void C4_DerivedModel_destroy(C4_DerivedModel *derived_model); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_C4_H */ exonerate-2.4.0/src/c4/alignment.h0000644000175000017500000000674311215747532013675 00000000000000/****************************************************************\ * * * C4 dynamic programming library - alignment code * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_ALIGNMENT_H #define INCLUDED_ALIGNMENT_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include #include "c4.h" #include "region.h" #include "sequence.h" #include "submat.h" #include "translate.h" #include "argument.h" typedef struct { gint alignment_width; gboolean forward_strand_coords; } Alignment_ArgumentSet; Alignment_ArgumentSet *Alignment_ArgumentSet_create(Argument *arg); typedef struct { C4_Transition *transition; gint length; } AlignmentOperation; typedef struct { Region *region; gint ref_count; GPtrArray *operation_list; C4_Score score; C4_Model *model; } Alignment; Alignment *Alignment_create(C4_Model *model, Region *region, C4_Score score); Alignment *Alignment_share(Alignment *alignment); void Alignment_destroy(Alignment *alignment); void Alignment_add(Alignment *alignment, C4_Transition *transition, gint length); void Alignment_display(Alignment *alignment, Sequence *query, 
Sequence *target, Submat *dna_submat, Submat *protein_submat, Translate *translate, FILE *fp); /* submat and translate may be NULL * when not required by the alignment model */ void Alignment_display_sugar(Alignment *alignment, Sequence *query, Sequence *target, FILE *fp); void Alignment_display_cigar(Alignment *alignment, Sequence *query, Sequence *target, FILE *fp); void Alignment_display_vulgar(Alignment *alignment, Sequence *query, Sequence *target, FILE *fp); void Alignment_display_gff(Alignment *alignment, Sequence *query, Sequence *target, Translate *translate, gboolean report_on_query, gboolean report_on_genomic, gint result_id, gpointer user_data, FILE *fp); void Alignment_display_ryo(Alignment *alignment, Sequence *query, Sequence *target, gchar *format, Translate *translate, gint rank, gpointer user_data, gpointer self_data, FILE *fp); /**/ gboolean Alignment_is_valid(Alignment *alignment, Region *region, gpointer user_data); void Alignment_import_derived(Alignment *alignment, Alignment *to_add, C4_DerivedModel *derived_model); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_ALIGNMENT_H */ exonerate-2.4.0/src/c4/codegen.h0000644000175000017500000000441011162714265013306 00000000000000/****************************************************************\ * * * Code generation module for C4 * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #ifndef INCLUDED_CODEGEN_H #define INCLUDED_CODEGEN_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include #include #include "argument.h" typedef struct { gboolean use_compiled; } Codegen_ArgumentSet; Codegen_ArgumentSet *Codegen_ArgumentSet_create(Argument *arg); gchar *Codegen_clean_path_component(gchar *name); gboolean Codegen_file_exists(gchar *path); gboolean Codegen_directory_exists(gchar *path); /* FIXME: Use glib-2 functions instead when changing over */ typedef struct { FILE *fp; /* FILE descriptor for the code */ gchar *name; /* Name of the file/function */ gchar *code_path; /* Path to the .c file */ gchar *object_path; /* Path to the .o file */ gint indent; /* Current indentation level */ gchar *func_prototype; gchar *func_return; } Codegen; Codegen *Codegen_create(gchar *directory, gchar *name); void Codegen_destroy(Codegen *codegen); /* If directory is NULL, the default is used ( ~/.c4_plugins ) */ void Codegen_indent(Codegen *codegen, gint indent_change); void Codegen_printf(Codegen *codegen, gchar *format, ...); void Codegen_compile(Codegen *c, gchar *add_ccflags, gchar *add_ldflags); /**/ #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_CODEGEN_H */ exonerate-2.4.0/src/c4/optimal.h0000644000175000017500000000452011162714266013352 00000000000000/****************************************************************\ * * * C4 dynamic programming library - optimal alignment code * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #ifndef INCLUDED_OPTIMAL_H #define INCLUDED_OPTIMAL_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include <glib.h> #include "viterbi.h" #include "subopt.h" /**/ typedef enum { Optimal_Type_SCORE = (1 << 0), Optimal_Type_PATH = (1 << 1), Optimal_Type_REDUCED_SPACE = (1 << 2) } Optimal_Type; typedef struct { gchar *name; gint ref_count; Optimal_Type type; Viterbi *find_score; Viterbi *find_path; Viterbi *find_region; Viterbi *find_checkpoint_continuation; Viterbi *find_path_continuation; } Optimal; /**/ Optimal *Optimal_create(C4_Model *model, gchar *name, Optimal_Type type, gboolean use_codegen); /* If name is NULL, "optimal:model->name" is used */ void Optimal_destroy(Optimal *optimal); Optimal *Optimal_share(Optimal *optimal); GPtrArray *Optimal_make_Codegen_list(Optimal *optimal); /* Returns a list of the Codegen objects created */ C4_Score Optimal_find_score(Optimal *optimal, Region *region, gpointer user_data, SubOpt *subopt); /* Subopt is required for Optimal_find_score() in BSDP */ Alignment *Optimal_find_path(Optimal *optimal, Region *region, gpointer user_data, C4_Score threshold, SubOpt *subopt); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_OPTIMAL_H */ exonerate-2.4.0/src/c4/layout.h0000644000175000017500000000355211162714265013225 00000000000000/****************************************************************\ * * * C4 dynamic programming library - DP layout code * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact.
* * * \****************************************************************/ #ifndef INCLUDED_LAYOUT_H #define INCLUDED_LAYOUT_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include <glib.h> #include "c4.h" /**/ typedef struct { GArray *transition_mask; } Layout_Mask; typedef struct { Layout_Mask *normal; Layout_Mask *end_query; Layout_Mask *end_target; Layout_Mask *corner; } Layout_Cell; typedef struct { GPtrArray *cell_list; } Layout_Row; typedef struct { GPtrArray *row_list; } Layout; /**/ Layout *Layout_create(C4_Model *model); void Layout_destroy(Layout *layout); void Layout_info(Layout *layout, C4_Model *model); gboolean Layout_is_transition_valid(Layout *layout, C4_Model *model, C4_Transition *transition, gint dst_query_pos, gint dst_target_pos, gint query_length, gint target_length); /**/ #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_LAYOUT_H */ exonerate-2.4.0/src/c4/viterbi.h0000644000175000017500000001260311162714266013352 00000000000000/****************************************************************\ * * * C4 dynamic programming library - the Viterbi implementation * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact.
* * * \****************************************************************/ #ifndef INCLUDED_VITERBI_H #define INCLUDED_VITERBI_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include <glib.h> #include "c4.h" #include "region.h" #include "alignment.h" #include "argument.h" #include "layout.h" #include "codegen.h" #include "subopt.h" typedef struct { gint traceback_memory_limit; } Viterbi_ArgumentSet; Viterbi_ArgumentSet *Viterbi_ArgumentSet_create(Argument *arg); /**/ typedef struct { C4_Score ****score_matrix; /* 1st element is score */ gint cell_size; /* score, model,built-in shadows */ gint region_start_query_id; /* Cell for qy shadow */ gint region_start_target_id; /* Cell for tg shadow */ gint checkpoint_id; /* Cell for checkpoint */ } Viterbi_Row; /* Score matrix[max_target_advance] * [query_length] * [state_list->len] * [1 + shadow_total] * * Cell ids are set to -1 when not in use. */ typedef struct { GPtrArray *checkpoint_list; gint cell_size; C4_Score last_srp; gint section_length; gint counter; } Viterbi_Checkpoint; /* Each checkpoint is a (C4_Score****) as in Viterbi_Row.
* cell_size is the same as in Viterbi_Row */ typedef struct { C4_State *first_state; /* Must be set before DP */ C4_Score *first_cell; /* Must be set before DP */ C4_State *final_state; /* Will be set after DP */ C4_Score *final_cell; /* Will be set after DP */ } Viterbi_Continuation; typedef struct { Viterbi_Row *vr; /* For Viterbi_Mode_FIND_{REGION,PATH} */ gint curr_query_end; gint curr_target_end; /* For Viterbi_Mode_FIND_REGION */ gint curr_query_start; gint curr_target_start; Region *alignment_region; /* For Viterbi_Mode_FIND_PATH */ C4_Transition ****traceback; /* For Viterbi_Mode_FIND_CHECKPOINTS */ Viterbi_Checkpoint *checkpoint; Viterbi_Continuation *continuation; } Viterbi_Data; #define Viterbi_DP_Func_ARGS_STR \ "(C4_Model *model, Region *region, \ Viterbi_Data *vd, SubOpt_Index *soi, \ gpointer user_data)" typedef C4_Score (*Viterbi_DP_Func)( \ C4_Model *model, Region *region, \ Viterbi_Data *vd, SubOpt_Index *soi, \ gpointer user_data); /* Avoiding stringify this with # * because of gcc 3.1 preprocessor weirdness. */ typedef enum { Viterbi_Mode_FIND_SCORE, Viterbi_Mode_FIND_PATH, Viterbi_Mode_FIND_REGION, Viterbi_Mode_FIND_CHECKPOINTS } Viterbi_Mode; typedef struct { Viterbi_ArgumentSet *vas; gchar *name; C4_Model *model; Viterbi_DP_Func func; Viterbi_Mode mode; gboolean use_continuation; gsize cell_size; Layout *layout; } Viterbi; Viterbi *Viterbi_create(C4_Model *model, gchar *name, Viterbi_Mode mode, gboolean use_continuation, gboolean use_codegen); void Viterbi_destroy(Viterbi *viterbi); /* Will issue a warning if a compiled version of a requested * function is not available unless codegen is disabled. 
*/ gboolean Viterbi_use_reduced_space(Viterbi *viterbi, Region *region); /**/ Viterbi_Data *Viterbi_Data_create(Viterbi *Viterbi, Region *region); void Viterbi_Data_destroy(Viterbi_Data *vd); Alignment *Viterbi_Data_create_Alignment(Viterbi_Data *vd, C4_Model *model, C4_Score score, Region *region); void Viterbi_Data_set_continuation(Viterbi_Data *vd, C4_State *first_state, C4_Score *first_cell, C4_State *final_state, C4_Score *final_cell); void Viterbi_Data_clear_continuation(Viterbi_Data *vd); typedef struct { Region *region; C4_State *first_state; C4_Score *final_cell; } Viterbi_SubAlignment; GPtrArray *Viterbi_Checkpoint_traceback(Viterbi *viterbi, Viterbi_Data *vd, Region *region, C4_State *first_state, C4_Score *final_cell); void Viterbi_SubAlignment_destroy(Viterbi_SubAlignment *vsa); C4_Score Viterbi_calculate(Viterbi *viterbi, Region *region, Viterbi_Data *vd, gpointer user_data, SubOpt *subopt); Codegen *Viterbi_make_Codegen(Viterbi *viterbi); /* Creates and compiles codegen */ #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_VITERBI_H */ exonerate-2.4.0/src/c4/opair.h0000644000175000017500000000322111162714266013014 00000000000000/****************************************************************\ * * * C4 dynamic programming library - code for optimal pairs * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #ifndef INCLUDED_OPAIR_H #define INCLUDED_OPAIR_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include <glib.h> #include "c4.h" #include "alignment.h" #include "optimal.h" #include "subopt.h" typedef struct { gpointer user_data; Optimal *optimal; Region *region; SubOpt *subopt; C4_Score prev_score; } OPair; OPair *OPair_create(Optimal *optimal, SubOpt *subopt, gint query_length, gint target_length, gpointer user_data); void OPair_destroy(OPair *opair); /* user_data is provided for the Optimal functions */ Alignment *OPair_next_path(OPair *opair, C4_Score threshold); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_OPAIR_H */ exonerate-2.4.0/src/c4/subopt.h0000644000175000017500000000661011162714266013223 00000000000000/****************************************************************\ * * * C4 dynamic programming library - suboptimal alignments * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_SUBOPT_H #define INCLUDED_SUBOPT_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include <glib.h> #include "alignment.h" #include "region.h" #include "rangetree.h" /* Points are stored in a RangeTree, * then put in row/col lists for dp lookup.
*/ typedef struct { gint ref_count; gint query_length; gint target_length; RangeTree *range_tree; gint path_count; } SubOpt; SubOpt *SubOpt_create(gint query_length, gint target_length); SubOpt *SubOpt_share(SubOpt *subopt); void SubOpt_destroy(SubOpt *subopt); void SubOpt_add_alignment(SubOpt *subopt, Alignment *alignment); typedef gboolean (*SubOpt_FindFunc)(gint query_pos, gint target_pos, gint path_id, gpointer user_data); gboolean SubOpt_find(SubOpt *subopt, Region *region, SubOpt_FindFunc find_func, gpointer user_data); gboolean SubOpt_overlaps_alignment(SubOpt *subopt, Alignment *alignment); /**/ typedef struct { gint *query_pos; gint target_pos; gint total; /* Number of query positions in this row */ } SubOpt_Index_Row; typedef struct { Region *region; GPtrArray *row_list; /* Contains SubOpt_Index_Row */ SubOpt_Index_Row *curr_row; gint curr_row_index; gint curr_query_index; SubOpt_Index_Row *blank_row; } SubOpt_Index; SubOpt_Index *SubOpt_Index_create(SubOpt *subopt, Region *region); void SubOpt_Index_destroy(SubOpt_Index *soi); void SubOpt_Index_set_row(SubOpt_Index *soi, gint target_pos); /* FIXME: could make SubOpt_Index_set_row a macro for speed */ gboolean SubOpt_Index_is_blocked(SubOpt_Index *soi, gint query_pos); #define SubOpt_Index_is_blocked_fast(soi, q_pos) \ ((soi->curr_row->query_pos[ soi->curr_query_index] < q_pos) \ ?(soi->curr_row->query_pos[++soi->curr_query_index] == q_pos) \ :(soi->curr_row->query_pos[ soi->curr_query_index] == q_pos)) /* For SubOpt_Index_set_row() and SubOpt_Index_is_blocked_fast(), * query_pos and target_pos should use the coordinates * of the indexed region, not of the whole sequence * * It is assumed that target_pos increments between * each call to SubOpt_Index_set_row(). * * Multiple calls to SubOpt_Index_is_blocked() must be allowed, * for models with multiple match states. 
*/ #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_SUBOPT_H */ exonerate-2.4.0/src/c4/cgutil.h0000644000175000017500000000300111162714263013162 00000000000000/****************************************************************\ * * * Utilities to facilitate code generation DP implementations * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_CGUTIL_H #define INCLUDED_CGUTIL_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include #include #include "c4.h" #include "codegen.h" void CGUtil_print_header(Codegen *codegen, C4_Model *model); void CGUtil_print_footer(Codegen *codegen); void CGUtil_compile(Codegen *codegen, C4_Model *model); void CGUtil_prep(Codegen *codegen, C4_Model *model, gboolean is_init); /* void CGUtil_model_init(Codegen *codegen); void CGUtil_model_end(Codegen *codegen); */ /**/ #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_CGUTIL_H */ exonerate-2.4.0/src/bsdp/0000777000175000017500000000000011222703703012230 500000000000000exonerate-2.4.0/src/bsdp/Makefile.am0000644000175000017500000000565111222243471014210 00000000000000 TESTS = heuristic.test bsdp.test hpair.test sar.test noinst_PROGRAMS = $(TESTS) INCLUDES = -I$(top_srcdir)/src/struct \ -I$(top_srcdir)/src/comparison \ -I$(top_srcdir)/src/sequence \ -I$(top_srcdir)/src/general \ -I$(top_srcdir)/src/c4 \ -DSOURCE_ROOT_DIR="\"@source_root_dir@\"" \ -DGLIB_CFLAGS="\"@glib_cflags@\"" \ -DHOSTTYPE="\"@host@\"" noinst_HEADERS = heuristic.h bsdp.h hpair.h sar.h SEQUENCE_OBJ = $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o \
$(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o \ -lm C4_OBJ = $(top_srcdir)/src/c4/c4.o \ $(top_srcdir)/src/c4/alignment.o \ $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/codegen.o \ $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/layout.o \ $(top_srcdir)/src/c4/viterbi.o \ $(top_srcdir)/src/c4/subopt.o \ $(top_srcdir)/src/c4/cgutil.o ALIGNMENT_OBJ = $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(C4_OBJ) $(SEQUENCE_OBJ) heuristic_test_SOURCES = heuristic.test.c heuristic.c heuristic_test_LDADD = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/rangetree.o \ $(ALIGNMENT_OBJ) bsdp_test_SOURCES = bsdp.test.c bsdp.c bsdp_test_LDADD = $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/struct/pqueue.o \ $(top_srcdir)/src/struct/recyclebin.o hpair_test_SOURCES = hpair.test.c hpair.c heuristic.c bsdp.c sar.c hpair_test_LDADD = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/pqueue.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/comparison/hspset.o \ $(top_srcdir)/src/comparison/wordhood.o \ $(ALIGNMENT_OBJ) sar_test_SOURCES = sar.test.c sar.c heuristic.c sar_test_LDADD = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/rangetree.o \ $(ALIGNMENT_OBJ) # Files to clear away MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/src/bsdp/Makefile.in0000644000175000017500000003410511222703650014215 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. 
# This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../.. ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ TESTS = heuristic.test bsdp.test hpair.test sar.test noinst_PROGRAMS = $(TESTS) 
INCLUDES = -I$(top_srcdir)/src/struct -I$(top_srcdir)/src/comparison -I$(top_srcdir)/src/sequence -I$(top_srcdir)/src/general -I$(top_srcdir)/src/c4 -DSOURCE_ROOT_DIR="\"@source_root_dir@\"" -DGLIB_CFLAGS="\"@glib_cflags@\"" -DHOSTTYPE="\"@host@\"" noinst_HEADERS = heuristic.h bsdp.h hpair.h sar.h SEQUENCE_OBJ = $(top_srcdir)/src/struct/sparsecache.o $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o $(top_srcdir)/src/sequence/alphabet.o $(top_srcdir)/src/sequence/splice.o $(top_srcdir)/src/sequence/translate.o $(top_srcdir)/src/general/argument.o $(top_srcdir)/src/general/lineparse.o $(top_srcdir)/src/general/threadref.o -lm C4_OBJ = $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/optimal.o $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/region.o $(top_srcdir)/src/c4/layout.o $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/subopt.o $(top_srcdir)/src/c4/cgutil.o ALIGNMENT_OBJ = $(top_srcdir)/src/comparison/match.o $(top_srcdir)/src/sequence/codonsubmat.o $(top_srcdir)/src/sequence/submat.o $(C4_OBJ) $(SEQUENCE_OBJ) heuristic_test_SOURCES = heuristic.test.c heuristic.c heuristic_test_LDADD = $(top_srcdir)/src/struct/slist.o $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/rangetree.o $(ALIGNMENT_OBJ) bsdp_test_SOURCES = bsdp.test.c bsdp.c bsdp_test_LDADD = $(top_srcdir)/src/general/argument.o $(top_srcdir)/src/struct/pqueue.o $(top_srcdir)/src/struct/recyclebin.o hpair_test_SOURCES = hpair.test.c hpair.c heuristic.c bsdp.c sar.c hpair_test_LDADD = $(top_srcdir)/src/struct/slist.o $(top_srcdir)/src/struct/pqueue.o $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/rangetree.o $(top_srcdir)/src/comparison/hspset.o $(top_srcdir)/src/comparison/wordhood.o $(ALIGNMENT_OBJ) sar_test_SOURCES = sar.test.c sar.c heuristic.c sar_test_LDADD = $(top_srcdir)/src/struct/slist.o $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/rangetree.o $(ALIGNMENT_OBJ) # Files to clear away 
MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = PROGRAMS = $(noinst_PROGRAMS) DEFS = @DEFS@ -I. -I$(srcdir) CPPFLAGS = @CPPFLAGS@ LDFLAGS = @LDFLAGS@ LIBS = @LIBS@ heuristic_test_OBJECTS = heuristic.test.o heuristic.o heuristic_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o $(top_srcdir)/src/c4/c4.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/layout.o $(top_srcdir)/src/c4/viterbi.o \ $(top_srcdir)/src/c4/subopt.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o heuristic_test_LDFLAGS = bsdp_test_OBJECTS = bsdp.test.o bsdp.o bsdp_test_DEPENDENCIES = $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/struct/pqueue.o $(top_srcdir)/src/struct/recyclebin.o bsdp_test_LDFLAGS = hpair_test_OBJECTS = hpair.test.o hpair.o heuristic.o bsdp.o sar.o hpair_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/pqueue.o $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/comparison/hspset.o \ $(top_srcdir)/src/comparison/wordhood.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o $(top_srcdir)/src/c4/c4.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/region.o \ 
$(top_srcdir)/src/c4/layout.o $(top_srcdir)/src/c4/viterbi.o \ $(top_srcdir)/src/c4/subopt.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o hpair_test_LDFLAGS = sar_test_OBJECTS = sar.test.o sar.o heuristic.o sar_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o $(top_srcdir)/src/c4/c4.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/layout.o $(top_srcdir)/src/c4/viterbi.o \ $(top_srcdir)/src/c4/subopt.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o sar_test_LDFLAGS = CFLAGS = @CFLAGS@ COMPILE = $(CC) $(DEFS) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) CCLD = $(CC) LINK = $(CCLD) $(AM_CFLAGS) $(CFLAGS) $(LDFLAGS) -o $@ HEADERS = $(noinst_HEADERS) DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best SOURCES = $(heuristic_test_SOURCES) $(bsdp_test_SOURCES) $(hpair_test_SOURCES) $(sar_test_SOURCES) OBJECTS = $(heuristic_test_OBJECTS) $(bsdp_test_OBJECTS) $(hpair_test_OBJECTS) $(sar_test_OBJECTS) all: 
all-redirect .SUFFIXES: .SUFFIXES: .S .c .o .s $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps src/bsdp/Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status mostlyclean-noinstPROGRAMS: clean-noinstPROGRAMS: -test -z "$(noinst_PROGRAMS)" || rm -f $(noinst_PROGRAMS) distclean-noinstPROGRAMS: maintainer-clean-noinstPROGRAMS: .c.o: $(COMPILE) -c $< .s.o: $(COMPILE) -c $< .S.o: $(COMPILE) -c $< mostlyclean-compile: -rm -f *.o core *.core clean-compile: distclean-compile: -rm -f *.tab.c maintainer-clean-compile: heuristic.test: $(heuristic_test_OBJECTS) $(heuristic_test_DEPENDENCIES) @rm -f heuristic.test $(LINK) $(heuristic_test_LDFLAGS) $(heuristic_test_OBJECTS) $(heuristic_test_LDADD) $(LIBS) bsdp.test: $(bsdp_test_OBJECTS) $(bsdp_test_DEPENDENCIES) @rm -f bsdp.test $(LINK) $(bsdp_test_LDFLAGS) $(bsdp_test_OBJECTS) $(bsdp_test_LDADD) $(LIBS) hpair.test: $(hpair_test_OBJECTS) $(hpair_test_DEPENDENCIES) @rm -f hpair.test $(LINK) $(hpair_test_LDFLAGS) $(hpair_test_OBJECTS) $(hpair_test_LDADD) $(LIBS) sar.test: $(sar_test_OBJECTS) $(sar_test_DEPENDENCIES) @rm -f sar.test $(LINK) $(sar_test_LDFLAGS) $(sar_test_OBJECTS) $(sar_test_LDADD) $(LIBS) tags: TAGS ID: $(HEADERS) $(SOURCES) $(LISP) list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ here=`pwd` && cd $(srcdir) \ && mkid -f$$here/ID $$unique $(LISP) TAGS: $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) $(LISP) tags=; \ here=`pwd`; \ list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(ETAGS_ARGS)$$unique$(LISP)$$tags" \ || (cd $(srcdir) && etags -o $$here/TAGS $(ETAGS_ARGS) $$tags $$unique $(LISP)) mostlyclean-tags: clean-tags: 
distclean-tags: -rm -f TAGS ID maintainer-clean-tags: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = src/bsdp distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done check-TESTS: $(TESTS) @failed=0; all=0; \ srcdir=$(srcdir); export srcdir; \ for tst in $(TESTS); do \ if test -f $$tst; then dir=.; \ else dir="$(srcdir)"; fi; \ if $(TESTS_ENVIRONMENT) $$dir/$$tst; then \ all=`expr $$all + 1`; \ echo "PASS: $$tst"; \ elif test $$? -ne 77; then \ all=`expr $$all + 1`; \ failed=`expr $$failed + 1`; \ echo "FAIL: $$tst"; \ fi; \ done; \ if test "$$failed" -eq 0; then \ banner="All $$all tests passed"; \ else \ banner="$$failed of $$all tests failed"; \ fi; \ dashes=`echo "$$banner" | sed s/./=/g`; \ echo "$$dashes"; \ echo "$$banner"; \ echo "$$dashes"; \ test "$$failed" -eq 0 info-am: info: info-am dvi-am: dvi: dvi-am check-am: all-am $(MAKE) $(AM_MAKEFLAGS) check-TESTS check: check-am installcheck-am: installcheck: installcheck-am install-exec-am: install-exec: install-exec-am install-data-am: install-data: install-data-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-am uninstall-am: uninstall: uninstall-am all-am: Makefile $(PROGRAMS) $(HEADERS) all-redirect: all-am install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-noinstPROGRAMS mostlyclean-compile \ mostlyclean-tags mostlyclean-generic mostlyclean: mostlyclean-am clean-am: clean-noinstPROGRAMS clean-compile clean-tags 
clean-generic \
		mostlyclean-am

clean: clean-am

distclean-am:  distclean-noinstPROGRAMS distclean-compile distclean-tags \
		distclean-generic clean-am

distclean: distclean-am

maintainer-clean-am:  maintainer-clean-noinstPROGRAMS \
		maintainer-clean-compile maintainer-clean-tags \
		maintainer-clean-generic distclean-am
	@echo "This command is intended for maintainers to use;"
	@echo "it deletes files that may require special tools to rebuild."

maintainer-clean: maintainer-clean-am

.PHONY: mostlyclean-noinstPROGRAMS distclean-noinstPROGRAMS \
clean-noinstPROGRAMS maintainer-clean-noinstPROGRAMS \
mostlyclean-compile distclean-compile clean-compile \
maintainer-clean-compile tags mostlyclean-tags distclean-tags \
clean-tags maintainer-clean-tags distdir check-TESTS info-am info \
dvi-am dvi check check-am installcheck-am installcheck install-exec-am \
install-exec install-data-am install-data install-am install \
uninstall-am uninstall all-redirect all-am all installdirs \
mostlyclean-generic distclean-generic clean-generic \
maintainer-clean-generic clean mostlyclean distclean maintainer-clean

# Tell versions [3.59,3.63) of GNU make to not export all variables.
# Otherwise a system limit (for SysV at least) may be exceeded.
.NOEXPORT:

exonerate-2.4.0/src/bsdp/heuristic.test.c

/****************************************************************\
*                                                                *
*  C4 dynamic programming library - code for heuristics          *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#include
#include "heuristic.h"

int Argument_main(Argument *arg){
    g_warning("Test not implemented for [%s]", __FILE__);
    return 0;
    }

/* FIXME: implement this test */

exonerate-2.4.0/src/bsdp/heuristic.c

/****************************************************************\
*                                                                *
*  C4 dynamic programming library - code for heuristics          *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#include "heuristic.h"
#include "matrix.h"
#include "optimal.h"
#include "region.h"

/**/

#if 0
gchar *Heuristic_Refinement_to_string(Heuristic_Refinement refinement){
    register gchar *name = NULL;
    switch(refinement){
        case Heuristic_Refinement_NONE:
            name = "none";
            break;
        case Heuristic_Refinement_FULL:
            name = "full";
            break;
        case Heuristic_Refinement_REGION:
            name = "region";
            break;
        default:
            g_error("Unknown Heuristic Refinement [%d]", refinement);
            break;
        }
    return name;
    }

Heuristic_Refinement Heuristic_Refinement_from_string(gchar *str){
    gchar *name[Heuristic_Refinement_TOTAL] =
        {"none", "full", "region"};
    Heuristic_Refinement refinement[Heuristic_Refinement_TOTAL] =
        {Heuristic_Refinement_NONE,
         Heuristic_Refinement_FULL,
         Heuristic_Refinement_REGION};
    register gint i;
    for(i = 0; i < Heuristic_Refinement_TOTAL; i++)
        if(!g_strcasecmp(name[i], str))
            return refinement[i];
    g_error("Unknown refinement type [%s]", str);
    return Heuristic_Refinement_NONE; /* Not reached */
    }

static gchar *Heuristic_Argument_parse_refinement(gchar *arg_string,
                                                  gpointer data){
    register Heuristic_Refinement *refinement =
(Heuristic_Refinement*)data; gchar *refine_str; register gchar *ret_val = Argument_parse_string(arg_string, &refine_str); if(ret_val) return ret_val; (*refinement) = Heuristic_Refinement_from_string(refine_str); return NULL; } #endif /* 0 */ Heuristic_ArgumentSet *Heuristic_ArgumentSet_create(Argument *arg){ register ArgumentSet *as; static Heuristic_ArgumentSet has; if(arg){ as = ArgumentSet_create("Heuristic Options"); /**/ ArgumentSet_add_option(as, '\0', "terminalrangeint", NULL, "Internal terminal range", "12", Argument_parse_int, &has.terminal_range_internal); ArgumentSet_add_option(as, '\0', "terminalrangeext", NULL, "External terminal range", "12", Argument_parse_int, &has.terminal_range_external); /**/ ArgumentSet_add_option(as, '\0', "joinrangeint", NULL, "Internal join range", "12", Argument_parse_int, &has.join_range_internal); ArgumentSet_add_option(as, '\0', "joinrangeext", NULL, "External join range", "12", Argument_parse_int, &has.join_range_external); /**/ ArgumentSet_add_option(as, '\0', "spanrangeint", NULL, "Internal span range", "12", Argument_parse_int, &has.span_range_internal); ArgumentSet_add_option(as, '\0', "spanrangeext", NULL, "External span range", "12", Argument_parse_int, &has.span_range_external); /**/ #if 0 ArgumentSet_add_option(as, '\0', "refine", NULL, "Alignment refinement strategy [none|full|region]", "none", Heuristic_Argument_parse_refinement, &has.refinement); ArgumentSet_add_option(as, '\0', "refineboundary", NULL, "Refinement region boundary", "32", Argument_parse_int, &has.refinement_boundary); #endif /* 0 */ /**/ Argument_absorb_ArgumentSet(arg, as); } return &has; } /**/ static Heuristic_Range *Heuristic_Range_create(gint internal, gint external, C4_Portal *portal){ register Heuristic_Range *heuristic_range = g_new(Heuristic_Range, 1); g_assert(internal); g_assert(external); heuristic_range->internal_query = internal * portal->advance_query; heuristic_range->internal_target = internal * portal->advance_target; 
    heuristic_range->external_query = external * portal->advance_query;
    heuristic_range->external_target = external * portal->advance_target;
    return heuristic_range;
    }

static void Heuristic_Range_destroy(Heuristic_Range *heuristic_range){
    g_free(heuristic_range);
    return;
    }

/**/

static void Heuristic_Bound_report_end_func(C4_Score *cell,
                gint cell_size, gint query_pos, gint target_pos,
                gpointer user_data){
    register C4_Score **matrix = user_data;
    matrix[query_pos][target_pos] = cell[0];
    return;
    }

static gchar *Heuristic_Bound_report_end_macro =
    "matrix[%QP][%TP] = %C[0]";

static Heuristic_Bound *Heuristic_Bound_create(C4_Model *model,
                        gint query_range, gint target_range){
    register Heuristic_Bound *heuristic_bound
        = g_new(Heuristic_Bound, 1);
    register gint i, j;
    register Optimal *optimal;
    register C4_Model *bound_model = C4_Model_copy(model);
    register gchar *optimal_name;
    register Region *region;
    register C4_Calc *calc;
    C4_Model_open(bound_model);
    heuristic_bound->query_range = query_range;
    heuristic_bound->target_range = target_range;
    heuristic_bound->matrix = (C4_Score**)Matrix2d_create(
            query_range+1, target_range+1, sizeof(C4_Score));
    /* Ensure bounds are set to -INF for correct behaviour
     * when later calling Heuristic_Bound_max_region_convert()
     */
    for(i = 0; i <= query_range; i++)
        for(j = 0; j <= target_range; j++)
            heuristic_bound->matrix[i][j] = C4_IMPOSSIBLY_LOW_SCORE;
    /* Empty all calcs */
    for(i = 0; i < bound_model->calc_list->len; i++){
        calc = bound_model->calc_list->pdata[i];
        if(calc){
            calc->calc_func = NULL;
            calc->init_func = NULL;
            calc->exit_func = NULL;
            }
        }
    /* Remove all shadows */
    C4_Model_remove_all_shadows(bound_model);
    /* Strip off any extras (no codegen anyway) */
    C4_Model_clear_codegen(bound_model);
    /* Strip any start_func, add an end_func */
    C4_Model_configure_extra(bound_model, NULL, NULL, NULL, NULL);
    C4_Model_configure_start_state(bound_model,
            model->start_state->scope, NULL, NULL);
    C4_Model_configure_end_state(bound_model, C4_Scope_ANYWHERE,
            Heuristic_Bound_report_end_func,
            Heuristic_Bound_report_end_macro);
    /* Collect the bound scores */
    C4_Model_close(bound_model);
    optimal_name = g_strdup_printf("Bound:%s", bound_model->name);
    optimal = Optimal_create(bound_model, optimal_name,
                             Optimal_Type_SCORE, FALSE);
    /* No codegen for bound models (as only used at startup) */
    g_free(optimal_name);
    region = Region_create(0, 0, query_range, target_range);
    Optimal_find_score(optimal, region, heuristic_bound->matrix, NULL);
    Region_destroy(region);
    Optimal_destroy(optimal);
    C4_Model_destroy(bound_model);
    return heuristic_bound;
    }

static void Heuristic_Bound_destroy(Heuristic_Bound *heuristic_bound){
    g_free(heuristic_bound->matrix);
    g_free(heuristic_bound);
    return;
    }

static void Heuristic_Bound_max_region_convert(
            Heuristic_Bound *heuristic_bound){
    register gint i, j;
    for(i = 1; i <= heuristic_bound->query_range; i++){
        for(j = 1; j <= heuristic_bound->target_range; j++){
            if(heuristic_bound->matrix[i][j]
             < heuristic_bound->matrix[i-1][j-1])
                heuristic_bound->matrix[i][j]
                    = heuristic_bound->matrix[i-1][j-1];
            if(heuristic_bound->matrix[i][j]
             < heuristic_bound->matrix[i-1][j])
                heuristic_bound->matrix[i][j]
                    = heuristic_bound->matrix[i-1][j];
            if(heuristic_bound->matrix[i][j]
             < heuristic_bound->matrix[i][j-1])
                heuristic_bound->matrix[i][j]
                    = heuristic_bound->matrix[i][j-1];
            }
        }
    return;
    }
/* Converts the scores in the Heuristic_Bound,
 * so that each cell contains the max of all preceding cells.
*/ /**/ static Heuristic_Join *Heuristic_Join_create(C4_Model *model, Heuristic_Match *src, Heuristic_Match *dst, Heuristic_ArgumentSet *has){ register Heuristic_Join *heuristic_join; register C4_Model *derived; register gchar *optimal_name; heuristic_join = g_new(Heuristic_Join, 1); heuristic_join->src_range = Heuristic_Range_create( has->join_range_internal, has->join_range_external, src->portal); heuristic_join->dst_range = Heuristic_Range_create( has->join_range_internal, has->join_range_external, dst->portal); heuristic_join->join_model = C4_DerivedModel_create(model, src->transition->output, dst->transition->output, C4_Scope_CORNER, NULL, NULL, C4_Scope_CORNER, NULL, NULL); derived = heuristic_join->join_model->derived; heuristic_join->bound = Heuristic_Bound_create(derived, heuristic_join->src_range->internal_query + heuristic_join->src_range->external_query + heuristic_join->src_range->internal_query + heuristic_join->src_range->external_query, heuristic_join->dst_range->internal_target + heuristic_join->dst_range->external_target + heuristic_join->dst_range->internal_target + heuristic_join->dst_range->external_target); optimal_name = g_strdup_printf("HeuristicJoin:[%s]", derived->name); heuristic_join->optimal = Optimal_create(derived, optimal_name, Optimal_Type_SCORE |Optimal_Type_PATH, TRUE); g_free(optimal_name); return heuristic_join; } static void Heuristic_Join_destroy(Heuristic_Join *heuristic_join){ Heuristic_Range_destroy(heuristic_join->src_range); Heuristic_Range_destroy(heuristic_join->dst_range); C4_DerivedModel_destroy(heuristic_join->join_model); Heuristic_Bound_destroy(heuristic_join->bound); Optimal_destroy(heuristic_join->optimal); g_free(heuristic_join); return; } /**/ static Heuristic_Terminal *Heuristic_Terminal_create(C4_Model *model, C4_Portal *portal, C4_Transition *transition, gboolean is_start, Heuristic_ArgumentSet *has){ register Heuristic_Terminal *heuristic_terminal = g_new(Heuristic_Terminal, 1); register C4_Model *derived; 
register gchar *optimal_name; heuristic_terminal->range = Heuristic_Range_create( has->terminal_range_internal, has->terminal_range_external, portal); if(is_start){ heuristic_terminal->terminal_model = C4_DerivedModel_create( model, model->start_state->state, transition->output, model->start_state->scope, NULL, NULL, C4_Scope_CORNER, NULL, NULL); } else { /* end terminal */ heuristic_terminal->terminal_model = C4_DerivedModel_create( model, transition->output, model->end_state->state, C4_Scope_CORNER, NULL, NULL, model->end_state->scope, NULL, NULL); } derived = heuristic_terminal->terminal_model->derived; heuristic_terminal->bound = Heuristic_Bound_create(derived, (heuristic_terminal->range->internal_query + heuristic_terminal->range->external_query), (heuristic_terminal->range->internal_target + heuristic_terminal->range->external_target)); Heuristic_Bound_max_region_convert(heuristic_terminal->bound); optimal_name = g_strdup_printf("Terminal [%s]", derived->name); heuristic_terminal->optimal = Optimal_create(derived, optimal_name, Optimal_Type_SCORE |Optimal_Type_PATH, TRUE); g_free(optimal_name); return heuristic_terminal; } static void Heuristic_Terminal_destroy( Heuristic_Terminal *heuristic_terminal){ Heuristic_Range_destroy(heuristic_terminal->range); C4_DerivedModel_destroy(heuristic_terminal->terminal_model); Heuristic_Bound_destroy(heuristic_terminal->bound); Optimal_destroy(heuristic_terminal->optimal); g_free(heuristic_terminal); return; } /**/ static Heuristic_Match *Heuristic_Match_create(C4_Model *model, C4_Portal *portal, C4_Transition *transition, gint id, Heuristic_ArgumentSet *has){ register Heuristic_Match *match = g_new(Heuristic_Match, 1); match->id = id; match->portal = portal; match->transition = transition; match->start_terminal = Heuristic_Terminal_create(model, portal, transition, TRUE, has); match->end_terminal = Heuristic_Terminal_create(model, portal, transition, FALSE, has); return match; } static void 
Heuristic_Match_destroy(Heuristic_Match *match){ Heuristic_Terminal_destroy(match->start_terminal); Heuristic_Terminal_destroy(match->end_terminal); g_free(match); return; } /**/ static C4_Score Heuristic_Span_score(Heuristic_Span *heuristic_span, gint query_length, gint target_length){ /* FIXME: need to implement properly for non-zero span scores */ return 0; } void Heuristic_Span_add_traceback(Heuristic_Span *heuristic_span, Alignment *alignment, gint query_length, gint target_length){ if(query_length){ g_assert(heuristic_span->span->query_loop); Alignment_add(alignment, heuristic_span->span->query_loop, query_length / heuristic_span->span->query_loop->advance_query); } if(target_length){ g_assert(heuristic_span->span->target_loop); Alignment_add(alignment, heuristic_span->span->target_loop, target_length / heuristic_span->span->target_loop->advance_target); } return; } static void Heuristic_Span_src_report_end_func(C4_Score *cell, gint cell_size, gint query_pos, gint target_pos, gpointer user_data){ register Heuristic_Data *heuristic_data = user_data; /* This requires heuristic_data to be the 1st member of user_data */ register Heuristic_Span *heuristic_span = heuristic_data->heuristic_span; register gint real_query_pos, real_target_pos; register gint i; g_assert(heuristic_span); g_assert(heuristic_span->curr_src_region); real_query_pos = query_pos - heuristic_span->curr_src_region->query_start; real_target_pos = target_pos - heuristic_span->curr_src_region->target_start; g_assert(real_query_pos >= 0); g_assert(real_target_pos >= 0); g_assert(real_query_pos <= heuristic_span->src_bound->query_range); g_assert(real_target_pos <= heuristic_span->src_bound->target_range); for(i = 0; i < cell_size; i++) heuristic_span->src_integration_matrix [real_query_pos][real_target_pos][i] = cell[i]; return; } static C4_Score *Heuristic_Span_dst_init_start_func( gint query_pos, gint target_pos, gpointer user_data){ register Heuristic_Data *heuristic_data = user_data; /* This 
requires heuristic_data to be the 1st member of user_data */ register Heuristic_Span *heuristic_span = heuristic_data->heuristic_span; register gint real_query_pos, real_target_pos; register Heuristic_Span_Cell *span_cell; g_assert(heuristic_span); g_assert(heuristic_span->curr_dst_region); real_query_pos = query_pos - heuristic_span->curr_dst_region->query_start; real_target_pos = target_pos - heuristic_span->curr_dst_region->target_start; g_assert(real_query_pos >= 0); g_assert(real_target_pos >= 0); g_assert(real_query_pos <= heuristic_span->dst_bound->query_range); g_assert(real_target_pos <= heuristic_span->dst_bound->target_range); span_cell = &heuristic_span->dst_integration_matrix [real_query_pos][real_target_pos]; if((span_cell->query_pos == -1) || (span_cell->target_pos == -1)) return heuristic_span->dummy_cell; return heuristic_span->src_integration_matrix [span_cell->query_pos - heuristic_span->curr_src_region->query_start] [span_cell->target_pos - heuristic_span->curr_src_region->target_start]; } static Heuristic_Span *Heuristic_Span_create(C4_Model *model, C4_State *src, C4_State *dst, C4_Portal *src_portal, C4_Portal *dst_portal, C4_Span *span, Heuristic_ArgumentSet *has){ register Heuristic_Span *heuristic_span = g_new(Heuristic_Span, 1); register gchar *optimal_name; register gint cell_size = 1 + model->total_shadow_designations; heuristic_span->span = span; /**/ heuristic_span->src_range = Heuristic_Range_create( has->span_range_internal, has->span_range_external, src_portal); heuristic_span->dst_range = Heuristic_Range_create( has->span_range_internal, has->span_range_external, dst_portal); /**/ heuristic_span->src_model = C4_DerivedModel_create(model, src, span->span_state, C4_Scope_CORNER, NULL, NULL, C4_Scope_ANYWHERE, Heuristic_Span_src_report_end_func, NULL); heuristic_span->dst_model = C4_DerivedModel_create(model, span->span_state, dst, C4_Scope_ANYWHERE, Heuristic_Span_dst_init_start_func, NULL, C4_Scope_CORNER, NULL, NULL); 
heuristic_span->src_traceback_model = C4_DerivedModel_create(model, src, span->span_state, C4_Scope_CORNER, NULL, NULL, C4_Scope_CORNER, NULL, NULL); /**/ heuristic_span->src_bound = Heuristic_Bound_create( heuristic_span->src_model->derived, (heuristic_span->src_range->internal_query + heuristic_span->src_range->external_query), (heuristic_span->src_range->internal_target + heuristic_span->src_range->external_target)); heuristic_span->dst_bound = Heuristic_Bound_create( heuristic_span->dst_model->derived, (heuristic_span->dst_range->internal_query + heuristic_span->dst_range->external_query), (heuristic_span->dst_range->internal_target + heuristic_span->dst_range->external_target)); Heuristic_Bound_max_region_convert(heuristic_span->src_bound); Heuristic_Bound_max_region_convert(heuristic_span->dst_bound); /**/ optimal_name = g_strdup_printf("Src span [%s]", heuristic_span->src_model->derived->name); heuristic_span->src_optimal = Optimal_create( heuristic_span->src_model->derived, optimal_name, Optimal_Type_SCORE|Optimal_Type_PATH, TRUE); g_free(optimal_name); /**/ optimal_name = g_strdup_printf("Dst span [%s]", heuristic_span->dst_model->derived->name); heuristic_span->dst_optimal = Optimal_create( heuristic_span->dst_model->derived, optimal_name, Optimal_Type_SCORE|Optimal_Type_PATH, TRUE); g_free(optimal_name); /**/ optimal_name = g_strdup_printf("Src span traceback [%s]", heuristic_span->src_model->derived->name); heuristic_span->src_traceback_optimal = Optimal_create( heuristic_span->src_traceback_model->derived, optimal_name, Optimal_Type_PATH, TRUE); g_free(optimal_name); /**/ heuristic_span->src_integration_matrix = (C4_Score***)Matrix3d_create( heuristic_span->src_bound->query_range+1, heuristic_span->src_bound->target_range+1, cell_size, sizeof(C4_Score)); heuristic_span->dst_integration_matrix = (Heuristic_Span_Cell**)Matrix2d_create( heuristic_span->dst_bound->query_range+1, heuristic_span->dst_bound->target_range+1, sizeof(Heuristic_Span_Cell)); 
heuristic_span->curr_src_region = NULL; heuristic_span->curr_dst_region = NULL; heuristic_span->dummy_cell = g_new0(C4_Score, cell_size); heuristic_span->dummy_cell[0] = C4_IMPOSSIBLY_LOW_SCORE; return heuristic_span; } /* FIXME: tidy : can use automatic optimal names instead ? */ static void Heuristic_Span_destroy(Heuristic_Span *heuristic_span){ if(heuristic_span->curr_src_region) Region_destroy(heuristic_span->curr_src_region); if(heuristic_span->curr_dst_region) Region_destroy(heuristic_span->curr_dst_region); Heuristic_Range_destroy(heuristic_span->src_range); Heuristic_Range_destroy(heuristic_span->dst_range); C4_DerivedModel_destroy(heuristic_span->src_model); C4_DerivedModel_destroy(heuristic_span->src_traceback_model); C4_DerivedModel_destroy(heuristic_span->dst_model); Heuristic_Bound_destroy(heuristic_span->src_bound); Heuristic_Bound_destroy(heuristic_span->dst_bound); Optimal_destroy(heuristic_span->src_optimal); Optimal_destroy(heuristic_span->src_traceback_optimal); Optimal_destroy(heuristic_span->dst_optimal); g_free(heuristic_span->src_integration_matrix); g_free(heuristic_span->dst_integration_matrix); g_free(heuristic_span->dummy_cell); g_free(heuristic_span); return; } static void Heuristic_Span_clear(Heuristic_Span *heuristic_span, gint query_length, gint target_length){ register gint i, j; g_assert(query_length <= heuristic_span->src_bound->query_range); g_assert(target_length <= heuristic_span->src_bound->target_range); for(i = 0; i <= heuristic_span->src_bound->query_range; i++) for(j = 0; j <= heuristic_span->src_bound->target_range; j++) heuristic_span->src_integration_matrix[i][j][0] = C4_IMPOSSIBLY_LOW_SCORE; return; } void Heuristic_Span_register(Heuristic_Span *heuristic_span, Region *src_region, Region *dst_region){ g_assert(heuristic_span); g_assert(src_region); g_assert(dst_region); if(heuristic_span->curr_src_region) Region_destroy(heuristic_span->curr_src_region); if(heuristic_span->curr_dst_region) 
Region_destroy(heuristic_span->curr_dst_region); heuristic_span->curr_src_region = Region_share(src_region); heuristic_span->curr_dst_region = Region_share(dst_region); Heuristic_Span_clear(heuristic_span, src_region->query_length, src_region->target_length); return; } /* Algorithm: * for each dst matrix pos * set dst matrix -INF * for each valid src matrix pos * challenge with score */ void Heuristic_Span_integrate(Heuristic_Span *heuristic_span, Region *src_region, Region *dst_region){ register gint i, j, x, y; register gint init_query, finish_query, init_target, finish_target; register gint prev_init_query = -1, prev_finish_query = -1, prev_init_target = -1, prev_finish_target = -1; register C4_Score candidate_score, top_score = 0; register gint top_query_pos = -1, top_target_pos = -1; /**/ g_assert(heuristic_span); g_assert(src_region); g_assert(dst_region); for(i = 0; i <= dst_region->query_length; i++){ for(j = 0; j <= dst_region->target_length; j++){ heuristic_span->dst_integration_matrix[i][j].query_pos = -1; heuristic_span->dst_integration_matrix[i][j].target_pos = -1; init_query = MAX(src_region->query_start, (dst_region->query_start+i - heuristic_span->span->max_query)); init_target = MAX(src_region->target_start, (dst_region->target_start+j - heuristic_span->span->max_target)); finish_query = MIN((src_region->query_start +src_region->query_length), (dst_region->query_start+i - heuristic_span->span->min_query)); finish_target = MIN((src_region->target_start +src_region->target_length), (dst_region->target_start+j - heuristic_span->span->min_target)); if((init_query != prev_init_query) || (init_target != prev_init_target) || (finish_query != prev_finish_query) || (finish_target != prev_finish_target)){ top_score = C4_IMPOSSIBLY_LOW_SCORE; top_query_pos = -1; top_target_pos = -1; for(x = init_query; x <= finish_query; x++){ for(y = init_target; y <= finish_target; y++){ candidate_score = heuristic_span->src_integration_matrix [x-src_region->query_start] 
[y-src_region->target_start][0] + Heuristic_Span_score(heuristic_span, dst_region->query_start+i-x, dst_region->target_start+j-y); if(top_score < candidate_score){ top_score = candidate_score; top_query_pos = x; top_target_pos = y; } } } } heuristic_span->dst_integration_matrix[i][j].query_pos = top_query_pos; heuristic_span->dst_integration_matrix[i][j].target_pos = top_target_pos; prev_init_query = init_query; prev_init_target = init_target; prev_finish_query = finish_query; prev_finish_target = finish_target; } } #if 0 g_message("SRC int matrix"); for(i = 0; i <= src_region->query_length; i++){ for(j = 0; j <= src_region->target_length; j++){ g_print("[%d],", heuristic_span->src_integration_matrix[i][j][0]); } g_print("\n"); } g_message("DST int matrix"); for(i = 0; i <= dst_region->query_length; i++){ for(j = 0; j <= dst_region->target_length; j++){ g_print("[%d],", Heuristic_Span_dst_init_start_func( dst_region->query_start+i, dst_region->target_start+j, NULL, heuristic_span)[0]); } g_print("\n"); } #endif /* 0 */ return; } /* FIXME: optimisation: could use a faster algorithm for this * with precomputed ranges or something. 
*/ static gint Heuristic_Span_get_max_query_range( Heuristic_Span *heuristic_span){ return heuristic_span->src_range->external_query + heuristic_span->dst_range->external_query + heuristic_span->span->max_query; } static gint Heuristic_Span_get_max_target_range( Heuristic_Span *heuristic_span){ return heuristic_span->src_range->external_target + heuristic_span->dst_range->external_target + heuristic_span->span->max_target; } /**/ static Heuristic_Pair *Heuristic_Pair_create(C4_Model *model, Heuristic_Match *src, Heuristic_Match *dst, Heuristic_ArgumentSet *has){ register Heuristic_Pair *pair; register gint i; register C4_State *src_state = src->transition->output, *dst_state = dst->transition->output; register C4_Span *span; register Heuristic_Span *heuristic_span; if(!C4_Model_path_is_possible(model, src->transition->output, dst->transition->output)) return NULL; pair = g_new(Heuristic_Pair, 1); pair->src = src; pair->dst = dst; pair->join = Heuristic_Join_create(model, src, dst, has); pair->span_list = g_ptr_array_new(); for(i = 0; i < model->span_list->len; i++){ span = model->span_list->pdata[i]; if(C4_Model_path_is_possible(model, src_state, span->span_state) && C4_Model_path_is_possible(model, span->span_state, dst_state)){ heuristic_span = Heuristic_Span_create(model, src_state, dst_state, src->portal, dst->portal, span, has); g_ptr_array_add(pair->span_list, heuristic_span); } } return pair; } static void Heuristic_Pair_destroy( Heuristic_Pair *match_pair){ register gint i; Heuristic_Join_destroy(match_pair->join); for(i = 0; i < match_pair->span_list->len; i++) Heuristic_Span_destroy(match_pair->span_list->pdata[i]); g_ptr_array_free(match_pair->span_list, TRUE); g_free(match_pair); return; } void Heuristic_Pair_get_max_range(Heuristic_Pair *pair, gint *max_query_join_range, gint *max_target_join_range){ register gint i, range; register Heuristic_Span *heuristic_span; g_assert(pair); g_assert(pair->join); (*max_query_join_range) = 
pair->join->src_range->external_query + pair->join->dst_range->external_query; (*max_target_join_range) = pair->join->src_range->external_target + pair->join->dst_range->external_target; /* Widen the join ranges to cover the largest span range */ for(i = 0; i < pair->span_list->len; i++){ heuristic_span = pair->span_list->pdata[i]; range = Heuristic_Span_get_max_query_range(heuristic_span); if((*max_query_join_range) < range) (*max_query_join_range) = range; range = Heuristic_Span_get_max_target_range(heuristic_span); if((*max_target_join_range) < range) (*max_target_join_range) = range; } return; } /**/ Heuristic *Heuristic_create(C4_Model *model){ register Heuristic *heuristic = g_new(Heuristic, 1); register gint i, j, counter; register C4_Transition *transition; register C4_Portal *portal; register Heuristic_Match *src, *dst; g_assert(model); g_assert(!model->is_open); g_assert(model->portal_list->len); /* Must have a portal */ heuristic->name = g_strdup_printf("heuristic:%s", model->name); heuristic->ref_count = 1; heuristic->model = C4_Model_share(model); heuristic->has = Heuristic_ArgumentSet_create(NULL); #if 0 if(heuristic->has->refinement != Heuristic_Refinement_NONE){ heuristic->optimal = Optimal_create(model, NULL, Optimal_Type_SCORE |Optimal_Type_PATH |Optimal_Type_REDUCED_SPACE, TRUE); } else { heuristic->optimal = NULL; } #endif /* 0 */ /**/ heuristic->match_total = 0; for(i = 0; i < model->portal_list->len; i++){ portal = model->portal_list->pdata[i]; heuristic->match_total += portal->transition_list->len; } g_assert(heuristic->match_total); /* Must have some matches */ /* Prepare match_list */ heuristic->match_list = g_new(Heuristic_Match*, heuristic->match_total); for(i = counter = 0; i < model->portal_list->len; i++){ portal = model->portal_list->pdata[i]; for(j = 0; j < portal->transition_list->len; j++){ transition = portal->transition_list->pdata[j]; heuristic->match_list[counter] = Heuristic_Match_create(model, portal, transition, counter, heuristic->has); counter++; } } g_assert(counter == 
heuristic->match_total); /* Prepare pair_matrix */ heuristic->pair_matrix = (Heuristic_Pair***) Matrix2d_create(heuristic->match_total, heuristic->match_total, sizeof(Heuristic_Pair*)); for(i = 0; i < heuristic->match_total; i++){ src = heuristic->match_list[i]; for(j = 0; j < heuristic->match_total; j++){ dst = heuristic->match_list[j]; heuristic->pair_matrix[i][j] = Heuristic_Pair_create(model, src, dst, heuristic->has); } } return heuristic; } void Heuristic_destroy(Heuristic *heuristic){ register gint i, j; g_assert(heuristic); if(--heuristic->ref_count) return; for(i = 0; i < heuristic->match_total; i++) Heuristic_Match_destroy(heuristic->match_list[i]); for(i = 0; i < heuristic->match_total; i++) for(j = 0; j < heuristic->match_total; j++) if(heuristic->pair_matrix[i][j]) Heuristic_Pair_destroy( heuristic->pair_matrix[i][j]); #if 0 if(heuristic->optimal) Optimal_destroy(heuristic->optimal); #endif /* 0 */ C4_Model_destroy(heuristic->model); g_free(heuristic->match_list); g_free(heuristic->pair_matrix); g_free(heuristic->name); g_free(heuristic); return; } Heuristic *Heuristic_share(Heuristic *heuristic){ g_assert(heuristic); heuristic->ref_count++; return heuristic; } /**/ GPtrArray *Heuristic_get_Optimal_list(Heuristic *heuristic){ register gint i, j, k; register Heuristic_Match *match; register Heuristic_Terminal *heuristic_terminal; register Heuristic_Pair *match_pair; register Heuristic_Join *heuristic_join; register Heuristic_Span *heuristic_span; register GPtrArray *optimal_list = g_ptr_array_new(); g_assert(heuristic); #if 0 /* Refinement optimal is not included here, * as will be included for exhaustive alignment anyway */ #endif /* 0 */ for(i = 0; i < heuristic->match_total; i++){ match = heuristic->match_list[i]; heuristic_terminal = match->start_terminal; g_ptr_array_add(optimal_list, heuristic_terminal->optimal); heuristic_terminal = match->end_terminal; g_ptr_array_add(optimal_list, heuristic_terminal->optimal); } for(i = 0; i < 
heuristic->match_total; i++){ for(j = 0; j < heuristic->match_total; j++){ if(heuristic->pair_matrix[i][j]){ match_pair = heuristic->pair_matrix[i][j]; heuristic_join = match_pair->join; g_ptr_array_add(optimal_list, heuristic_join->optimal); for(k = 0; k < match_pair->span_list->len; k++){ heuristic_span = match_pair->span_list->pdata[k]; g_ptr_array_add(optimal_list, heuristic_span->src_optimal); g_ptr_array_add(optimal_list, heuristic_span->src_traceback_optimal); g_ptr_array_add(optimal_list, heuristic_span->dst_optimal); } } } } return optimal_list; } /**/ exonerate-2.4.0/src/bsdp/bsdp.test.c0000644000175000017500000000213111162714262014220 00000000000000/****************************************************************\ * * * BSDP : Bounded Sparse Dynamic Programming * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "bsdp.h" #include "c4.h" int Argument_main(Argument *arg){ g_warning("Test not implemented [%s]", __FILE__); return 0; } /* FIXME: implement test */ exonerate-2.4.0/src/bsdp/bsdp.c0000644000175000017500000007576511162714262013271 00000000000000/****************************************************************\ * * * BSDP : Bounded Sparse Dynamic Programming * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "bsdp.h" /**/ BSDP_ArgumentSet *BSDP_ArgumentSet_create(Argument *arg){ register ArgumentSet *as; static BSDP_ArgumentSet bas = {0}; if(arg){ as = ArgumentSet_create("BSDP algorithm options"); ArgumentSet_add_option(as, '\0', "joinfilter", NULL, "BSDP join filter threshold", "0", Argument_parse_int, &bas.join_filter); Argument_absorb_ArgumentSet(arg, as); } return &bas; } /**/ static BSDP_Edge *BSDP_Edge_create(BSDP *bsdp, gpointer edge_data, BSDP_Node *dst, C4_Score bound_score){ register BSDP_Edge *bsdp_edge = RecycleBin_alloc(bsdp->edge_recycle); g_assert(dst); bsdp_edge->edge_data = edge_data; bsdp_edge->dst = dst; bsdp_edge->join_score = bound_score; bsdp_edge->stored_partial = 0; bsdp_edge->mailbox = -1; return bsdp_edge; } static void BSDP_Edge_destroy(BSDP *bsdp, BSDP_Edge *bsdp_edge){ if(bsdp->destroy_edge_data_func) bsdp->destroy_edge_data_func(bsdp_edge->edge_data); RecycleBin_recycle(bsdp->edge_recycle, bsdp_edge); return; } /**/ static BSDP_Node *BSDP_Node_create(gpointer node_data, C4_Score node_score, gboolean is_valid_start, gboolean is_valid_end, C4_Score start_bound, C4_Score end_bound){ register BSDP_Node *bsdp_node = g_new(BSDP_Node, 1); bsdp_node->mask = BSDP_Node_Mask_IS_NEW; if(is_valid_start){ bsdp_node->mask |= BSDP_Node_Mask_IS_VALID_START; bsdp_node->start_score = start_bound; } if(is_valid_end){ bsdp_node->mask |= BSDP_Node_Mask_IS_VALID_END; bsdp_node->end_score = end_bound; } bsdp_node->node_data = node_data; bsdp_node->node_score = node_score; bsdp_node->stored_total = node_score; /* BSDP_Edge list is only created when required */ bsdp_node->edge.list = NULL; bsdp_node->start_mailbox = -1; bsdp_node->end_mailbox = -1; return bsdp_node; } static void BSDP_Node_destroy_pqueue_func(gpointer data, gpointer user_data){ register BSDP *bsdp = user_data; register BSDP_Edge *bsdp_edge = (BSDP_Edge*)data; BSDP_Edge_destroy(bsdp, bsdp_edge); return; } static void 
BSDP_Node_destroy(BSDP *bsdp, BSDP_Node *bsdp_node){ register gint i; if(bsdp_node->mask & BSDP_Node_Mask_IS_USED){ if(bsdp_node->edge.used) /* Can be NULL if alignment end */ BSDP_Edge_destroy(bsdp, bsdp_node->edge.used); } else if(bsdp_node->mask & BSDP_Node_Mask_IS_INITIALISED){ PQueue_destroy(bsdp_node->edge.pqueue, BSDP_Node_destroy_pqueue_func, bsdp); } else if(bsdp_node->mask & BSDP_Node_Mask_IS_NEW){ if(bsdp_node->edge.list){ for(i = 0; i < bsdp_node->edge.list->len; i++) BSDP_Edge_destroy(bsdp, bsdp_node->edge.list->pdata[i]); g_ptr_array_free(bsdp_node->edge.list, TRUE); } } if(bsdp->destroy_node_data_func) bsdp->destroy_node_data_func(bsdp_node->node_data); g_free(bsdp_node); return; } /**/ static gboolean BSDP_Node_pqueue_compare(gpointer low, gpointer high, gpointer user_data){ register BSDP_Node *low_node = low, *high_node = high; return low_node->stored_total > high_node->stored_total; } BSDP *BSDP_create(BSDP_ConfirmEdgeFunc confirm_edge_func, BSDP_ConfirmTerminalFunc confirm_start_func, BSDP_ConfirmTerminalFunc confirm_end_func, BSDP_UpdateEdgeFunc update_edge_func, BSDP_UpdateTerminalFunc update_start_func, BSDP_UpdateTerminalFunc update_end_func, BSDP_DestroyFunc destroy_node_data_func, BSDP_DestroyFunc destroy_edge_data_func, gpointer user_data){ register BSDP *bsdp = g_new(BSDP, 1); g_assert(confirm_edge_func); g_assert(confirm_start_func); g_assert(confirm_end_func); g_assert(update_edge_func); g_assert(update_start_func); g_assert(update_end_func); bsdp->bas = BSDP_ArgumentSet_create(NULL); bsdp->confirm_edge_func = confirm_edge_func; bsdp->confirm_start_func = confirm_start_func; bsdp->confirm_end_func = confirm_end_func; bsdp->update_edge_func = update_edge_func; bsdp->update_start_func = update_start_func; bsdp->update_end_func = update_end_func; bsdp->destroy_node_data_func = destroy_node_data_func; bsdp->destroy_edge_data_func = destroy_edge_data_func; bsdp->user_data = user_data; bsdp->edge_recycle = RecycleBin_create("BSDP_Edge", 
sizeof(BSDP_Edge), 1024); bsdp->node_list = NULL; bsdp->node_pqueue = NULL; bsdp->path_count = 0; bsdp->pqueue_set = PQueueSet_create(); bsdp->edge_filter = NULL; if(bsdp->bas->join_filter){ bsdp->potential_recycle = RecycleBin_create( "BSDP_Potential", sizeof(BSDP_Potential), 1024); } else { bsdp->potential_recycle = NULL; } return bsdp; } void BSDP_destroy(BSDP *bsdp){ register gint i; if(bsdp->node_list) for(i = 0; i < bsdp->node_list->len; i++) BSDP_Node_destroy(bsdp, bsdp->node_list->pdata[i]); if(bsdp->node_list) g_ptr_array_free(bsdp->node_list, TRUE); RecycleBin_destroy(bsdp->edge_recycle); /* Don't need to free the bsdp->pqueue, as we free the set */ PQueueSet_destroy(bsdp->pqueue_set); if(bsdp->edge_filter) g_free(bsdp->edge_filter); if(bsdp->potential_recycle) RecycleBin_destroy(bsdp->potential_recycle); g_free(bsdp); return; } /**/ gint BSDP_add_node(BSDP *bsdp, gpointer node_data, C4_Score node_score, gboolean is_valid_start, gboolean is_valid_end, C4_Score start_bound, C4_Score end_bound){ register BSDP_Node *bsdp_node; g_assert(node_data); bsdp_node = BSDP_Node_create(node_data, node_score, is_valid_start, is_valid_end, start_bound, end_bound); if(!bsdp->node_list) bsdp->node_list = g_ptr_array_new(); g_ptr_array_add(bsdp->node_list, bsdp_node); return bsdp->node_list->len - 1; } static gboolean BSDP_Potential_pqueue_compare(gpointer low, gpointer high, gpointer user_data){ register BSDP_Potential *low_potential= low, *high_potential = high; return low_potential->score > high_potential->score; } static BSDP_Edge_Filter *BSDP_Edge_Filter_create(BSDP *bsdp){ register BSDP_Edge_Filter *edge_filter = g_new(BSDP_Edge_Filter, 1); edge_filter->src_edge_pqueue = PQueue_create(bsdp->pqueue_set, BSDP_Potential_pqueue_compare, NULL); edge_filter->dst_edge_pqueue = PQueue_create(bsdp->pqueue_set, BSDP_Potential_pqueue_compare, NULL); return edge_filter; } static void BSDP_Edge_Filter_destroy(BSDP_Edge_Filter *edge_filter){ 
PQueue_destroy(edge_filter->src_edge_pqueue, NULL, NULL); PQueue_destroy(edge_filter->dst_edge_pqueue, NULL, NULL); g_free(edge_filter); return; } /**/ static BSDP_Potential *BSDP_Potential_create(BSDP *bsdp, BSDP_Node *bsdp_node, BSDP_Edge *bsdp_edge){ register BSDP_Potential *bsdp_potential = RecycleBin_alloc(bsdp->potential_recycle); g_assert(bsdp); g_assert(bsdp_node); g_assert(bsdp_edge); bsdp_potential->ref_count = 1; bsdp_potential->score = bsdp_node->start_score + bsdp_node->node_score + bsdp_edge->join_score + bsdp_edge->dst->node_score + bsdp_edge->dst->end_score; bsdp_potential->bsdp_edge = bsdp_edge; bsdp_potential->src = bsdp_node; return bsdp_potential; } static BSDP_Potential *BSDP_Potential_share( BSDP_Potential *bsdp_potential){ g_assert(bsdp_potential); bsdp_potential->ref_count++; return bsdp_potential; } static void BSDP_Potential_destroy(BSDP *bsdp, BSDP_Potential *bsdp_potential){ g_assert(bsdp_potential); if(--bsdp_potential->ref_count) return; BSDP_Edge_destroy(bsdp, bsdp_potential->bsdp_edge); RecycleBin_recycle(bsdp->potential_recycle, bsdp_potential); return; } static BSDP_Edge *BSDP_Potential_release(BSDP *bsdp, BSDP_Potential *bsdp_potential){ register BSDP_Edge *bsdp_edge; g_assert(bsdp_potential); bsdp_edge = bsdp_potential->bsdp_edge; RecycleBin_recycle(bsdp->potential_recycle, bsdp_potential); return bsdp_edge; } static void BSDP_Potential_submit_queue(BSDP *bsdp, BSDP_Potential *bsdp_potential, PQueue *pqueue){ register BSDP_Potential *top, *prev; /* Admit extra edge for tie-breaker removal */ if(PQueue_total(pqueue) <= bsdp->bas->join_filter){ PQueue_push(pqueue, bsdp_potential); } else { top = PQueue_top(pqueue); if(top->score < bsdp_potential->score){ prev = PQueue_pop(pqueue); BSDP_Potential_destroy(bsdp, prev); PQueue_push(pqueue, bsdp_potential); } else { BSDP_Potential_destroy(bsdp, bsdp_potential); } } return; } /**/ static void BSDP_Edge_submit(BSDP *bsdp, BSDP_Edge *bsdp_edge, gint src_node_id, gint dst_node_id){ register 
BSDP_Edge_Filter *src_filter, *dst_filter; register BSDP_Node *src = bsdp->node_list->pdata[src_node_id]; register BSDP_Potential *bsdp_potential = BSDP_Potential_create(bsdp, src, bsdp_edge); g_assert(bsdp); g_assert(bsdp_edge); /**/ src_filter = bsdp->edge_filter[src_node_id]; if(!src_filter){ bsdp->edge_filter[src_node_id] = BSDP_Edge_Filter_create(bsdp); src_filter = bsdp->edge_filter[src_node_id]; } g_assert(src_filter); /**/ dst_filter = bsdp->edge_filter[dst_node_id]; if(!dst_filter){ bsdp->edge_filter[dst_node_id] = BSDP_Edge_Filter_create(bsdp); dst_filter = bsdp->edge_filter[dst_node_id]; } g_assert(dst_filter); /**/ BSDP_Potential_submit_queue(bsdp, BSDP_Potential_share(bsdp_potential), src_filter->src_edge_pqueue); BSDP_Potential_submit_queue(bsdp, bsdp_potential, dst_filter->dst_edge_pqueue); return; } void BSDP_add_edge(BSDP *bsdp, gpointer edge_data, gint src_node_id, gint dst_node_id, C4_Score bound_score){ register BSDP_Edge *bsdp_edge; register BSDP_Node *src, *dst; g_assert(bsdp); g_assert(edge_data); g_assert(src_node_id >= 0); g_assert(src_node_id < bsdp->node_list->len); g_assert(dst_node_id >= 0); g_assert(dst_node_id < bsdp->node_list->len); src = bsdp->node_list->pdata[src_node_id]; dst = bsdp->node_list->pdata[dst_node_id]; g_assert(src); g_assert(dst); g_assert(src->mask & BSDP_Node_Mask_IS_NEW); g_assert(dst->mask & BSDP_Node_Mask_IS_NEW); bsdp_edge = BSDP_Edge_create(bsdp, edge_data, dst, bound_score); if(bsdp->bas->join_filter){ /* Store in PQueue for src and dst */ if(!bsdp->edge_filter) bsdp->edge_filter = g_new0(BSDP_Edge_Filter*, bsdp->node_list->len); BSDP_Edge_submit(bsdp, bsdp_edge, src_node_id, dst_node_id); } else { /* Just add to edge list */ if(!src->edge.list) src->edge.list = g_ptr_array_new(); g_ptr_array_add(src->edge.list, bsdp_edge); } return; } /**/ static void BSDP_update(BSDP *bsdp, BSDP_Node *bsdp_node, BSDP_Edge *bsdp_edge, gboolean update); static C4_Score BSDP_Node_top_partial(BSDP *bsdp, BSDP_Node *bsdp_node, 
gboolean update){ register BSDP_Edge *bsdp_edge; register C4_Score score = C4_IMPOSSIBLY_LOW_SCORE; g_assert(bsdp_node); g_assert(bsdp_node->mask & BSDP_Node_Mask_IS_INITIALISED); g_assert(!(bsdp_node->mask & BSDP_Node_Mask_IS_USED)); bsdp_node->mask &= (~BSDP_Node_Mask_SCORED_TERMINAL); /* Score as end if possible */ if(bsdp_node->mask & BSDP_Node_Mask_IS_VALID_END){ score = bsdp_node->node_score + bsdp_node->end_score; bsdp_node->mask |= BSDP_Node_Mask_SCORED_TERMINAL; } /* Get best edge */ do { bsdp_edge = PQueue_top(bsdp_node->edge.pqueue); if(!bsdp_edge) break; if(bsdp_edge->dst->mask & BSDP_Node_Mask_IS_USED){ bsdp_edge = PQueue_pop(bsdp_node->edge.pqueue); g_assert(bsdp_edge); BSDP_Edge_destroy(bsdp, bsdp_edge); } else { break; } } while(TRUE); /* If have edge, score as edge if possible */ if(bsdp_edge){ g_assert(!(bsdp_edge->dst->mask & BSDP_Node_Mask_IS_USED)); if(update){ do { bsdp_edge = PQueue_pop(bsdp_node->edge.pqueue); if(!bsdp_edge) break; if(bsdp_edge->dst->mask & BSDP_Node_Mask_IS_USED){ BSDP_Edge_destroy(bsdp, bsdp_edge); continue; } BSDP_update(bsdp, bsdp_node, bsdp_edge, TRUE); } while(bsdp_edge != PQueue_top(bsdp_node->edge.pqueue)); } if(bsdp_edge && (score < bsdp_edge->stored_partial)){ bsdp_node->mask &= (~BSDP_Node_Mask_SCORED_TERMINAL); score = bsdp_edge->stored_partial; } } return score; } static C4_Score BSDP_Node_stored_total(BSDP *bsdp, BSDP_Node *bsdp_node, gboolean update){ if(!(bsdp_node->mask & BSDP_Node_Mask_IS_VALID_START)) return C4_IMPOSSIBLY_LOW_SCORE; return bsdp_node->start_score + BSDP_Node_top_partial(bsdp, bsdp_node, update); } static void BSDP_update(BSDP *bsdp, BSDP_Node *bsdp_node, BSDP_Edge *bsdp_edge, gboolean update){ bsdp_edge->stored_partial = bsdp_node->node_score + bsdp_edge->join_score + BSDP_Node_top_partial(bsdp, bsdp_edge->dst, update); g_assert(bsdp_node->mask & BSDP_Node_Mask_IS_INITIALISED); g_assert(!(bsdp_node->mask & BSDP_Node_Mask_IS_USED)); PQueue_push(bsdp_node->edge.pqueue, bsdp_edge); return; } 
static gboolean BSDP_Edge_pqueue_compare(gpointer low, gpointer high, gpointer user_data){ register BSDP_Edge *low_edge = low, *high_edge = high; return low_edge->stored_partial > high_edge->stored_partial; } static void BSDP_initialise_recur(BSDP *bsdp, BSDP_Node *bsdp_node){ register gint i; register BSDP_Edge *bsdp_edge; register GPtrArray *edge_list; if(bsdp_node->mask & BSDP_Node_Mask_IS_INITIALISED) return; edge_list = bsdp_node->edge.list; bsdp_node->edge.pqueue = PQueue_create(bsdp->pqueue_set, BSDP_Edge_pqueue_compare, NULL); bsdp_node->mask &= (~BSDP_Node_Mask_IS_NEW); bsdp_node->mask |= BSDP_Node_Mask_IS_INITIALISED; if(edge_list){ for(i = 0; i < edge_list->len; i++){ bsdp_edge = edge_list->pdata[i]; BSDP_initialise_recur(bsdp, bsdp_edge->dst); BSDP_update(bsdp, bsdp_node, bsdp_edge, FALSE); } g_ptr_array_free(edge_list, TRUE); } return; } static void BSDP_initialise_remove_tiebreakers(BSDP *bsdp, PQueue *pqueue){ register BSDP_Potential *bsdp_potential; register C4_Score score; g_assert(pqueue); if(PQueue_total(pqueue) > bsdp->bas->join_filter){ bsdp_potential = PQueue_pop(pqueue); score = bsdp_potential->score; BSDP_Potential_destroy(bsdp, bsdp_potential); while(PQueue_total(pqueue)){ bsdp_potential = PQueue_top(pqueue); g_assert(bsdp_potential); if(score != bsdp_potential->score) break; bsdp_potential = PQueue_pop(pqueue); BSDP_Potential_destroy(bsdp, bsdp_potential); } } return; } static void BSDP_initialise_filter(BSDP *bsdp, PQueue *pqueue){ register BSDP_Potential *bsdp_potential; register BSDP_Node *bsdp_node; g_assert(pqueue); while((bsdp_potential = PQueue_pop(pqueue))){ if(bsdp_potential->ref_count == 2){ /* In src + dst */ bsdp_node = bsdp_potential->src; if(!bsdp_node->edge.list) bsdp_node->edge.list = g_ptr_array_new(); g_ptr_array_add(bsdp_node->edge.list, bsdp_potential->bsdp_edge); bsdp_potential->ref_count = 0; } else { if(bsdp_potential->ref_count) BSDP_Potential_destroy(bsdp, bsdp_potential); else BSDP_Potential_release(bsdp, 
bsdp_potential); } } return; } void BSDP_initialise(BSDP *bsdp, C4_Score threshold){ register gint i; register BSDP_Node *bsdp_node; register BSDP_Edge_Filter *edge_filter; if(!bsdp->node_list) return; if(bsdp->edge_filter){ for(i = 0; i < bsdp->node_list->len; i++){ edge_filter = bsdp->edge_filter[i]; if(edge_filter) BSDP_initialise_remove_tiebreakers(bsdp, edge_filter->src_edge_pqueue); } for(i = 0; i < bsdp->node_list->len; i++){ edge_filter = bsdp->edge_filter[i]; if(edge_filter){ BSDP_initialise_filter(bsdp, edge_filter->src_edge_pqueue); BSDP_initialise_filter(bsdp, edge_filter->dst_edge_pqueue); BSDP_Edge_Filter_destroy(edge_filter); } } g_free(bsdp->edge_filter); bsdp->edge_filter = NULL; RecycleBin_destroy(bsdp->potential_recycle); bsdp->potential_recycle = NULL; } for(i = 0; i < bsdp->node_list->len; i++){ /* Initialise scores */ bsdp_node = bsdp->node_list->pdata[i]; BSDP_initialise_recur(bsdp, bsdp_node); bsdp_node->stored_total = BSDP_Node_stored_total(bsdp, bsdp_node, FALSE); if(bsdp_node->stored_total >= threshold){ if(!bsdp->node_pqueue) bsdp->node_pqueue = PQueue_create(bsdp->pqueue_set, BSDP_Node_pqueue_compare, NULL); PQueue_push(bsdp->node_pqueue, bsdp_node); } } return; } /* Score calculations: * ------------------ * * Top_partial(node) = Max(node_score + node_end_score, * Top(node->edge.pqueue) ); * * Edge: stored_partial = src_node_score * + edge_score * + Top_Partial( dst_node ) * * Node: stored_total = start_score * + Top_Partial( node ) */ /**/ static void BSDP_path_validate_recur(BSDP *bsdp, BSDP_Node *bsdp_node){ register BSDP_Edge *bsdp_edge; g_assert(bsdp_node->mask & BSDP_Node_Mask_IS_INITIALISED); g_assert(!(bsdp_node->mask & BSDP_Node_Mask_IS_USED)); if(bsdp_node->mask & BSDP_Node_Mask_SCORED_TERMINAL) return; do { bsdp_edge = PQueue_pop(bsdp_node->edge.pqueue); if(bsdp_edge){ if(bsdp_edge->dst->mask & BSDP_Node_Mask_IS_USED){ BSDP_Edge_destroy(bsdp, bsdp_edge); continue; } BSDP_path_validate_recur(bsdp, bsdp_edge->dst); 
BSDP_update(bsdp, bsdp_node, bsdp_edge, FALSE); } } while(PQueue_top(bsdp_node->edge.pqueue) != bsdp_edge); return; } static gboolean BSDP_path_is_validated(BSDP *bsdp, C4_Score threshold){ register BSDP_Node *bsdp_node = PQueue_top(bsdp->node_pqueue); register BSDP_Edge *bsdp_edge; register C4_Score score, check_score; g_assert(bsdp_node); g_assert(bsdp_node->mask & BSDP_Node_Mask_IS_VALID_START); score = bsdp_node->stored_total; check_score = bsdp_node->start_score; do { g_assert(bsdp_node); g_assert(bsdp_node->mask & BSDP_Node_Mask_IS_INITIALISED); g_assert(!(bsdp_node->mask & BSDP_Node_Mask_IS_USED)); check_score += bsdp_node->node_score; if((bsdp_node->mask & BSDP_Node_Mask_SCORED_TERMINAL)){ check_score += bsdp_node->end_score; break; } bsdp_edge = PQueue_top(bsdp_node->edge.pqueue); g_assert(bsdp_edge); check_score += bsdp_edge->join_score; bsdp_node = bsdp_edge->dst; } while(bsdp_node); g_assert(check_score == score); g_assert(score >= threshold); return TRUE; } static gboolean BSDP_path_validate(BSDP *bsdp, C4_Score threshold){ register BSDP_Node *bsdp_node; if(bsdp->node_pqueue){ do { bsdp_node = PQueue_pop(bsdp->node_pqueue); if(!bsdp_node) return FALSE; if(bsdp_node->mask & BSDP_Node_Mask_IS_USED) continue; BSDP_path_validate_recur(bsdp, bsdp_node); bsdp_node->stored_total = BSDP_Node_stored_total(bsdp, bsdp_node, TRUE); if(bsdp_node->stored_total >= threshold){ PQueue_push(bsdp->node_pqueue, bsdp_node); } else { continue; } } while(PQueue_top(bsdp->node_pqueue) != bsdp_node); g_assert(BSDP_path_is_validated(bsdp, threshold)); return TRUE; } return FALSE; } static gint BSDP_path_confirm(BSDP *bsdp){ register BSDP_Node *first_node = PQueue_top(bsdp->node_pqueue), *bsdp_node = first_node; register BSDP_Edge *bsdp_edge; register C4_Score prev_score, confirmed_score; register gint confirm_count = 0; do { g_assert(bsdp_node->node_data); g_assert(bsdp_node->mask & BSDP_Node_Mask_IS_INITIALISED); g_assert(!(bsdp_node->mask & BSDP_Node_Mask_IS_NEW)); 
if(bsdp_node->mask & BSDP_Node_Mask_SCORED_TERMINAL) break; bsdp_edge = PQueue_top(bsdp_node->edge.pqueue); if(!bsdp_edge) break; if(bsdp_edge->mailbox == -1){ /* Not confirmed */ bsdp_edge->mailbox = bsdp->path_count; confirmed_score = bsdp->confirm_edge_func( bsdp_node->node_data, bsdp_edge->edge_data, bsdp_edge->dst->node_data, bsdp->user_data); g_assert(bsdp_edge->join_score >= confirmed_score); if(bsdp_edge->join_score != confirmed_score){ bsdp_edge->join_score = confirmed_score; confirm_count++; } } else { /* Already confirmed */ /* Check for subopt edge clashes if out of date */ if(bsdp_edge->mailbox != bsdp->path_count){ prev_score = bsdp_edge->join_score; bsdp_edge->join_score = bsdp->update_edge_func( bsdp_node->node_data, bsdp_edge->edge_data, bsdp_edge->dst->node_data, bsdp->user_data, prev_score, bsdp_edge->mailbox); g_assert(bsdp_edge->join_score <= prev_score); bsdp_edge->mailbox = bsdp->path_count; if(bsdp_edge->join_score != prev_score) confirm_count++; } } bsdp_node = bsdp_edge->dst; } while(bsdp_node); /* Confirm the start */ g_assert(bsdp_node); if(first_node->mask & BSDP_Node_Mask_CONFIRMED_START){ /* Check for subopt start clashes */ if(first_node->start_mailbox != bsdp->path_count){ prev_score = first_node->start_score; first_node->start_score = bsdp->update_start_func( first_node->node_data, bsdp->user_data, prev_score, first_node->start_mailbox); g_assert(first_node->start_score <= prev_score); first_node->start_mailbox = bsdp->path_count; if(first_node->start_score != prev_score) confirm_count++; } } else { first_node->start_mailbox = bsdp->path_count; confirmed_score = bsdp->confirm_start_func( first_node->node_data, bsdp->user_data); first_node->mask |= BSDP_Node_Mask_CONFIRMED_START; g_assert(first_node->start_score >= confirmed_score); if(first_node->start_score != confirmed_score){ first_node->start_score = confirmed_score; confirm_count++; } } /* Confirm the end */ if(bsdp_node->mask & BSDP_Node_Mask_CONFIRMED_END){ /* Check for subopt 
end clashes */ if(bsdp_node->end_mailbox != bsdp->path_count){ prev_score = bsdp_node->end_score; bsdp_node->end_score = bsdp->update_end_func( bsdp_node->node_data, bsdp->user_data, prev_score, bsdp_node->end_mailbox); g_assert(bsdp_node->end_score <= prev_score); bsdp_node->end_mailbox = bsdp->path_count; if(bsdp_node->end_score != prev_score) confirm_count++; } } else { bsdp_node->end_mailbox = bsdp->path_count; confirmed_score = bsdp->confirm_end_func( bsdp_node->node_data, bsdp->user_data); bsdp_node->mask |= BSDP_Node_Mask_CONFIRMED_END; g_assert(bsdp_node->end_score >= confirmed_score); if(bsdp_node->end_score != confirmed_score){ bsdp_node->end_score = confirmed_score; confirm_count++; } } return confirm_count; } static BSDP_Path *BSDP_Path_create(void){ register BSDP_Path *bsdp_path = g_new(BSDP_Path, 1); bsdp_path->node_list = g_ptr_array_new(); return bsdp_path; } void BSDP_Path_destroy(BSDP_Path *bsdp_path){ g_ptr_array_free(bsdp_path->node_list, TRUE); g_free(bsdp_path); return; } static gboolean BSDP_Path_check(BSDP_Path *bsdp_path){ register C4_Score check_score = 0; register BSDP_Node *start_node, *end_node, *bsdp_node; register gint i; g_assert(bsdp_path->node_list->len); /* Check start node */ start_node = bsdp_path->node_list->pdata[0]; g_assert(start_node->mask & BSDP_Node_Mask_IS_VALID_START); g_assert(start_node->mask & BSDP_Node_Mask_CONFIRMED_START); g_assert(start_node->mask & BSDP_Node_Mask_USED_AS_START); check_score = start_node->start_score; /* Check end node */ end_node = bsdp_path->node_list->pdata[bsdp_path->node_list->len-1]; g_assert(end_node->mask & BSDP_Node_Mask_IS_VALID_END); g_assert(end_node->mask & BSDP_Node_Mask_CONFIRMED_END); g_assert(end_node->mask & BSDP_Node_Mask_USED_AS_END); check_score += end_node->end_score; /* Check all other nodes */ for(i = 0; i < bsdp_path->node_list->len; i++){ bsdp_node = bsdp_path->node_list->pdata[i]; g_assert(!(bsdp_node->mask & BSDP_Node_Mask_IS_NEW)); g_assert(bsdp_node->mask & 
BSDP_Node_Mask_IS_INITIALISED); g_assert(bsdp_node->mask & BSDP_Node_Mask_IS_USED); check_score += bsdp_node->node_score; if(bsdp_node != end_node){ check_score += bsdp_node->edge.used->join_score; } } g_assert(check_score == bsdp_path->score); return TRUE; } static BSDP_Path *BSDP_path_extract(BSDP *bsdp){ register BSDP_Path *bsdp_path = BSDP_Path_create(); register BSDP_Node *bsdp_node = PQueue_top(bsdp->node_pqueue); register BSDP_Edge *bsdp_edge; bsdp_path->score = bsdp_node->stored_total; g_assert(bsdp_node->mask & BSDP_Node_Mask_CONFIRMED_START); bsdp_node->mask |= BSDP_Node_Mask_USED_AS_START; do { g_assert(bsdp_node); g_assert(bsdp_node->mask & BSDP_Node_Mask_IS_INITIALISED); g_assert(!(bsdp_node->mask & BSDP_Node_Mask_IS_USED)); g_ptr_array_add(bsdp_path->node_list, bsdp_node); bsdp_node->mask |= BSDP_Node_Mask_IS_USED; bsdp_edge = PQueue_pop(bsdp_node->edge.pqueue); PQueue_destroy(bsdp_node->edge.pqueue, BSDP_Node_destroy_pqueue_func, bsdp); bsdp_node->edge.used = bsdp_edge; if(bsdp_node->mask & BSDP_Node_Mask_SCORED_TERMINAL){ g_assert(bsdp_node->mask & BSDP_Node_Mask_CONFIRMED_END); bsdp_node->mask |= BSDP_Node_Mask_USED_AS_END; break; } else { g_assert(bsdp_edge); } bsdp_node = bsdp_edge->dst; } while(bsdp_node); g_assert(BSDP_Path_check(bsdp_path)); return bsdp_path; } BSDP_Path *BSDP_next_path(BSDP *bsdp, C4_Score threshold){ register BSDP_Path *bsdp_path; do { if(!BSDP_path_validate(bsdp, threshold)) return NULL; } while(BSDP_path_confirm(bsdp)); bsdp_path = BSDP_path_extract(bsdp); bsdp->path_count++; return bsdp_path; } /**/ exonerate-2.4.0/src/bsdp/hpair.test.c0000644000175000017500000000214211162714262014375 00000000000000/****************************************************************\ * * * C4 dynamic programming library - code for heuristic pairs * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. 
* * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include #include "hpair.h" int Argument_main(Argument *arg){ g_warning("Test not implemented [%s]", __FILE__); return 0; } /* FIXME: implement this test */ exonerate-2.4.0/src/bsdp/hpair.c0000644000175000017500000007760211162714262013434 00000000000000/****************************************************************\ * * * C4 dynamic programming library - code for heuristic pairs * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "optimal.h" #include "hpair.h" #include "sar.h" #include "rangetree.h" /**/ typedef struct { Heuristic_Match *match; gint hsp_id; /* The id of the hsp in the portal */ SAR_Terminal *sar_start; SAR_Terminal *sar_end; } HPair_NodeData; #define HPair_NodeData_get_hsp_set(hpair, node_data) \ ((HSPset*) \ ((hpair)->portal_data_list->pdata[(node_data)->match->portal->id])) #define HPair_NodeData_get_hsp(hpair, node_data) \ ((HSP*)(HPair_NodeData_get_hsp_set(hpair, node_data) \ ->hsp_list->pdata[(node_data)->hsp_id])) static HPair_NodeData *HPair_NodeData_create(HPair *hpair, Heuristic_Match *match, gint hsp_id){ register HPair_NodeData *hnd = g_new(HPair_NodeData, 1); register HSP *hsp; hnd->match = match; hnd->hsp_id = hsp_id; hsp = HPair_NodeData_get_hsp(hpair, hnd); hnd->sar_start = SAR_Terminal_create(hsp, hpair, match, TRUE); hnd->sar_end = SAR_Terminal_create(hsp, hpair, match, FALSE); return hnd; } static void HPair_NodeData_destroy(HPair_NodeData *hnd){ if(hnd->sar_start) SAR_Terminal_destroy(hnd->sar_start); if(hnd->sar_end) SAR_Terminal_destroy(hnd->sar_end); g_free(hnd); return; } /**/ typedef struct { SAR_Join *sar_join; /* NULL if span edge */ SAR_Span *sar_span; /* NULL if join edge */ } HPair_EdgeData; static HPair_EdgeData *HPair_EdgeData_create(SAR_Join *sar_join, SAR_Span *sar_span){ register HPair_EdgeData *hed = g_new(HPair_EdgeData, 1); g_assert(!(sar_join && sar_span)); g_assert(sar_join || sar_span); hed->sar_join = sar_join; hed->sar_span = sar_span; return hed; } static void HPair_EdgeData_destroy(HPair_EdgeData *hed){ if(hed->sar_join) SAR_Join_destroy(hed->sar_join); if(hed->sar_span) SAR_Span_destroy(hed->sar_span); g_free(hed); return; } /**/ static gboolean HPair_SubOpt_check_diagonal(gint query_pos, gint target_pos, gint path_id, gpointer user_data){ register HSP *hsp = user_data; register gint diagonal = (target_pos*HSP_query_advance(hsp)) - 
(query_pos*HSP_target_advance(hsp)); if(diagonal == HSP_diagonal(hsp)) return TRUE; return FALSE; } /* FIXME: optimisation: precompute HSP_diagonal() */ static gboolean HPair_check_entry(HPair *hpair, HSP *hsp, Region *region){ Region search_region; Region_init_static(&search_region, HSP_query_cobs(hsp), HSP_target_cobs(hsp), region->query_start-HSP_query_cobs(hsp), region->target_start-HSP_target_cobs(hsp)); return SubOpt_find(hpair->subopt, &search_region, HPair_SubOpt_check_diagonal, hsp); } /* Returns true if subopt clashes with region -> HSP cobs */ static gboolean HPair_check_exit(HPair *hpair, HSP *hsp, Region *region){ Region search_region; Region_init_static(&search_region, Region_query_end(region), Region_target_end(region), HSP_query_cobs(hsp)-Region_query_end(region), HSP_target_cobs(hsp)-Region_target_end(region)); return SubOpt_find(hpair->subopt, &search_region, HPair_SubOpt_check_diagonal, hsp); } static gboolean HPair_SubOpt_check_region_since(gint query_pos, gint target_pos, gint path_id, gpointer user_data){ register gint last_updated = GPOINTER_TO_INT(user_data); if(path_id >= last_updated) return TRUE; return FALSE; } static gboolean HPair_check_region_since(HPair *hpair, Region *region, gint last_updated){ if(SubOpt_find(hpair->subopt, region, HPair_SubOpt_check_region_since, GINT_TO_POINTER(last_updated))) return TRUE; return FALSE; } /**/ static C4_Score HPair_confirm_edge_func(gpointer src_data, gpointer edge_data, gpointer dst_data, gpointer user_data){ register HPair *hpair = user_data; register HPair_EdgeData *hpair_edge_data = edge_data; register C4_Score score; register HPair_NodeData *src_node_data = src_data, *dst_node_data = dst_data; register HSP *src_hsp = HPair_NodeData_get_hsp(hpair, src_node_data), *dst_hsp = HPair_NodeData_get_hsp(hpair, dst_node_data); if(hpair_edge_data->sar_join){ g_assert(!hpair_edge_data->sar_span); if((HPair_check_entry(hpair, src_hsp, hpair_edge_data->sar_join->region)) || (HPair_check_exit(hpair, 
                             dst_hsp, hpair_edge_data->sar_join->region)))
            return C4_IMPOSSIBLY_LOW_SCORE;
        score = SAR_Join_find_score(hpair_edge_data->sar_join, hpair);
    } else {
        /* Assert before the span is dereferenced */
        g_assert(hpair_edge_data->sar_span);
        if((HPair_check_entry(hpair, src_hsp,
                              hpair_edge_data->sar_span->src_region))
        || (HPair_check_exit(hpair, dst_hsp,
                             hpair_edge_data->sar_span->dst_region)))
            return C4_IMPOSSIBLY_LOW_SCORE;
        score = SAR_Span_find_score(hpair_edge_data->sar_span, hpair);
        }
    return score;
    }

static C4_Score HPair_update_edge_func(gpointer src_data,
                                       gpointer edge_data,
                                       gpointer dst_data,
                                       gpointer user_data,
                                       C4_Score prev_score,
                                       gint last_updated){
    register HPair *hpair = user_data;
    register HPair_EdgeData *hpair_edge_data = edge_data;
    register HPair_NodeData *src_node_data = src_data,
                            *dst_node_data = dst_data;
    register HSP *src_hsp = HPair_NodeData_get_hsp(hpair, src_node_data),
                 *dst_hsp = HPair_NodeData_get_hsp(hpair, dst_node_data);
    if(hpair_edge_data->sar_join){
        g_assert(!hpair_edge_data->sar_span);
        if((HPair_check_entry(hpair, src_hsp,
                              hpair_edge_data->sar_join->region))
        || (HPair_check_exit(hpair, dst_hsp,
                             hpair_edge_data->sar_join->region)))
            return C4_IMPOSSIBLY_LOW_SCORE;
        if(HPair_check_region_since(hpair,
                                    hpair_edge_data->sar_join->region,
                                    last_updated)){
            return SAR_Join_find_score(hpair_edge_data->sar_join, hpair);
            }
    } else {
        g_assert(hpair_edge_data->sar_span);
        if((HPair_check_entry(hpair, src_hsp,
                              hpair_edge_data->sar_span->src_region))
        || (HPair_check_exit(hpair, dst_hsp,
                             hpair_edge_data->sar_span->dst_region)))
            return C4_IMPOSSIBLY_LOW_SCORE;
        if((HPair_check_region_since(hpair,
                                     hpair_edge_data->sar_span->src_region,
                                     last_updated))
        || (HPair_check_region_since(hpair,
                                     hpair_edge_data->sar_span->dst_region,
                                     last_updated)))
            return SAR_Span_find_score(hpair_edge_data->sar_span, hpair);
        }
    return prev_score;
    }

static C4_Score HPair_confirm_start_func(gpointer node_data,
                                         gpointer user_data){
    register HPair *hpair = user_data;
    register HPair_NodeData *hpair_node_data = node_data;
    register HSP *hsp =
                        HPair_NodeData_get_hsp(hpair, hpair_node_data);
    g_assert(hpair_node_data->sar_start);
    if(HPair_check_exit(hpair, hsp, hpair_node_data->sar_start->region))
        return C4_IMPOSSIBLY_LOW_SCORE;
    return SAR_Terminal_find_score(hpair_node_data->sar_start,
               hpair_node_data->match->start_terminal->optimal, hpair);
    }

static C4_Score HPair_update_start_func(gpointer node_data,
                                        gpointer user_data,
                                        C4_Score prev_score,
                                        gint last_updated){
    register HPair *hpair = user_data;
    register HPair_NodeData *hpair_node_data = node_data;
    register HSP *hsp = HPair_NodeData_get_hsp(hpair, hpair_node_data);
    g_assert(hpair_node_data->sar_start);
    if(HPair_check_exit(hpair, hsp, hpair_node_data->sar_start->region))
        return C4_IMPOSSIBLY_LOW_SCORE;
    if(HPair_check_region_since(hpair,
                                hpair_node_data->sar_start->region,
                                last_updated))
        return SAR_Terminal_find_score(hpair_node_data->sar_start,
                   hpair_node_data->match->start_terminal->optimal, hpair);
    return prev_score;
    }

static C4_Score HPair_confirm_end_func(gpointer node_data,
                                       gpointer user_data){
    register HPair *hpair = user_data;
    register HPair_NodeData *hpair_node_data = node_data;
    register HSP *hsp = HPair_NodeData_get_hsp(hpair, hpair_node_data);
    /* Assert before the end terminal is dereferenced */
    g_assert(hpair_node_data->sar_end);
    if(HPair_check_entry(hpair, hsp, hpair_node_data->sar_end->region))
        return C4_IMPOSSIBLY_LOW_SCORE;
    return SAR_Terminal_find_score(hpair_node_data->sar_end,
               hpair_node_data->match->end_terminal->optimal, hpair);
    }

static C4_Score HPair_update_end_func(gpointer node_data,
                                      gpointer user_data,
                                      C4_Score prev_score,
                                      gint last_updated){
    register HPair *hpair = user_data;
    register HPair_NodeData *hpair_node_data = node_data;
    register HSP *hsp = HPair_NodeData_get_hsp(hpair, hpair_node_data);
    g_assert(hpair_node_data->sar_end);
    if(HPair_check_entry(hpair, hsp, hpair_node_data->sar_end->region))
        return C4_IMPOSSIBLY_LOW_SCORE;
    if(HPair_check_region_since(hpair,
                                hpair_node_data->sar_end->region,
                                last_updated))
        return SAR_Terminal_find_score(hpair_node_data->sar_end,
hpair_node_data->match->end_terminal->optimal, hpair); return prev_score; } /**/ static void HPair_destroy_node_data_func(gpointer data){ register HPair_NodeData *hnd = data; HPair_NodeData_destroy(hnd); return; } static void HPair_destroy_edge_data_func(gpointer data){ register HPair_EdgeData *hed = data; HPair_EdgeData_destroy(hed); return; } /**/ HPair *HPair_create(Heuristic *heuristic, SubOpt *subopt, gint query_length, gint target_length, gint verbosity, gpointer user_data){ register HPair *hpair = g_new(HPair, 1); g_assert(heuristic); g_assert(query_length >= 0); g_assert(target_length >= 0); hpair->heuristic = Heuristic_share(heuristic); hpair->query_length = query_length; hpair->target_length = target_length; hpair->verbosity = verbosity; hpair->user_data = user_data; hpair->is_finalised = FALSE; /**/ hpair->portal_data_list = g_ptr_array_new(); g_assert(heuristic->model->portal_list); g_assert(heuristic->model->portal_list->len); g_ptr_array_set_size(hpair->portal_data_list, heuristic->model->portal_list->len); hpair->bsdp_node_offset = g_new0(gint, heuristic->match_total); /* FIXME: merge confirm and update funcs */ hpair->bsdp = BSDP_create(HPair_confirm_edge_func, HPair_confirm_start_func, HPair_confirm_end_func, HPair_update_edge_func, HPair_update_start_func, HPair_update_end_func, HPair_destroy_node_data_func, HPair_destroy_edge_data_func, hpair); hpair->subopt = SubOpt_share(subopt); return hpair; } static void HPair_destroy_offset_list(HPair *hpair){ if(hpair->bsdp_node_offset){ g_free(hpair->bsdp_node_offset); hpair->bsdp_node_offset = NULL; } return; } void HPair_destroy(HPair *hpair){ register gint i; register HSPset *hspset; for(i = 0; i < hpair->portal_data_list->len; i++){ hspset= hpair->portal_data_list->pdata[i]; if(hspset) HSPset_destroy(hspset); } g_ptr_array_free(hpair->portal_data_list, TRUE); HPair_destroy_offset_list(hpair); BSDP_destroy(hpair->bsdp); Heuristic_destroy(hpair->heuristic); SubOpt_destroy(hpair->subopt); g_free(hpair); 
return; } /**/ void HPair_add_hspset(HPair *hpair, C4_Portal *portal, HSPset *hsp_set){ g_assert(hpair); g_assert(!hpair->is_finalised); g_assert(hsp_set); /* Check this is the first hsp_set for this portal */ g_assert(!hpair->portal_data_list->pdata[portal->id]); hpair->portal_data_list->pdata[portal->id] = HSPset_share(hsp_set); return; } /**/ static void HPair_initialise_bsdp_nodes(HPair *hpair){ register gint i, j; register Heuristic_Match *match; register HSPset *hsp_set; register HSP *hsp; register gboolean start_ok, end_ok; register C4_Score start_bound, end_bound; register HPair_NodeData *hpair_node_data; register gint node_id, node_total = 0; for(i = 0; i < hpair->heuristic->match_total; i++){ match = hpair->heuristic->match_list[i]; g_assert(match); hsp_set = hpair->portal_data_list->pdata[match->portal->id]; if(!hsp_set) continue; for(j = 0; j < hsp_set->hsp_list->len; j++){ hsp = hsp_set->hsp_list->pdata[j]; hpair_node_data = HPair_NodeData_create(hpair, match, j); if(hpair_node_data->sar_start){ start_bound = SAR_Terminal_find_bound(hpair_node_data->sar_start, match->start_terminal->bound); start_ok = TRUE; } else { start_bound = C4_IMPOSSIBLY_LOW_SCORE; start_ok = FALSE; } if(hpair_node_data->sar_end){ end_bound = SAR_Terminal_find_bound(hpair_node_data->sar_end, match->end_terminal->bound); end_ok = TRUE; } else { end_bound = C4_IMPOSSIBLY_LOW_SCORE; end_ok = FALSE; } node_id = BSDP_add_node(hpair->bsdp, hpair_node_data, hsp->score, start_ok, end_ok, start_bound, end_bound); node_total++; if(!hpair->bsdp_node_offset[i]) hpair->bsdp_node_offset[i] = node_id + 1; } } if(hpair->verbosity > 1) g_message("Built HPair using [%d] nodes", node_total); return; } static gboolean HPair_hsp_pair_is_valid(HSP *src, HSP *dst){ if(src == dst) /* If the same */ return FALSE; /* FIXME: not required with test below */ if((HSP_query_cobs(src) == HSP_query_cobs(dst)) && (HSP_target_cobs(src) == HSP_target_cobs(dst))) return FALSE; /* Can be on same position with mixed 
HSPsets */ /* Check above and to the left */ if(HSP_query_cobs(src) > HSP_query_cobs(dst)) return FALSE; if(HSP_target_cobs(src) > HSP_target_cobs(dst)) return FALSE; return TRUE; } static void HPair_hsp_pair_calc_emit(HPair *hpair, HSP *src, HSP *dst, gint *query_emit, gint *target_emit){ register gboolean query_overlapping, target_overlapping; g_assert(query_emit); g_assert(target_emit); (*query_emit) = (*target_emit) = 0; query_overlapping = (HSP_query_end(src) > dst->query_start); target_overlapping = (HSP_target_end(src) > dst->target_start); (*query_emit) = dst->query_start - HSP_query_end(src); if(query_overlapping) (*query_emit) = (*query_emit) % HSP_query_advance(dst); (*target_emit) = dst->target_start - HSP_target_end(src); if(target_overlapping) (*target_emit) = (*target_emit) % HSP_target_advance(dst); /**/ if(query_overlapping && (!target_overlapping)) (*target_emit) += ((HSP_query_end(src) - dst->query_start) *(HSP_target_advance(dst)/HSP_query_advance(src))); if(target_overlapping && (!query_overlapping)) (*query_emit) += ((HSP_target_end(src) - dst->target_start) *(HSP_query_advance(dst)/HSP_target_advance(src))); /**/ g_assert((*query_emit) >= 0); g_assert((*target_emit) >= 0); g_assert((*query_emit) <= hpair->query_length); g_assert((*target_emit) <= hpair->target_length); /**/ return; } /* The emit distance is the shortest distance * from part of the query to part of the target. 
*/ static gboolean HPair_Join_is_valid(Heuristic_Join *join, gint query_emit, gint target_emit){ if(query_emit > join->bound->query_range) return FALSE; if(target_emit > join->bound->target_range) return FALSE; return TRUE; } static gboolean HPair_Span_is_valid(Heuristic_Span *heuristic_span, HSP *src, HSP *dst, gint query_emit, gint target_emit){ register gint effective_query_limit, effective_target_limit; effective_query_limit = heuristic_span->span->max_query + heuristic_span->src_bound->query_range + heuristic_span->dst_bound->query_range; effective_target_limit = heuristic_span->span->max_target + heuristic_span->src_bound->target_range + heuristic_span->dst_bound->target_range; if(query_emit > effective_query_limit) return FALSE; if(target_emit > effective_target_limit) return FALSE; if(query_emit < heuristic_span->span->min_query) return FALSE; if(target_emit < heuristic_span->span->min_target) return FALSE; return TRUE; } static void HPair_add_candidate_hsp_pair(HPair *hpair, Heuristic_Pair *pair, HSP *src, HSP *dst, gint src_hsp_id, gint dst_hsp_id, C4_Portal *src_portal, C4_Portal *dst_portal, gint *edge_total){ register SAR_Join *sar_join; register SAR_Span *sar_span; register C4_Score bound_score; register HPair_EdgeData *hpair_edge_data; register Heuristic_Span *heuristic_span; register gint i, src_node_id, dst_node_id; gint query_emit, target_emit; if(!HPair_hsp_pair_is_valid(src, dst)) return; src_node_id = hpair->bsdp_node_offset[pair->src->id] + src_hsp_id - 1; dst_node_id = hpair->bsdp_node_offset[pair->dst->id] + dst_hsp_id - 1; HPair_hsp_pair_calc_emit(hpair, src, dst, &query_emit, &target_emit); if((HPair_Join_is_valid(pair->join, query_emit, target_emit)) && (sar_join = SAR_Join_create(src, dst, hpair, pair))){ bound_score = SAR_Join_find_bound(sar_join); hpair_edge_data = HPair_EdgeData_create(sar_join, NULL); BSDP_add_edge(hpair->bsdp, hpair_edge_data, src_node_id, dst_node_id, bound_score); (*edge_total)++; } else { for(i = 0; i < 
pair->span_list->len; i++){ heuristic_span = pair->span_list->pdata[i]; if(HPair_Span_is_valid(heuristic_span, src, dst, query_emit, target_emit)){ sar_span = SAR_Span_create(src, dst, hpair, heuristic_span, src_portal, dst_portal); if(sar_span){ bound_score = SAR_Span_find_bound(sar_span); if(bound_score <= C4_IMPOSSIBLY_LOW_SCORE){ SAR_Span_destroy(sar_span); continue; } /* Build edge */ hpair_edge_data = HPair_EdgeData_create(NULL, sar_span); /* Add edge to BSDP */ BSDP_add_edge(hpair->bsdp, hpair_edge_data, src_node_id, dst_node_id, bound_score); (*edge_total)++; } } } } return; } typedef struct { HPair *hpair; Heuristic_Pair *pair; C4_Portal *src_portal; C4_Portal *dst_portal; HSPset *dst_hsp_set; HSP *src_hsp; gint src_hsp_id; gint *edge_total; } HPair_RangeTree_Report_Data; static gboolean HPair_RangeTree_ReportFunc(gint x, gint y, gpointer info, gpointer user_data){ register HPair_RangeTree_Report_Data *hrtrd = user_data; register gint dst_hsp_id = GPOINTER_TO_INT(info); HPair_add_candidate_hsp_pair(hrtrd->hpair, hrtrd->pair, hrtrd->src_hsp, hrtrd->dst_hsp_set->hsp_list->pdata[dst_hsp_id], hrtrd->src_hsp_id, dst_hsp_id, hrtrd->src_portal, hrtrd->dst_portal, hrtrd->edge_total); return FALSE; } static void HPair_find_candidate_hsp_pairs(HPair *hpair, Heuristic_Pair *pair, gint *edge_total){ register gint src_portal_id = pair->src->portal->id, dst_portal_id = pair->dst->portal->id; register HSPset *src_hsp_set = hpair->portal_data_list->pdata[src_portal_id], *dst_hsp_set = hpair->portal_data_list->pdata[dst_portal_id]; register C4_Portal *src_portal = hpair->heuristic->model->portal_list->pdata [src_portal_id], *dst_portal = hpair->heuristic->model->portal_list->pdata [dst_portal_id]; register HSP *src, *dst; register gint i; register RangeTree *rangetree = RangeTree_create(); register HSP *max_dst_cobs_hsp; g_assert(src_portal == pair->src->portal); g_assert(dst_portal == pair->dst->portal); gint max_query_join_range, max_target_join_range; 
HPair_RangeTree_Report_Data hrtrd; g_assert(src_hsp_set); g_assert(dst_hsp_set); max_dst_cobs_hsp = dst_hsp_set->hsp_list->pdata[0]; hrtrd.hpair = hpair; hrtrd.pair = pair; hrtrd.src_portal = src_portal; hrtrd.dst_portal = dst_portal; hrtrd.dst_hsp_set = dst_hsp_set; hrtrd.edge_total = edge_total; Heuristic_Pair_get_max_range(pair, &max_query_join_range, &max_target_join_range); /* Build RangeTree from the dst HSPset */ for(i = 0; i < dst_hsp_set->hsp_list->len; i++){ dst = dst_hsp_set->hsp_list->pdata[i]; RangeTree_add(rangetree, HSP_query_cobs(dst), HSP_target_cobs(dst), GINT_TO_POINTER(i)); if(max_dst_cobs_hsp->cobs < dst->cobs) max_dst_cobs_hsp = dst; } /* Search RangeTree with each HSP from the src HSPset */ for(i = 0; i < src_hsp_set->hsp_list->len; i++){ src = src_hsp_set->hsp_list->pdata[i]; hrtrd.src_hsp = src; hrtrd.src_hsp_id = i; RangeTree_find(rangetree, HSP_query_cobs(src), (HSP_query_cobs(src) - src->query_start) + (HSP_query_cobs(max_dst_cobs_hsp) - max_dst_cobs_hsp->query_start) + max_query_join_range, HSP_target_cobs(src), (HSP_target_cobs(src) - src->target_start) + (HSP_target_cobs(max_dst_cobs_hsp) - max_dst_cobs_hsp->target_start) + max_target_join_range, HPair_RangeTree_ReportFunc, &hrtrd); } RangeTree_destroy(rangetree, NULL, NULL); return; } static void HPair_initialise_bsdp_edges(HPair *hpair){ register gint i, j; register Heuristic_Pair *pair; gint edge_total = 0; for(i = 0; i < hpair->heuristic->match_total; i++){ for(j = 0; j < hpair->heuristic->match_total; j++){ pair = hpair->heuristic->pair_matrix[i][j]; if(pair) HPair_find_candidate_hsp_pairs(hpair, pair, &edge_total); } } if(hpair->verbosity > 1) g_message("Built HPair using [%d] edges", edge_total); return; } void HPair_finalise(HPair *hpair, C4_Score threshold){ g_assert(hpair); g_assert(!hpair->is_finalised); HPair_initialise_bsdp_nodes(hpair); HPair_initialise_bsdp_edges(hpair); HPair_destroy_offset_list(hpair); BSDP_initialise(hpair->bsdp, threshold); hpair->is_finalised = 
TRUE; return; } /**/ #if 0 static Alignment *HPair_refine_alignment(HPair *hpair, Alignment *alignment){ register Alignment *refined_alignment = NULL; register Region *region; register gint query_region_start, target_region_start; g_assert(hpair->heuristic->optimal); if(hpair->verbosity > 1) g_message("Refining alignment ... (%d)", alignment->score); switch(hpair->heuristic->has->refinement){ case Heuristic_Refinement_FULL: region = Region_create(0, 0, hpair->query_length, hpair->target_length); refined_alignment = Optimal_find_path( hpair->heuristic->optimal, region, hpair->user_data, alignment->score, hpair->subopt); g_assert(refined_alignment); Region_destroy(region); break; case Heuristic_Refinement_REGION: query_region_start = MAX(0, alignment->region->query_start - hpair->heuristic->has->refinement_boundary); target_region_start = MAX(0, alignment->region->target_start - hpair->heuristic->has->refinement_boundary); region = Region_create(query_region_start, target_region_start, MIN(hpair->query_length, Region_query_end(alignment->region) + hpair->heuristic->has->refinement_boundary) - query_region_start, MIN(hpair->target_length, Region_target_end(alignment->region) + hpair->heuristic->has->refinement_boundary) - target_region_start); refined_alignment = Optimal_find_path( hpair->heuristic->optimal, region, hpair->user_data, alignment->score, hpair->subopt); g_assert(refined_alignment); Region_destroy(region); break; default: g_error("Bad type for refinement [%s]", Heuristic_Refinement_to_string( hpair->heuristic->has->refinement)); break; } g_assert(refined_alignment); g_assert(refined_alignment->score >= alignment->score); return refined_alignment; } #endif /* 0 */ Alignment *HPair_next_path(HPair *hpair, C4_Score threshold){ register Alignment *alignment; register SAR_Alignment *sar_alignment; register BSDP_Path *bsdp_path; register BSDP_Node *first_node, *last_node, *src_node, *dst_node; register gint i; register HPair_EdgeData *hpair_edge_data; register 
HPair_NodeData *first_node_data, *last_node_data, *src_node_data, *dst_node_data; g_assert(hpair->is_finalised); bsdp_path = BSDP_next_path(hpair->bsdp, threshold); if(!bsdp_path) return NULL; g_assert(bsdp_path->node_list->len); /* Check path not empty */ first_node = bsdp_path->node_list->pdata[0]; last_node = bsdp_path->node_list->pdata [bsdp_path->node_list->len-1]; /**/ first_node_data = first_node->node_data; last_node_data = last_node->node_data; /**/ sar_alignment = SAR_Alignment_create(first_node_data->sar_start, last_node_data->sar_end, first_node_data->match, last_node_data->match, hpair, bsdp_path->score); /* Add first HSP traceback */ SAR_Alignment_add_HSP(sar_alignment, HPair_NodeData_get_hsp(hpair, first_node_data), first_node_data->match); for(i = 1; i < bsdp_path->node_list->len; i++){ src_node = bsdp_path->node_list->pdata[i-1]; dst_node = bsdp_path->node_list->pdata[i]; g_assert(src_node->edge.used->mailbox != -1); hpair_edge_data = src_node->edge.used->edge_data; src_node_data = src_node->node_data; dst_node_data = dst_node->node_data; if(hpair_edge_data->sar_join){ SAR_Alignment_add_SAR_Join(sar_alignment, hpair_edge_data->sar_join); } else { SAR_Alignment_add_SAR_Span(sar_alignment, hpair_edge_data->sar_span); } SAR_Alignment_add_HSP(sar_alignment, HPair_NodeData_get_hsp(hpair, dst_node_data), dst_node_data->match); } SAR_Alignment_finalise(sar_alignment); BSDP_Path_destroy(bsdp_path); alignment = Alignment_share(sar_alignment->alignment); SAR_Alignment_destroy(sar_alignment); #if 0 if(hpair->heuristic->has->refinement != Heuristic_Refinement_NONE){ refined_alignment = HPair_refine_alignment(hpair, alignment); Alignment_destroy(alignment); alignment = refined_alignment; /* Refinement may put alignment below threshold */ if(alignment->score < threshold){ Alignment_destroy(alignment); return NULL; } } #endif /* 0 */ g_assert(alignment->score >= threshold); #if 0 SubOpt_add_alignment(hpair->subopt, alignment); #endif /* 0 */ return alignment; } 
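
/* Illustration only (not part of exonerate): a minimal sketch of the two
 * geometric tests used in hpair.c, HPair_SubOpt_check_diagonal() and
 * HPair_hsp_pair_is_valid().  The MiniHSP type and every mini_* name are
 * hypothetical stand-ins for the real HSP type and its HSP_*() accessors.
 * The diagonal of the HSP itself is assumed here to be computed from its
 * start point with the same formula as for any other point.
 */

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cut-down HSP: only the fields this sketch needs. */
typedef struct {
    int query_start, target_start;     /* first aligned position   */
    int query_cobs, target_cobs;       /* centre offset by score   */
    int query_advance, target_advance; /* step per aligned word    */
} MiniHSP;

/* Diagonal of a point, scaled by the advances so that every point
 * reached by stepping (query_advance, target_advance) along the HSP
 * yields the same value (e.g. 1:3 for protein against DNA). */
static int mini_diagonal(const MiniHSP *hsp, int query_pos, int target_pos){
    assert(hsp->query_advance > 0 && hsp->target_advance > 0);
    return (target_pos * hsp->query_advance)
         - (query_pos * hsp->target_advance);
}

/* True when the point lies on the same diagonal as the HSP,
 * in the manner of HPair_SubOpt_check_diagonal(). */
static bool mini_on_hsp_diagonal(const MiniHSP *hsp,
                                 int query_pos, int target_pos){
    return mini_diagonal(hsp, query_pos, target_pos)
        == mini_diagonal(hsp, hsp->query_start, hsp->target_start);
}

/* Ordering rule in the manner of HPair_hsp_pair_is_valid():
 * dst must not be the same HSP, must not share the cobs point,
 * and its cobs must not lie above or to the left of src's cobs. */
static bool mini_pair_is_valid(const MiniHSP *src, const MiniHSP *dst){
    if(src == dst)
        return false;
    if((src->query_cobs == dst->query_cobs)
    && (src->target_cobs == dst->target_cobs))
        return false;
    if(src->query_cobs > dst->query_cobs)
        return false;
    if(src->target_cobs > dst->target_cobs)
        return false;
    return true;
}
```

/* With a 1:3 advance (one residue per three bases), stepping one query
 * position and three target positions stays on the diagonal, which is
 * why the positions are cross-multiplied by the advances rather than
 * simply subtracted.
 */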
exonerate-2.4.0/src/bsdp/sar.c

/****************************************************************\
*                                                                *
*  C4 dynamic programming library - sub-alignment regions        *
*                                                                *
*  Guy St.C. Slater..   mailto:guy@ebi.ac.uk                     *
*  Copyright (C) 2000-2009.  All Rights Reserved.                *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#include "sar.h"

/**/

SAR_ArgumentSet *SAR_ArgumentSet_create(Argument *arg){
    register ArgumentSet *as;
    static SAR_ArgumentSet sas;
    if(arg){
        as = ArgumentSet_create("SAR Options");
        /**/
        ArgumentSet_add_option(as, '\0', "quality", "percent",
            "HSP quality threshold", "0",
            Argument_parse_float, &sas.hsp_quality);
        /**/
        Argument_absorb_ArgumentSet(arg, as);
        }
    return &sas;
    }

/**/

static gboolean SAR_region_valid_hsp_exit(Region *region, HSP *hsp){
#ifndef G_DISABLE_ASSERT
    register gint query_width = Region_query_end(region)
                              - hsp->query_start;
    register gint target_width = Region_target_end(region)
                               - hsp->target_start;
    register gint query_prefix = Region_query_end(region)
                               - hsp->query_start;
#endif /* G_DISABLE_ASSERT */
    /* Check region corner meets the HSP */
    g_assert(Region_is_valid(region));
    g_assert((query_width * HSP_target_advance(hsp))
          == (target_width * HSP_query_advance(hsp)));
    /* Check region corner is at HSP word boundary */
    g_assert(!(query_width % HSP_query_advance(hsp)));
    g_assert(!(target_width % HSP_target_advance(hsp)));
    /* Check region boundary is between cobs and HSP end */
    g_assert(query_prefix >= 0);
    g_assert(query_prefix <= HSP_query_cobs(hsp));
    return TRUE;
    }

static gboolean SAR_region_valid_hsp_entry(Region *region, HSP *hsp){
#ifndef G_DISABLE_ASSERT
    register gint query_width = HSP_query_end(hsp) - region->query_start;
register gint target_width = HSP_target_end(hsp) - region->target_start; register gint query_suffix = HSP_query_end(hsp) - region->query_start; #endif /* G_DISABLE_ASSERT */ g_assert(Region_is_valid(region)); /* Check region corner meets the HSP */ g_assert((query_width * HSP_target_advance(hsp)) == (target_width * HSP_query_advance(hsp))); /* Check region corner is at HSP word boundary */ g_assert(!(query_width % HSP_query_advance(hsp))); g_assert(!(target_width % HSP_target_advance(hsp))); /* Check region boundary is between HSP start and cobs */ g_assert(query_suffix >= 0); g_assert(query_suffix <= (HSP_query_end(hsp)-HSP_query_cobs(hsp))); return TRUE; } /**/ static gboolean SAR_Terminal_calculate_start_region(HSP *hsp, HPair *hpair, Heuristic_Range *range, C4_Scope scope, Region *region){ register gint to_shrink; register gboolean at_query_edge = FALSE, at_target_edge = FALSE; Region outer_box; /* Find outer box (cobs -> corner) */ outer_box.query_start = 0; outer_box.target_start = 0; outer_box.query_length = HSP_query_cobs(hsp); outer_box.target_length = HSP_target_cobs(hsp); /* Find inner_box (end point) */ region->query_start = hsp->query_start; region->target_start = hsp->target_start; region->query_length = 0; region->target_length = 0; /* Grow inner_box by external range */ region->query_start -= range->external_query; region->target_start -= range->external_target; region->query_length += range->external_query; region->target_length += range->external_target; /* Grow inner_box on HSPs by internal range */ region->query_length += range->internal_query; region->target_length += range->internal_target; /* Trim inner_box to intersection (inner,outer) */ if(region->query_start < outer_box.query_start){ region->query_length -= (outer_box.query_start - region->query_start); region->query_start = outer_box.query_start; } if(region->target_start < outer_box.target_start){ region->target_length -= (outer_box.target_start - region->target_start); 
region->target_start = outer_box.target_start; } to_shrink = Region_query_end(region) - Region_query_end(&outer_box); if(to_shrink > 0) region->query_length -= to_shrink; to_shrink = Region_target_end(region) - Region_target_end(&outer_box); if(to_shrink > 0) region->target_length -= to_shrink; /* Fail when region is empty */ if(region->query_length <= 0) return FALSE; if(region->target_length <= 0) return FALSE; /* Check region is still on HSP boundary */ g_assert(SAR_region_valid_hsp_exit(region, hsp)); /* Check validity for non-local models */ at_query_edge = (region->query_start == 0); at_target_edge = (region->target_start == 0); switch(scope){ case C4_Scope_ANYWHERE: return TRUE; break; case C4_Scope_CORNER: if(at_query_edge && at_target_edge) return TRUE; break; case C4_Scope_EDGE: if(at_query_edge || at_target_edge) return TRUE; break; case C4_Scope_QUERY: if(at_query_edge) return TRUE; break; case C4_Scope_TARGET: if(at_target_edge) return TRUE; break; } return FALSE; } static gboolean SAR_Terminal_calculate_end_region(HSP *hsp, HPair *hpair, Heuristic_Range *range, C4_Scope scope, Region *region){ register gint to_shrink; register gboolean at_query_edge = FALSE, at_target_edge = FALSE; Region outer_box; /* Find outer box (cobs -> corner) */ outer_box.query_start = HSP_query_cobs(hsp); outer_box.target_start = HSP_target_cobs(hsp); outer_box.query_length = hsp->hsp_set->query->len - HSP_query_cobs(hsp); outer_box.target_length = hsp->hsp_set->target->len - HSP_target_cobs(hsp); /**/ /* Find inner_box (end point) */ region->query_start = HSP_query_end(hsp); region->target_start = HSP_target_end(hsp); region->query_length = 0; region->target_length = 0; /* Grow inner_box by external range */ region->query_length += range->external_query; region->target_length += range->external_target; /* Grow inner_box on HSPs by internal range */ region->query_start -= range->internal_query; region->query_length += range->internal_query; region->target_start -= 
range->internal_target; region->target_length += range->internal_target; /* Trim inner_box to intersection (inner,outer) */ if(Region_query_end(region) > Region_query_end(&outer_box)){ region->query_length -= (Region_query_end(region) - Region_query_end(&outer_box)); } if(Region_target_end(region) > Region_target_end(&outer_box)){ region->target_length -= (Region_target_end(region) - Region_target_end(&outer_box)); } to_shrink = outer_box.query_start - region->query_start; if(to_shrink > 0){ region->query_start += to_shrink; region->query_length -= to_shrink; } to_shrink = outer_box.target_start - region->target_start; if(to_shrink > 0){ region->target_start += to_shrink; region->target_length -= to_shrink; } /* Fail when region is empty */ if(region->query_length <= 0) return FALSE; if(region->target_length <= 0) return FALSE; /* Check region is still on HSP boundary */ g_assert(SAR_region_valid_hsp_entry(region, hsp)); /* Check validity for non-local models */ at_query_edge = (Region_query_end(region) == hsp->hsp_set->query->len); at_target_edge = (Region_target_end(region) == hsp->hsp_set->target->len); switch(scope){ case C4_Scope_ANYWHERE: return TRUE; break; case C4_Scope_CORNER: if(at_query_edge && at_target_edge) return TRUE; break; case C4_Scope_EDGE: if(at_query_edge || at_target_edge) return TRUE; break; case C4_Scope_QUERY: if(at_query_edge) return TRUE; break; case C4_Scope_TARGET: if(at_target_edge) return TRUE; break; } return FALSE; } /**/ static C4_Score SAR_find_start_component(HPair *hpair, Region *region, HSP *hsp, C4_Calc *calc, gint *prefix){ register C4_Score component = 0; register gint i, query_pos = hsp->query_start, target_pos = hsp->target_start; register Region *calc_region; g_assert(prefix); (*prefix) = (Region_query_end(region) - hsp->query_start) / HSP_query_advance(hsp); g_assert((*prefix) >= 0); calc_region = Region_create(hsp->query_start, hsp->target_start, (*prefix) * HSP_query_advance(hsp), (*prefix) * HSP_target_advance(hsp)); 
    C4_Calc_init(calc, calc_region, hpair->user_data);
    for(i = 0; i < (*prefix); i++){
        component += C4_Calc_score(calc, query_pos, target_pos,
                                   hpair->user_data);
        query_pos += HSP_query_advance(hsp);
        target_pos += HSP_target_advance(hsp);
        }
    g_assert(component <= hsp->score);
    C4_Calc_exit(calc, calc_region, hpair->user_data);
    Region_destroy(calc_region);
    return component;
    }

static C4_Score SAR_find_end_component(HPair *hpair, Region *region,
                                       HSP *hsp, C4_Calc *calc,
                                       gint *suffix){
    register C4_Score component = 0;
    register gint i, query_pos, target_pos;
    register Region *calc_region;
    g_assert(suffix);
    (*suffix) = (HSP_query_end(hsp) - region->query_start)
              / HSP_query_advance(hsp);
    /* Check the computed value, not the pointer,
     * as in SAR_find_start_component()
     */
    g_assert((*suffix) >= 0);
    calc_region = Region_create(region->query_start,
                                region->target_start,
                                (*suffix) * HSP_query_advance(hsp),
                                (*suffix) * HSP_target_advance(hsp));
    C4_Calc_init(calc, calc_region, hpair->user_data);
    query_pos = region->query_start;
    target_pos = region->target_start;
    for(i = 0; i < (*suffix); i++){
        component += C4_Calc_score(calc, query_pos, target_pos,
                                   hpair->user_data);
        query_pos += HSP_query_advance(hsp);
        target_pos += HSP_target_advance(hsp);
        }
    g_assert(component <= hsp->score);
    C4_Calc_exit(calc, calc_region, hpair->user_data);
    Region_destroy(calc_region);
    return component;
    }

/**/

static void SAR_HSP_quality(HPair *hpair, C4_Calc *calc, HSP *hsp,
                            C4_Score component, gint start, gint length,
                            gint *half_score, gint *max_score){
    register gint i, query_pos, target_pos;
    g_assert(half_score);
    g_assert(max_score);
    query_pos = hsp->query_start + (start * HSP_query_advance(hsp));
    target_pos = hsp->target_start + (start * HSP_target_advance(hsp));
    (*half_score) = (*max_score) = 0;
    for(i = 0; i < length; i++){
        (*half_score) += HSP_get_score(hsp, query_pos, target_pos);
        (*max_score) += HSP_query_self(hsp, query_pos);
        query_pos += HSP_query_advance(hsp);
        target_pos += HSP_target_advance(hsp);
        }
    return;
    }

SAR_Terminal *SAR_Terminal_create(HSP *hsp, HPair *hpair,
                                  Heuristic_Match *match,
                                  gboolean is_start){
    register SAR_Terminal *sar_terminal;
    Region region;
    register C4_Score component;
    gint prefix, suffix;
    register SAR_ArgumentSet *sas = SAR_ArgumentSet_create(NULL);
    register gboolean is_valid = FALSE;
    gint half_score, max_score;
    register gint start, length;
    prefix = suffix = 0;
    if(is_start){
        is_valid = SAR_Terminal_calculate_start_region(hsp, hpair,
                       match->start_terminal->range,
                       hpair->heuristic->model->start_state->scope,
                       &region);
    } else { /* is end */
        is_valid = SAR_Terminal_calculate_end_region(hsp, hpair,
                       match->end_terminal->range,
                       hpair->heuristic->model->end_state->scope,
                       &region);
        }
    if(!is_valid)
        return NULL;
    if(is_start){
        component = SAR_find_start_component(hpair, &region, hsp,
                                             match->portal->calc, &prefix);
        start = prefix;
        length = hsp->cobs - prefix;
    } else {
        component = SAR_find_end_component(hpair, &region, hsp,
                                           match->portal->calc, &suffix);
        start = hsp->cobs;
        length = hsp->length - hsp->cobs - suffix;
        }
    if(length && (sas->hsp_quality > 0.0)){
        SAR_HSP_quality(hpair, match->portal->calc, hsp, component,
                        start, length, &half_score, &max_score);
        if((((gdouble)half_score/(gdouble)max_score) * 100.0)
           < sas->hsp_quality)
            return NULL;
        }
    sar_terminal = g_new(SAR_Terminal, 1);
    sar_terminal->region = Region_copy(&region);
    sar_terminal->component = component;
    return sar_terminal;
    }

void SAR_Terminal_destroy(SAR_Terminal *sar_terminal){
    g_assert(sar_terminal);
    Region_destroy(sar_terminal->region);
    g_free(sar_terminal);
    return;
    }

C4_Score SAR_Terminal_find_bound(SAR_Terminal *sar_terminal,
                                 Heuristic_Bound *heuristic_bound){
    g_assert(sar_terminal->region->query_length >= 0);
    g_assert(sar_terminal->region->target_length >= 0);
    g_assert(sar_terminal->region->query_length
          <= heuristic_bound->query_range);
    g_assert(sar_terminal->region->target_length
          <= heuristic_bound->target_range);
    return heuristic_bound->matrix[sar_terminal->region->query_length]
                                  [sar_terminal->region->target_length]
         - sar_terminal->component;
    }

C4_Score SAR_Terminal_find_score(SAR_Terminal
*sar_terminal, Optimal *optimal, HPair *hpair){ return Optimal_find_score(optimal, sar_terminal->region, hpair->user_data, hpair->subopt) - sar_terminal->component; } /**/ static void SAR_reduce_mid_overlap(HPair *hpair, HSP *src, HSP *dst, Region *region, C4_Calc *src_calc, C4_Calc *dst_calc){ register C4_Score src_total, dst_total, max_total; register gint src_query_pos, src_target_pos, dst_query_pos, dst_target_pos; register gint max_src_query_pos, max_src_target_pos, max_dst_query_pos, max_dst_target_pos, max_dist_from_centre; /* Reverse up dst region, counting dst_total */ if(!(region->query_length + region->target_length)) return; src_total = dst_total = 0; dst_query_pos = Region_query_end(region) - HSP_query_advance(dst); dst_target_pos = Region_target_end(region) - HSP_target_advance(dst); while((dst_query_pos >= region->query_start) && (dst_target_pos >= region->target_start) && (dst_query_pos >= dst->query_start) && (dst_target_pos >= dst->target_start)){ dst_total += C4_Calc_score(dst_calc, dst_query_pos, dst_target_pos, hpair->user_data); dst_query_pos -= HSP_query_advance(dst); dst_target_pos -= HSP_target_advance(dst); } dst_query_pos += HSP_query_advance(dst); dst_target_pos += HSP_target_advance(dst); /* Go down src region, storing max total */ src_query_pos = region->query_start; src_target_pos = region->target_start; max_total = dst_total; max_src_query_pos = src_query_pos; max_src_target_pos = src_target_pos; max_dst_query_pos = dst_query_pos; max_dst_target_pos = dst_target_pos; max_dist_from_centre = Region_query_end(region) - src_query_pos; while((src_query_pos < Region_query_end(region)) && (src_target_pos < Region_target_end(region)) && (src_query_pos < HSP_query_end(src)) && (src_target_pos < HSP_target_end(src))){ src_total += C4_Calc_score(src_calc, src_query_pos, src_target_pos, hpair->user_data); while((src_query_pos >= dst_query_pos) || (src_target_pos >= dst_target_pos)){ dst_total -= C4_Calc_score(dst_calc, dst_query_pos, 
dst_target_pos, hpair->user_data); dst_query_pos += HSP_query_advance(dst); dst_target_pos += HSP_target_advance(dst); } if(max_total <= (src_total + dst_total)){ /* If the score better * or the distance from the overlap centre is less */ if((max_total < (src_total + dst_total)) || (ABS(Region_query_end(region)-src_query_pos) < max_dist_from_centre)){ max_dist_from_centre = ABS(Region_query_end(region) - src_query_pos); max_total = (src_total + dst_total); max_src_query_pos = src_query_pos; max_src_target_pos = src_target_pos; max_dst_query_pos = dst_query_pos; max_dst_target_pos = dst_target_pos; } } src_query_pos += HSP_query_advance(src); src_target_pos += HSP_target_advance(src); } /**/ region->query_start = max_src_query_pos; region->target_start = max_src_target_pos; region->query_length = max_dst_query_pos - max_src_query_pos; region->target_length = max_dst_target_pos - max_src_target_pos; g_assert(Region_is_valid(region)); return; } static void SAR_find_end_box(HPair *hpair, HSP *src, HSP *dst, Region *region, Region *cobs_box, C4_Calc *src_calc, C4_Calc *dst_calc){ register gint query_overlap = (HSP_query_end(src) - dst->query_start), target_overlap = (HSP_target_end(src) - dst->target_start); register gint src_query_move = 0, src_target_move = 0, dst_query_move = 0, dst_target_move = 0; region->query_start = MIN(HSP_query_end(src), dst->query_start); region->target_start = MIN(HSP_target_end(src), dst->target_start); region->query_length = MAX(HSP_query_end(src), dst->query_start) - region->query_start; region->target_length = MAX(HSP_target_end(src), dst->target_start) - region->target_start; if((query_overlap > 0) || (target_overlap > 0)){ /* Snap end_box to HSPs within cobs */ src_query_move = region->query_start - cobs_box->query_start; src_target_move = region->target_start - cobs_box->target_start; if((src_query_move <= 0) || (src_target_move <= 0)){ src_query_move = src_target_move = 0; } else { src_query_move -= (src_query_move % 
HSP_query_advance(src)); src_target_move -= (src_target_move % HSP_target_advance(src)); if((src_query_move / HSP_query_advance(src)) < (src_target_move / HSP_target_advance(src))){ src_target_move = (src_query_move / HSP_query_advance(src)) * HSP_target_advance(src); } else { src_query_move = (src_target_move / HSP_target_advance(src)) * HSP_query_advance(src); } } /**/ dst_query_move = Region_query_end(cobs_box) - Region_query_end(region); dst_target_move = Region_target_end(cobs_box) - Region_target_end(region); if((dst_query_move <= 0) || (dst_target_move <= 0)){ dst_query_move = dst_target_move = 0; } else { dst_query_move -= (dst_query_move % HSP_query_advance(dst)); dst_target_move -= (dst_target_move % HSP_target_advance(dst)); if((dst_query_move / HSP_query_advance(dst)) < (dst_target_move / HSP_target_advance(dst))){ dst_target_move = (dst_query_move / HSP_query_advance(dst)) * HSP_target_advance(dst); } else { dst_query_move = (dst_target_move / HSP_target_advance(dst)) * HSP_query_advance(dst); } } region->query_start = cobs_box->query_start + src_query_move; region->target_start = cobs_box->target_start + src_target_move; region->query_length = Region_query_end(cobs_box) - dst_query_move - region->query_start; region->target_length = Region_target_end(cobs_box) - dst_target_move - region->target_start; g_assert(SAR_region_valid_hsp_entry(region, src)); g_assert(SAR_region_valid_hsp_exit(region, dst)); SAR_reduce_mid_overlap(hpair, src, dst, region, src_calc, dst_calc); } g_assert(SAR_region_valid_hsp_entry(region, src)); g_assert(SAR_region_valid_hsp_exit(region, dst)); return; } static gboolean SAR_find_cobs_box(HSP *src, HSP *dst, Region *region){ region->query_start = HSP_query_cobs(src); region->target_start = HSP_target_cobs(src); region->query_length = HSP_query_cobs(dst) - region->query_start; region->target_length = HSP_target_cobs(dst) - region->target_start; if(region->query_length <= 0) return FALSE; if(region->target_length <= 0) return 
FALSE; return TRUE; } static gboolean SAR_Join_calculate_region(HPair *hpair, HSP *src, HSP *dst, Region *region, Heuristic_Pair *pair){ register gint to_shrink; Region outer_box; /* Find outer_box (cobs box) */ if(!SAR_find_cobs_box(src, dst, &outer_box)) return FALSE; /* Find inner_box (end_box) */ SAR_find_end_box(hpair, src, dst, region, &outer_box, pair->src->portal->calc, pair->dst->portal->calc); /* Check within external range */ if(region->query_length > (pair->join->src_range->external_query + pair->join->dst_range->external_query)) return FALSE; if(region->target_length > (pair->join->src_range->external_target + pair->join->dst_range->external_target)) return FALSE; /* Grow inner_region by internal_range */ region->query_start -= pair->join->src_range->internal_query; region->query_length += (pair->join->src_range->internal_query + pair->join->dst_range->internal_query); region->target_start -= pair->join->src_range->internal_target; region->target_length += (pair->join->src_range->internal_target + pair->join->dst_range->internal_target); /* Trim inner_box to intersection (inner,outer) */ to_shrink = outer_box.query_start - region->query_start; if(to_shrink > 0){ region->query_start += to_shrink; region->query_length -= to_shrink; } to_shrink = outer_box.target_start - region->target_start; if(to_shrink > 0){ region->target_start += to_shrink; region->target_length -= to_shrink; } to_shrink = Region_query_end(region) - Region_query_end(&outer_box); if(to_shrink > 0) region->query_length -= to_shrink; to_shrink = Region_target_end(region) - Region_target_end(&outer_box); if(to_shrink > 0) region->target_length -= to_shrink; /* Check region not empty */ if(region->query_length < 1) return FALSE; if(region->target_length < 1) return FALSE; /* Check region is still on HSP boundaries */ g_assert(SAR_region_valid_hsp_entry(region, src)); g_assert(SAR_region_valid_hsp_exit(region, dst)); return TRUE; } SAR_Join *SAR_Join_create(HSP *src_hsp, HSP *dst_hsp, 
HPair *hpair, Heuristic_Pair *pair){
    register SAR_Join *sar_join;
    register C4_Score src_component, dst_component;
    register gint src_length, dst_length;
    gint prefix, suffix, src_half_score, dst_half_score,
         src_max_score, dst_max_score;
    Region region;
    register SAR_ArgumentSet *sas = SAR_ArgumentSet_create(NULL);
    if(!SAR_Join_calculate_region(hpair, src_hsp, dst_hsp, &region, pair))
        return NULL;
    src_component = SAR_find_end_component(hpair, &region, src_hsp,
            pair->src->portal->calc, &suffix);
    dst_component = SAR_find_start_component(hpair, &region, dst_hsp,
            pair->dst->portal->calc, &prefix);
    src_length = src_hsp->length - src_hsp->cobs - suffix;
    dst_length = dst_hsp->cobs - prefix;
    if((src_length + dst_length) && (sas->hsp_quality > 0.0)){
        SAR_HSP_quality(hpair, pair->src->portal->calc, src_hsp,
                src_component, src_hsp->cobs, src_length,
                &src_half_score, &src_max_score);
        SAR_HSP_quality(hpair, pair->dst->portal->calc, dst_hsp,
                dst_component, prefix, dst_length,
                &dst_half_score, &dst_max_score);
        if(((((gdouble)(src_half_score+dst_half_score))
           / ((gdouble)(src_max_score+dst_max_score))) * 100.0)
           < sas->hsp_quality)
            return NULL;
        }
    sar_join = g_new(SAR_Join, 1);
    sar_join->region = Region_copy(&region);
    sar_join->src_component = src_component;
    sar_join->dst_component = dst_component;
    sar_join->pair = pair;
    return sar_join;
    }

void SAR_Join_destroy(SAR_Join *sar_join){
    g_assert(sar_join);
    Region_destroy(sar_join->region);
    g_free(sar_join);
    return;
    }

C4_Score SAR_Join_find_bound(SAR_Join *sar_join){
    g_assert(sar_join->region->query_length >= 0);
    g_assert(sar_join->region->target_length >= 0);
    g_assert(sar_join->region->query_length
          <= sar_join->pair->join->bound->query_range);
    g_assert(sar_join->region->target_length
          <= sar_join->pair->join->bound->target_range);
    return sar_join->pair->join->bound->matrix
                  [sar_join->region->query_length]
                  [sar_join->region->target_length]
         - (sar_join->src_component + sar_join->dst_component);
    }

C4_Score SAR_Join_find_score(SAR_Join *sar_join, HPair
*hpair){ return Optimal_find_score(sar_join->pair->join->optimal, sar_join->region, hpair->user_data, hpair->subopt) - (sar_join->src_component + sar_join->dst_component); } /**/ static gboolean SAR_Span_calculate_regions(HPair *hpair, HSP *src, HSP *dst, Heuristic_Span *span, Region *src_region, Region *dst_region, C4_Calc *src_calc, C4_Calc *dst_calc){ register gint to_shrink; Region outer_box, end_box; /* Find outer_box (cobs box) */ if(!SAR_find_cobs_box(src, dst, &outer_box)) return FALSE; /* Find inner_box (end_box) */ SAR_find_end_box(hpair, src, dst, &end_box, &outer_box, src_calc, dst_calc); /* Set src_region to entry points to end_box */ src_region->query_start = end_box.query_start; src_region->target_start = end_box.target_start; src_region->query_length = 0; src_region->target_length = 0; /* Set dst_region to exit points from end_box */ dst_region->query_start = Region_query_end(&end_box); dst_region->target_start = Region_target_end(&end_box); dst_region->query_length = 0; dst_region->target_length = 0; /* Grow src_region by external range */ src_region->query_length += span->src_range->external_query; src_region->target_length += span->src_range->external_target; /* Grow dst_region by external range */ dst_region->query_start -= span->dst_range->external_query; dst_region->target_start -= span->dst_range->external_target; dst_region->query_length += span->dst_range->external_query; dst_region->target_length += span->dst_range->external_target; /* Grow inner_box on HSPs by internal range */ src_region->query_start -= span->src_range->internal_query; src_region->query_length += span->src_range->internal_query; src_region->target_start -= span->src_range->internal_target; src_region->target_length += span->src_range->internal_target; /**/ /* Grow inner_box on HSPs by internal range */ dst_region->query_length += span->dst_range->internal_query; dst_region->target_length += span->dst_range->internal_target; /**/ /* Trim inner_box to intersection 
(inner,outer) */ if(Region_query_end(src_region) > Region_query_end(&outer_box)){ src_region->query_length -= (Region_query_end(src_region) - Region_query_end(&outer_box)); } if(Region_target_end(src_region) > Region_target_end(&outer_box)){ src_region->target_length -= (Region_target_end(src_region) - Region_target_end(&outer_box)); } to_shrink = outer_box.query_start - src_region->query_start; if(to_shrink > 0){ src_region->query_start += to_shrink; src_region->query_length -= to_shrink; } to_shrink = outer_box.target_start - src_region->target_start; if(to_shrink > 0){ src_region->target_start += to_shrink; src_region->target_length -= to_shrink; } /* Trim inner_box to intersection (inner,outer) */ if(dst_region->query_start < outer_box.query_start){ dst_region->query_length -= (outer_box.query_start - dst_region->query_start); dst_region->query_start = outer_box.query_start; } if(dst_region->target_start < outer_box.target_start){ dst_region->target_length -= (outer_box.target_start - dst_region->target_start); dst_region->target_start = outer_box.target_start; } to_shrink = Region_query_end(dst_region) - Region_query_end(&outer_box); if(to_shrink > 0) dst_region->query_length -= to_shrink; to_shrink = Region_target_end(dst_region) - Region_target_end(&outer_box); if(to_shrink > 0) dst_region->target_length -= to_shrink; /* Check size not zero */ if(src_region->query_length < 1) return FALSE; if(src_region->target_length < 1) return FALSE; if(dst_region->query_length < 1) return FALSE; if(dst_region->target_length < 1) return FALSE; /* Check a path is possible */ if((dst_region->query_start - Region_query_end(src_region)) > span->span->max_query) return FALSE; if((dst_region->target_start - Region_target_end(src_region)) > span->span->max_target) return FALSE; /* Check region are still on HSP boundaries */ g_assert(SAR_region_valid_hsp_entry(src_region, src)); g_assert(SAR_region_valid_hsp_exit(dst_region, dst)); return TRUE; } SAR_Span *SAR_Span_create(HSP 
*src_hsp, HSP *dst_hsp, HPair *hpair, Heuristic_Span *span, C4_Portal *src_portal, C4_Portal *dst_portal){ register SAR_Span *sar_span; register SAR_ArgumentSet *sas = SAR_ArgumentSet_create(NULL); register C4_Score src_component, dst_component; register gint src_length, dst_length; gint suffix, prefix, src_half_score, dst_half_score, src_max_score, dst_max_score; Region src_region, dst_region; if(!SAR_Span_calculate_regions(hpair, src_hsp, dst_hsp, span, &src_region, &dst_region, src_portal->calc, dst_portal->calc)) return NULL; src_component = SAR_find_end_component(hpair, &src_region, src_hsp, src_portal->calc, &suffix); dst_component = SAR_find_start_component(hpair, &dst_region, dst_hsp, dst_portal->calc, &prefix); src_length = src_hsp->length - src_hsp->cobs - suffix; dst_length = dst_hsp->cobs - prefix; if((src_length + dst_length) && (sas->hsp_quality > 0.0)){ SAR_HSP_quality(hpair, src_portal->calc, src_hsp, src_component, src_hsp->cobs, src_length, &src_half_score, &src_max_score); SAR_HSP_quality(hpair, dst_portal->calc, dst_hsp, dst_component, prefix, dst_length, &dst_half_score, &dst_max_score); if(((((gdouble)(src_half_score+dst_half_score)) / ((gdouble)(src_max_score+dst_max_score))) * 100.0) < sas->hsp_quality) return NULL; } if((100.0) < sas->hsp_quality) return NULL; sar_span = g_new(SAR_Span, 1); sar_span->src_region = Region_copy(&src_region); sar_span->dst_region = Region_copy(&dst_region); sar_span->src_component = src_component; sar_span->dst_component = dst_component; sar_span->span = span; return sar_span; } void SAR_Span_destroy(SAR_Span *sar_span){ g_assert(sar_span); Region_destroy(sar_span->src_region); Region_destroy(sar_span->dst_region); g_free(sar_span); return; } C4_Score SAR_Span_find_bound(SAR_Span *sar_span){ register C4_Score src_raw_score, dst_raw_score; register gint query_overlap, target_overlap; g_assert(sar_span->src_region->query_length <= sar_span->span->src_bound->query_range); 
g_assert(sar_span->src_region->target_length <= sar_span->span->src_bound->target_range); g_assert(sar_span->dst_region->query_length <= sar_span->span->dst_bound->query_range); g_assert(sar_span->dst_region->target_length <= sar_span->span->dst_bound->target_range); query_overlap = Region_query_end(sar_span->src_region) - sar_span->dst_region->query_start; target_overlap = Region_target_end(sar_span->src_region) - sar_span->dst_region->target_start; if(query_overlap < 0) query_overlap = 0; if(target_overlap < 0) target_overlap = 0; /**/ /* Get the source bounds */ src_raw_score = sar_span->span->src_bound->matrix [sar_span->src_region->query_length-(query_overlap>>1)] [sar_span->src_region->target_length-(target_overlap>>1)]; dst_raw_score = sar_span->span->dst_bound->matrix [sar_span->dst_region->query_length-(query_overlap>>1) -(query_overlap & 1)] [sar_span->dst_region->target_length-(target_overlap>>1) -(target_overlap & 1)]; /* Calculate the bound score */ return (src_raw_score - sar_span->src_component) + (dst_raw_score - sar_span->dst_component); } C4_Score SAR_Span_find_score(SAR_Span *sar_span, HPair *hpair){ register C4_Score score; register Heuristic_Data *heuristic_data; Heuristic_Span_register(sar_span->span, sar_span->src_region, sar_span->dst_region); /* Calculate src optimal, reporting to the integration matrix */ heuristic_data = hpair->user_data; heuristic_data->heuristic_span = sar_span->span; score = Optimal_find_score(sar_span->span->src_optimal, sar_span->src_region, hpair->user_data, hpair->subopt); Heuristic_Span_integrate(sar_span->span, sar_span->src_region, sar_span->dst_region); /* Calculate dst optimal, starting from the integration matrix */ score = Optimal_find_score(sar_span->span->dst_optimal, sar_span->dst_region, hpair->user_data, hpair->subopt); heuristic_data->heuristic_span = NULL; return score - (sar_span->src_component + sar_span->dst_component); } /**/ SAR_Alignment *SAR_Alignment_create(SAR_Terminal *sar_start, 
SAR_Terminal *sar_end, Heuristic_Match *start_match, Heuristic_Match *end_match, HPair *hpair, C4_Score score){ register SAR_Alignment *sar_alignment = g_new(SAR_Alignment, 1); register Region *align_region; register Alignment *start_alignment = Optimal_find_path(start_match->start_terminal->optimal, sar_start->region, hpair->user_data, C4_IMPOSSIBLY_LOW_SCORE, hpair->subopt); g_assert(start_alignment); sar_alignment->end_alignment = Optimal_find_path(end_match->end_terminal->optimal, sar_end->region, hpair->user_data, C4_IMPOSSIBLY_LOW_SCORE, hpair->subopt); sar_alignment->end_region = Region_share(sar_end->region); sar_alignment->end_match = end_match; g_assert(sar_alignment->end_alignment); align_region = Region_create(start_alignment->region->query_start, start_alignment->region->target_start, Region_query_end(sar_alignment->end_alignment->region) - start_alignment->region->query_start, Region_target_end(sar_alignment->end_alignment->region) - start_alignment->region->target_start); sar_alignment->alignment = Alignment_create(hpair->heuristic->model, align_region, score); Alignment_import_derived(sar_alignment->alignment, start_alignment, start_match->start_terminal->terminal_model); sar_alignment->hpair = hpair; sar_alignment->last_region = Region_share(start_alignment->region); sar_alignment->last_hsp = NULL; sar_alignment->last_match = NULL; Alignment_destroy(start_alignment); Region_destroy(align_region); return sar_alignment; } void SAR_Alignment_destroy(SAR_Alignment *sar_alignment){ g_assert(sar_alignment); g_assert(sar_alignment->alignment); Alignment_destroy(sar_alignment->alignment); if(sar_alignment->end_alignment) Alignment_destroy(sar_alignment->end_alignment); if(sar_alignment->end_region) Region_destroy(sar_alignment->end_region); if(sar_alignment->last_region) Region_destroy(sar_alignment->last_region); g_free(sar_alignment); return; } static void SAR_Alignment_add_region(SAR_Alignment *sar_alignment, Region *src_region, Region *dst_region){ 
register gint suffix; g_assert(sar_alignment->last_hsp); g_assert(sar_alignment->last_match); g_assert(!sar_alignment->last_region); suffix = (HSP_query_end(sar_alignment->last_hsp) - src_region->query_start) / HSP_query_advance(sar_alignment->last_hsp); Alignment_add(sar_alignment->alignment, sar_alignment->last_match->transition, -suffix); sar_alignment->last_hsp = NULL; sar_alignment->last_match = NULL; sar_alignment->last_region = Region_share(dst_region); return; } void SAR_Alignment_finalise(SAR_Alignment *sar_alignment){ Region seq_region; g_assert(sar_alignment->end_alignment); SAR_Alignment_add_region(sar_alignment, sar_alignment->end_region, sar_alignment->end_region); Alignment_import_derived(sar_alignment->alignment, sar_alignment->end_alignment, sar_alignment->end_match->end_terminal->terminal_model); Alignment_destroy(sar_alignment->end_alignment); sar_alignment->end_alignment = NULL; Region_destroy(sar_alignment->end_region); sar_alignment->end_region = NULL; Region_init_static(&seq_region, 0, 0, sar_alignment->hpair->query_length, sar_alignment->hpair->target_length); g_assert(Alignment_is_valid(sar_alignment->alignment, &seq_region, sar_alignment->hpair->user_data)); return; } void SAR_Alignment_add_SAR_Join(SAR_Alignment *sar_alignment, SAR_Join *sar_join){ register Alignment *edge_alignment; edge_alignment = Optimal_find_path( sar_join->pair->join->optimal, sar_join->region, sar_alignment->hpair->user_data, C4_IMPOSSIBLY_LOW_SCORE, sar_alignment->hpair->subopt); SAR_Alignment_add_region(sar_alignment, sar_join->region, sar_join->region); Alignment_import_derived(sar_alignment->alignment, edge_alignment, sar_join->pair->join->join_model); Alignment_destroy(edge_alignment); return; } void SAR_Alignment_add_SAR_Span(SAR_Alignment *sar_alignment, SAR_Span *sar_span){ register Alignment *src_alignment, *dst_alignment; register gint query_span_start, target_span_start, query_span_end, target_span_end; register Region *src_align_region; register 
Heuristic_Data *heuristic_data = sar_alignment->hpair->user_data; heuristic_data->heuristic_span = sar_span->span; Heuristic_Span_register(sar_span->span, sar_span->src_region, sar_span->dst_region); Optimal_find_score(sar_span->span->src_optimal, sar_span->src_region, sar_alignment->hpair->user_data, sar_alignment->hpair->subopt); Heuristic_Span_integrate(sar_span->span, sar_span->src_region, sar_span->dst_region); dst_alignment = Optimal_find_path(sar_span->span->dst_optimal, sar_span->dst_region, sar_alignment->hpair->user_data, C4_IMPOSSIBLY_LOW_SCORE, sar_alignment->hpair->subopt); /* Merge edge traceback */ query_span_end = dst_alignment->region->query_start - sar_span->dst_region->query_start; target_span_end = dst_alignment->region->target_start - sar_span->dst_region->target_start; query_span_start = sar_span->span->dst_integration_matrix [query_span_end][target_span_end].query_pos - sar_span->src_region->query_start; target_span_start = sar_span->span->dst_integration_matrix [query_span_end][target_span_end].target_pos - sar_span->src_region->target_start; src_align_region = Region_create(sar_span->src_region->query_start, sar_span->src_region->target_start, query_span_start, target_span_start); src_alignment = Optimal_find_path( sar_span->span->src_traceback_optimal, src_align_region, sar_alignment->hpair->user_data, C4_IMPOSSIBLY_LOW_SCORE, sar_alignment->hpair->subopt); Region_destroy(src_align_region); SAR_Alignment_add_region(sar_alignment, sar_span->src_region, sar_span->dst_region); Alignment_import_derived(sar_alignment->alignment, src_alignment, sar_span->span->src_traceback_model); /* Add traceback for the span */ Heuristic_Span_add_traceback(sar_span->span, sar_alignment->alignment, dst_alignment->region->query_start -Region_query_end(src_alignment->region), dst_alignment->region->target_start -Region_target_end(src_alignment->region)); heuristic_data->heuristic_span = NULL; Alignment_import_derived(sar_alignment->alignment, dst_alignment, 
sar_span->span->dst_model); Alignment_destroy(src_alignment); Alignment_destroy(dst_alignment); return; } void SAR_Alignment_add_HSP(SAR_Alignment *sar_alignment, HSP *hsp, Heuristic_Match *match){ register gint prefix; g_assert(sar_alignment->last_region); g_assert(!sar_alignment->last_hsp); g_assert(!sar_alignment->last_match); prefix = (Region_query_end(sar_alignment->last_region) - hsp->query_start) / HSP_query_advance(hsp); Alignment_add(sar_alignment->alignment, match->transition, hsp->length - prefix); Region_destroy(sar_alignment->last_region); sar_alignment->last_region = NULL; sar_alignment->last_hsp = hsp; sar_alignment->last_match = match; return; } exonerate-2.4.0/src/bsdp/sar.test.c0000644000175000017500000000217111162714262014061 00000000000000/****************************************************************\ * * * C4 dynamic programming library - sub-alignment regions * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include #include "argument.h" #include "sar.h" int Argument_main(Argument *arg){ g_warning("Test not implemented for [%s]", __FILE__); return 0; } /* FIXME: implement this test */ exonerate-2.4.0/src/bsdp/heuristic.h0000644000175000017500000001254211162714262014325 00000000000000/****************************************************************\ * * * C4 dynamic programming library - code for heuristics * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. 
See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_HEURISTIC_H #define INCLUDED_HEURISTIC_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include #include "c4.h" #include "optimal.h" #if 0 typedef enum { Heuristic_Refinement_NONE, Heuristic_Refinement_REGION, Heuristic_Refinement_FULL, Heuristic_Refinement_TOTAL /* Just the total */ } Heuristic_Refinement; gchar *Heuristic_Refinement_to_string(Heuristic_Refinement refinement); Heuristic_Refinement Heuristic_Refinement_from_string(gchar *str); #endif /* 0 */ typedef struct { gint terminal_range_internal; gint terminal_range_external; gint join_range_internal; gint join_range_external; gint span_range_internal; gint span_range_external; #if 0 Heuristic_Refinement refinement; gint refinement_boundary; #endif /* 0 */ } Heuristic_ArgumentSet; Heuristic_ArgumentSet *Heuristic_ArgumentSet_create(Argument *arg); typedef struct { C4_Score **matrix; gint query_range; gint target_range; } Heuristic_Bound; typedef struct { gint internal_query; gint internal_target; gint external_query; gint external_target; } Heuristic_Range; typedef struct { Heuristic_Range *src_range; Heuristic_Range *dst_range; C4_DerivedModel *join_model; Optimal *optimal; Heuristic_Bound *bound; } Heuristic_Join; typedef struct { gint query_pos; gint target_pos; } Heuristic_Span_Cell; /* Correspond to positions in the src_region */ /* FIXME: should correspond to positions * in the src_integration_matrix */ typedef struct { Heuristic_Range *src_range; Heuristic_Range *dst_range; C4_DerivedModel *src_model; C4_DerivedModel *src_traceback_model; C4_DerivedModel *dst_model; Heuristic_Bound *src_bound; Heuristic_Bound *dst_bound; Optimal *src_optimal; Optimal *src_traceback_optimal; Optimal *dst_optimal; C4_Span *span; /**/ C4_Score ***src_integration_matrix; 
Heuristic_Span_Cell **dst_integration_matrix; Region *curr_src_region; Region *curr_dst_region; C4_Score *dummy_cell; } Heuristic_Span; /* The sizes of the integration matrices are as in {src,dst}_bound */ typedef struct { Heuristic_Range *range; C4_DerivedModel *terminal_model; Heuristic_Bound *bound; Optimal *optimal; } Heuristic_Terminal; typedef struct { gint id; C4_Portal *portal; C4_Transition *transition; Heuristic_Terminal *start_terminal; Heuristic_Terminal *end_terminal; } Heuristic_Match; /* A Heuristic_Match for each portal/transition combination */ typedef struct { Heuristic_Match *src; Heuristic_Match *dst; Heuristic_Join *join; GPtrArray *span_list; /* Contains (*Heuristic_Span) */ } Heuristic_Pair; /* A Heuristic_Pair for each valid Heuristic_Match pair */ void Heuristic_Pair_get_max_range(Heuristic_Pair *pair, gint *max_query_join_range, gint *max_target_join_range); typedef struct { gchar *name; gint ref_count; Heuristic_ArgumentSet *has; C4_Model *model; #if 0 Optimal *optimal; #endif /* 0 */ gint match_total; Heuristic_Match **match_list; Heuristic_Pair ***pair_matrix; } Heuristic; /**/ Heuristic *Heuristic_create(C4_Model *model); void Heuristic_destroy(Heuristic *heuristic); Heuristic *Heuristic_share(Heuristic *heuristic); /**/ /* FIXME: move to hpair ? 
*/

typedef struct {
    Heuristic_Span *heuristic_span;
} Heuristic_Data;

void Heuristic_Span_register(Heuristic_Span *heuristic_span,
                             Region *src_region, Region *dst_region);
void Heuristic_Span_integrate(Heuristic_Span *heuristic_span,
                              Region *src_region, Region *dst_region);
void Heuristic_Span_add_traceback(Heuristic_Span *heuristic_span,
                                  Alignment *alignment,
                                  gint query_length, gint target_length);

GPtrArray *Heuristic_get_Optimal_list(Heuristic *heuristic);
/* Required for bootstrapper */

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_HEURISTIC_H */

exonerate-2.4.0/src/bsdp/bsdp.h0000644000175000017500000001442111162714262013254 00000000000000
/****************************************************************\
* *
* BSDP: Bounded Sparse Dynamic Programming *
* *
* Guy St.C. Slater.. mailto:guy@ebi.ac.uk *
* Copyright (C) 2000-2009. All Rights Reserved. *
* *
* This source code is distributed under the terms of the *
* GNU General Public License, version 3. See the file COPYING *
* or http://www.gnu.org/licenses/gpl.txt for details *
* *
* If you use this code, please keep this notice intact.
* * * \****************************************************************/ #ifndef INCLUDED_BSDP_H #define INCLUDED_BSDP_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include #include "c4.h" #include "pqueue.h" #include "recyclebin.h" #include "argument.h" typedef struct { gint join_filter; } BSDP_ArgumentSet; BSDP_ArgumentSet *BSDP_ArgumentSet_create(Argument *arg); typedef struct { gpointer edge_data; struct BSDP_Node *dst; C4_Score join_score; /* Current edge cost */ C4_Score stored_partial; /* Score for edge.pqueue */ gint mailbox; /* 0 indicates confirmed */ } BSDP_Edge; /* The confirm cost currently used is (DP_cells * states) */ typedef enum { BSDP_Node_Mask_IS_NEW = (1<<0), BSDP_Node_Mask_IS_INITIALISED = (1<<1), BSDP_Node_Mask_IS_USED = (1<<2), BSDP_Node_Mask_IS_VALID_START = (1<<3), BSDP_Node_Mask_IS_VALID_END = (1<<4), BSDP_Node_Mask_CONFIRMED_START = (1<<5), BSDP_Node_Mask_CONFIRMED_END = (1<<6), BSDP_Node_Mask_USED_AS_START = (1<<7), BSDP_Node_Mask_USED_AS_END = (1<<8), BSDP_Node_Mask_SCORED_TERMINAL = (1<<9) } BSDP_Node_Mask; typedef struct BSDP_Node { BSDP_Node_Mask mask; gpointer node_data; C4_Score node_score; /* Score for HSP only */ C4_Score stored_total; /* Score for node_pqueue */ union { GPtrArray *list; /* Used once IS_NEW */ PQueue *pqueue; /* Used once IS_INITIALISED */ BSDP_Edge *used; /* Used once IS_USED */ } edge; C4_Score start_score; C4_Score end_score; gint start_mailbox; gint end_mailbox; } BSDP_Node; typedef C4_Score (*BSDP_ConfirmEdgeFunc)(gpointer src_data, gpointer edge_data, gpointer dst_data, gpointer user_data); typedef C4_Score (*BSDP_UpdateEdgeFunc)(gpointer src_data, gpointer edge_data, gpointer dst_data, gpointer user_data, C4_Score prev_score, gint last_updated); typedef C4_Score (*BSDP_ConfirmTerminalFunc)(gpointer node_data, gpointer user_data); typedef C4_Score (*BSDP_UpdateTerminalFunc)(gpointer node_data, gpointer user_data, C4_Score prev_score, gint last_updated); typedef gboolean 
(*BSDP_NodeIsUsedFunc)(gpointer node_data, gpointer user_data);
typedef void (*BSDP_UseNodeFunc)(gpointer node_data,
                                 gpointer user_data);
typedef void (*BSDP_DestroyFunc)(gpointer data);

typedef struct {
    gint ref_count;
    C4_Score score;
    BSDP_Edge *bsdp_edge;
    BSDP_Node *src;
} BSDP_Potential;

typedef struct {
    PQueue *src_edge_pqueue;
    PQueue *dst_edge_pqueue;
} BSDP_Edge_Filter;

typedef struct {
    BSDP_ArgumentSet *bas;
    BSDP_ConfirmEdgeFunc confirm_edge_func;
    BSDP_ConfirmTerminalFunc confirm_start_func;
    BSDP_ConfirmTerminalFunc confirm_end_func;
    BSDP_UpdateEdgeFunc update_edge_func;
    BSDP_UpdateTerminalFunc update_start_func;
    BSDP_UpdateTerminalFunc update_end_func;
    /**/
    BSDP_DestroyFunc destroy_node_data_func;
    BSDP_DestroyFunc destroy_edge_data_func;
    gpointer user_data;
    GPtrArray *node_list;
    PQueue *node_pqueue;
    RecycleBin *edge_recycle;
    guint path_count;
    PQueueSet *pqueue_set;
    BSDP_Edge_Filter **edge_filter;
    RecycleBin *potential_recycle;
} BSDP;

/**/

BSDP *BSDP_create(BSDP_ConfirmEdgeFunc confirm_edge_func,
                  BSDP_ConfirmTerminalFunc confirm_start_func,
                  BSDP_ConfirmTerminalFunc confirm_end_func,
                  BSDP_UpdateEdgeFunc update_edge_func,
                  BSDP_UpdateTerminalFunc update_start_func,
                  BSDP_UpdateTerminalFunc update_end_func,
                  BSDP_DestroyFunc destroy_node_data_func,
                  BSDP_DestroyFunc destroy_edge_data_func,
                  gpointer user_data);
void BSDP_destroy(BSDP *bsdp);

gint BSDP_add_node(BSDP *bsdp, gpointer node_data,
                   C4_Score node_score,
                   gboolean is_valid_start,
                   gboolean is_valid_end,
                   C4_Score start_bound,
                   C4_Score end_bound);
void BSDP_add_edge(BSDP *bsdp, gpointer edge_data,
                   gint src_node_id, gint dst_node_id,
                   C4_Score bound_score);

void BSDP_initialise(BSDP *bsdp, C4_Score threshold);
/* This should be called after all the nodes and edges have been added */

typedef struct {
    C4_Score score;
    GPtrArray *node_list;
} BSDP_Path;

BSDP_Path *BSDP_next_path(BSDP *bsdp, C4_Score threshold);
void BSDP_Path_destroy(BSDP_Path *bsdp_path);

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /*
INCLUDED_BSDP_H */

exonerate-2.4.0/src/bsdp/hpair.h0000644000175000017500000000425411162714262013432 00000000000000
/****************************************************************\
*                                                                *
*  C4 dynamic programming library - code for heuristic pairs     *
*                                                                *
*  Guy St.C. Slater..   mailto:guy@ebi.ac.uk                     *
*  Copyright (C) 2000-2009.  All Rights Reserved.                *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#ifndef INCLUDED_HPAIR_H
#define INCLUDED_HPAIR_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <glib.h>

#include "c4.h"
#include "alignment.h"
#include "bsdp.h"
#include "heuristic.h"
#include "hspset.h"
#include "subopt.h"

typedef struct {
    gint query_length;
    gint target_length;
    gpointer user_data;
    Heuristic *heuristic;
    gboolean is_finalised;
    GPtrArray *portal_data_list; /* HSPset indexed by portal id */
    gint *bsdp_node_offset;      /* indexed by match */
    BSDP *bsdp;
    SubOpt *subopt;
    gint verbosity;
} HPair;

HPair *HPair_create(Heuristic *heuristic, SubOpt *subopt,
                    gint query_length, gint target_length,
                    gint verbosity, gpointer user_data);
void HPair_destroy(HPair *hpair);
/* user_data is provided for the Optimal functions */

void HPair_add_hspset(HPair *hpair, C4_Portal *portal,
                      HSPset *hsp_set);
/* HSPsets must be finalised when provided */

void HPair_finalise(HPair *hpair, C4_Score threshold);
Alignment *HPair_next_path(HPair *hpair, C4_Score threshold);

/* FIXME: need C4_Score HPair_next_score(HPair *hpair); */

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_HPAIR_H */

exonerate-2.4.0/src/bsdp/sar.h0000644000175000017500000001111111162714262013102 00000000000000
/****************************************************************\
*                                                                *
*  C4 dynamic programming library - sub-alignment regions        *
*                                                                *
*  Guy St.C. Slater..   mailto:guy@ebi.ac.uk                     *
*  Copyright (C) 2000-2009.  All Rights Reserved.                *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#ifndef INCLUDED_SAR_H
#define INCLUDED_SAR_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <glib.h>

#include "c4.h"
#include "hpair.h"
#include "hspset.h"
#include "region.h"

typedef struct {
    gfloat hsp_quality;
} SAR_ArgumentSet;

SAR_ArgumentSet *SAR_ArgumentSet_create(Argument *arg);

/* The five types of SAR used in C4:
 *
 *  +-----+
 *  |     |  START
 *  |    \|
 *  +-----\
 *         x
 *          \-----+
 *          |\  \ |  JOIN
 *          | \  \|
 *          +-----\
 *                 x
 *                  \-----+ ....... +-----+
 *                  |\    |         |     |
 *              SRC | \   | (span)  |   \ | DST
 *                  |     |         |    \|
 *                  +-----+ ....... +-----\
 *                                         x
 *                                          \-----+
 *                                          |\    |
 *                                          | \   |  END
 *                                          |     |
 *                                          +-----+
 */

typedef struct {
    Region *region;
    C4_Score component;
} SAR_Terminal;

SAR_Terminal *SAR_Terminal_create(HSP *hsp, HPair *hpair,
                                  Heuristic_Match *match,
                                  gboolean is_start);
void SAR_Terminal_destroy(SAR_Terminal *sar_terminal);
C4_Score SAR_Terminal_find_bound(SAR_Terminal *sar_terminal,
                                 Heuristic_Bound *heuristic_bound);
C4_Score SAR_Terminal_find_score(SAR_Terminal *sar_terminal,
                                 Optimal *optimal, HPair *hpair);

/**/

typedef struct {
    Region *region;
    C4_Score src_component;
    C4_Score dst_component;
    Heuristic_Pair *pair;
} SAR_Join;

SAR_Join *SAR_Join_create(HSP *src_hsp, HSP *dst_hsp, HPair *hpair,
                          Heuristic_Pair *pair);
void SAR_Join_destroy(SAR_Join *sar_join);
C4_Score SAR_Join_find_bound(SAR_Join *sar_join);
C4_Score SAR_Join_find_score(SAR_Join *sar_join, HPair *hpair);

/**/

typedef struct {
    Region *src_region;
    Region *dst_region;
    C4_Score src_component;
    C4_Score dst_component;
    Heuristic_Span *span;
} SAR_Span;

SAR_Span *SAR_Span_create(HSP *src_hsp, HSP *dst_hsp, HPair *hpair,
Heuristic_Span *span, C4_Portal *src_portal, C4_Portal *dst_portal); void SAR_Span_destroy(SAR_Span *sar_span); C4_Score SAR_Span_find_bound(SAR_Span *sar_span); C4_Score SAR_Span_find_score(SAR_Span *sar_span, HPair *hpair); /**/ typedef struct { Alignment *alignment; Alignment *end_alignment; Region *end_region; Heuristic_Match *end_match; HPair *hpair; HSP *last_hsp; Region *last_region; Heuristic_Match *last_match; } SAR_Alignment; SAR_Alignment *SAR_Alignment_create(SAR_Terminal *sar_start, SAR_Terminal *sar_end, Heuristic_Match *start_match, Heuristic_Match *end_match, HPair *hpair, C4_Score score); void SAR_Alignment_finalise(SAR_Alignment *sar_alignment); void SAR_Alignment_destroy(SAR_Alignment *sar_alignment); void SAR_Alignment_add_SAR_Join(SAR_Alignment *sar_alignment, SAR_Join *sar_join); void SAR_Alignment_add_SAR_Span(SAR_Alignment *sar_alignment, SAR_Span *sar_span); void SAR_Alignment_add_HSP(SAR_Alignment *sar_alignment, HSP *hsp, Heuristic_Match *match); /**/ #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_SAR_H */ exonerate-2.4.0/src/sdp/0000777000175000017500000000000011222703703012066 500000000000000exonerate-2.4.0/src/sdp/Makefile.am0000644000175000017500000000627411222243617014052 00000000000000 TESTS = sdp.test lookahead.test boundary.test scheduler.test \ straceback.test noinst_PROGRAMS = $(TESTS) INCLUDES = -I$(top_srcdir)/src/struct \ -I$(top_srcdir)/src/comparison \ -I$(top_srcdir)/src/sequence \ -I$(top_srcdir)/src/general \ -I$(top_srcdir)/src/c4 \ -DSOURCE_ROOT_DIR="\"@source_root_dir@\"" \ -DGLIB_CFLAGS="\"@glib_cflags@\"" \ -DHOSTTYPE="\"@host@\"" noinst_HEADERS = scheduler.h sdp.h lookahead.h boundary.h straceback.h SEQUENCE_OBJ = $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o \ $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ 
$(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o \ -lm C4_OBJ = $(top_srcdir)/src/c4/c4.o \ $(top_srcdir)/src/c4/alignment.o \ $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/codegen.o \ $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/layout.o \ $(top_srcdir)/src/c4/viterbi.o \ $(top_srcdir)/src/c4/subopt.o \ $(top_srcdir)/src/c4/cgutil.o ALIGNMENT_OBJ = $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(C4_OBJ) $(SEQUENCE_OBJ) sdp_test_SOURCES = sdp.test.c sdp.c boundary.c scheduler.c lookahead.c \ straceback.c sdp_test_LDADD = $(top_srcdir)/src/struct/pqueue.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/comparison/hspset.o \ $(top_srcdir)/src/comparison/comparison.o \ $(top_srcdir)/src/comparison/wordhood.o \ $(ALIGNMENT_OBJ) scheduler_test_SOURCES = scheduler.test.c scheduler.c lookahead.c \ boundary.c straceback.c scheduler_test_LDADD = $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/rangetree.o \ $(ALIGNMENT_OBJ) lookahead_test_SOURCES = lookahead.test.c lookahead.c boundary_test_SOURCES = boundary.test.c boundary.c boundary_test_LDADD = $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/c4/region.o straceback_test_SOURCES = straceback.test.c straceback.c straceback_test_LDADD = $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/rangetree.o \ $(C4_OBJ) $(SEQUENCE_OBJ) # Files to clear away MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/src/sdp/Makefile.in0000644000175000017500000003541411222703650014057 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. 
# This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../.. ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ TESTS = sdp.test lookahead.test boundary.test scheduler.test straceback.test 
noinst_PROGRAMS = $(TESTS) INCLUDES = -I$(top_srcdir)/src/struct -I$(top_srcdir)/src/comparison -I$(top_srcdir)/src/sequence -I$(top_srcdir)/src/general -I$(top_srcdir)/src/c4 -DSOURCE_ROOT_DIR="\"@source_root_dir@\"" -DGLIB_CFLAGS="\"@glib_cflags@\"" -DHOSTTYPE="\"@host@\"" noinst_HEADERS = scheduler.h sdp.h lookahead.h boundary.h straceback.h SEQUENCE_OBJ = $(top_srcdir)/src/struct/sparsecache.o $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o $(top_srcdir)/src/sequence/alphabet.o $(top_srcdir)/src/sequence/splice.o $(top_srcdir)/src/sequence/translate.o $(top_srcdir)/src/general/argument.o $(top_srcdir)/src/general/lineparse.o $(top_srcdir)/src/general/threadref.o -lm C4_OBJ = $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/optimal.o $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/region.o $(top_srcdir)/src/c4/layout.o $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/subopt.o $(top_srcdir)/src/c4/cgutil.o ALIGNMENT_OBJ = $(top_srcdir)/src/comparison/match.o $(top_srcdir)/src/sequence/codonsubmat.o $(top_srcdir)/src/sequence/submat.o $(C4_OBJ) $(SEQUENCE_OBJ) sdp_test_SOURCES = sdp.test.c sdp.c boundary.c scheduler.c lookahead.c straceback.c sdp_test_LDADD = $(top_srcdir)/src/struct/pqueue.o $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/slist.o $(top_srcdir)/src/struct/rangetree.o $(top_srcdir)/src/comparison/hspset.o $(top_srcdir)/src/comparison/comparison.o $(top_srcdir)/src/comparison/wordhood.o $(ALIGNMENT_OBJ) scheduler_test_SOURCES = scheduler.test.c scheduler.c lookahead.c boundary.c straceback.c scheduler_test_LDADD = $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/slist.o $(top_srcdir)/src/struct/rangetree.o $(ALIGNMENT_OBJ) lookahead_test_SOURCES = lookahead.test.c lookahead.c boundary_test_SOURCES = boundary.test.c boundary.c boundary_test_LDADD = $(top_srcdir)/src/general/argument.o $(top_srcdir)/src/c4/region.o straceback_test_SOURCES = straceback.test.c 
straceback.c straceback_test_LDADD = $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/slist.o $(top_srcdir)/src/struct/rangetree.o $(C4_OBJ) $(SEQUENCE_OBJ) # Files to clear away MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = PROGRAMS = $(noinst_PROGRAMS) DEFS = @DEFS@ -I. -I$(srcdir) CPPFLAGS = @CPPFLAGS@ LDFLAGS = @LDFLAGS@ LIBS = @LIBS@ sdp_test_OBJECTS = sdp.test.o sdp.o boundary.o scheduler.o lookahead.o \ straceback.o sdp_test_DEPENDENCIES = $(top_srcdir)/src/struct/pqueue.o \ $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/comparison/hspset.o \ $(top_srcdir)/src/comparison/comparison.o \ $(top_srcdir)/src/comparison/wordhood.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o $(top_srcdir)/src/c4/c4.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/layout.o $(top_srcdir)/src/c4/viterbi.o \ $(top_srcdir)/src/c4/subopt.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o sdp_test_LDFLAGS = lookahead_test_OBJECTS = lookahead.test.o lookahead.o lookahead_test_LDADD = $(LDADD) lookahead_test_DEPENDENCIES = lookahead_test_LDFLAGS = boundary_test_OBJECTS = boundary.test.o boundary.o boundary_test_DEPENDENCIES = $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/c4/region.o boundary_test_LDFLAGS = scheduler_test_OBJECTS = scheduler.test.o scheduler.o lookahead.o \ boundary.o straceback.o 
scheduler_test_DEPENDENCIES = $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/slist.o $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o $(top_srcdir)/src/c4/c4.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/layout.o $(top_srcdir)/src/c4/viterbi.o \ $(top_srcdir)/src/c4/subopt.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o scheduler_test_LDFLAGS = straceback_test_OBJECTS = straceback.test.o straceback.o straceback_test_DEPENDENCIES = $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/slist.o $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/alignment.o \ $(top_srcdir)/src/c4/optimal.o $(top_srcdir)/src/c4/codegen.o \ $(top_srcdir)/src/c4/region.o $(top_srcdir)/src/c4/layout.o \ $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/subopt.o \ $(top_srcdir)/src/c4/cgutil.o $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o straceback_test_LDFLAGS = CFLAGS = @CFLAGS@ COMPILE = $(CC) $(DEFS) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) CCLD = $(CC) LINK = $(CCLD) $(AM_CFLAGS) $(CFLAGS) $(LDFLAGS) -o $@ HEADERS = $(noinst_HEADERS) DIST_COMMON = 
Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best SOURCES = $(sdp_test_SOURCES) $(lookahead_test_SOURCES) $(boundary_test_SOURCES) $(scheduler_test_SOURCES) $(straceback_test_SOURCES) OBJECTS = $(sdp_test_OBJECTS) $(lookahead_test_OBJECTS) $(boundary_test_OBJECTS) $(scheduler_test_OBJECTS) $(straceback_test_OBJECTS) all: all-redirect .SUFFIXES: .SUFFIXES: .S .c .o .s $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps src/sdp/Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status mostlyclean-noinstPROGRAMS: clean-noinstPROGRAMS: -test -z "$(noinst_PROGRAMS)" || rm -f $(noinst_PROGRAMS) distclean-noinstPROGRAMS: maintainer-clean-noinstPROGRAMS: .c.o: $(COMPILE) -c $< .s.o: $(COMPILE) -c $< .S.o: $(COMPILE) -c $< mostlyclean-compile: -rm -f *.o core *.core clean-compile: distclean-compile: -rm -f *.tab.c maintainer-clean-compile: sdp.test: $(sdp_test_OBJECTS) $(sdp_test_DEPENDENCIES) @rm -f sdp.test $(LINK) $(sdp_test_LDFLAGS) $(sdp_test_OBJECTS) $(sdp_test_LDADD) $(LIBS) lookahead.test: $(lookahead_test_OBJECTS) $(lookahead_test_DEPENDENCIES) @rm -f lookahead.test $(LINK) $(lookahead_test_LDFLAGS) $(lookahead_test_OBJECTS) $(lookahead_test_LDADD) $(LIBS) boundary.test: $(boundary_test_OBJECTS) $(boundary_test_DEPENDENCIES) @rm -f boundary.test $(LINK) $(boundary_test_LDFLAGS) $(boundary_test_OBJECTS) $(boundary_test_LDADD) $(LIBS) scheduler.test: $(scheduler_test_OBJECTS) $(scheduler_test_DEPENDENCIES) @rm -f scheduler.test $(LINK) $(scheduler_test_LDFLAGS) $(scheduler_test_OBJECTS) $(scheduler_test_LDADD) $(LIBS) straceback.test: $(straceback_test_OBJECTS) $(straceback_test_DEPENDENCIES) @rm -f straceback.test $(LINK) $(straceback_test_LDFLAGS) $(straceback_test_OBJECTS) $(straceback_test_LDADD) $(LIBS) tags: 
TAGS ID: $(HEADERS) $(SOURCES) $(LISP) list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ here=`pwd` && cd $(srcdir) \ && mkid -f$$here/ID $$unique $(LISP) TAGS: $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) $(LISP) tags=; \ here=`pwd`; \ list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(ETAGS_ARGS)$$unique$(LISP)$$tags" \ || (cd $(srcdir) && etags -o $$here/TAGS $(ETAGS_ARGS) $$tags $$unique $(LISP)) mostlyclean-tags: clean-tags: distclean-tags: -rm -f TAGS ID maintainer-clean-tags: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = src/sdp distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done check-TESTS: $(TESTS) @failed=0; all=0; \ srcdir=$(srcdir); export srcdir; \ for tst in $(TESTS); do \ if test -f $$tst; then dir=.; \ else dir="$(srcdir)"; fi; \ if $(TESTS_ENVIRONMENT) $$dir/$$tst; then \ all=`expr $$all + 1`; \ echo "PASS: $$tst"; \ elif test $$? 
-ne 77; then \ all=`expr $$all + 1`; \ failed=`expr $$failed + 1`; \ echo "FAIL: $$tst"; \ fi; \ done; \ if test "$$failed" -eq 0; then \ banner="All $$all tests passed"; \ else \ banner="$$failed of $$all tests failed"; \ fi; \ dashes=`echo "$$banner" | sed s/./=/g`; \ echo "$$dashes"; \ echo "$$banner"; \ echo "$$dashes"; \ test "$$failed" -eq 0 info-am: info: info-am dvi-am: dvi: dvi-am check-am: all-am $(MAKE) $(AM_MAKEFLAGS) check-TESTS check: check-am installcheck-am: installcheck: installcheck-am install-exec-am: install-exec: install-exec-am install-data-am: install-data: install-data-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-am uninstall-am: uninstall: uninstall-am all-am: Makefile $(PROGRAMS) $(HEADERS) all-redirect: all-am install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-noinstPROGRAMS mostlyclean-compile \ mostlyclean-tags mostlyclean-generic mostlyclean: mostlyclean-am clean-am: clean-noinstPROGRAMS clean-compile clean-tags clean-generic \ mostlyclean-am clean: clean-am distclean-am: distclean-noinstPROGRAMS distclean-compile distclean-tags \ distclean-generic clean-am distclean: distclean-am maintainer-clean-am: maintainer-clean-noinstPROGRAMS \ maintainer-clean-compile maintainer-clean-tags \ maintainer-clean-generic distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." 
maintainer-clean: maintainer-clean-am .PHONY: mostlyclean-noinstPROGRAMS distclean-noinstPROGRAMS \ clean-noinstPROGRAMS maintainer-clean-noinstPROGRAMS \ mostlyclean-compile distclean-compile clean-compile \ maintainer-clean-compile tags mostlyclean-tags distclean-tags \ clean-tags maintainer-clean-tags distdir check-TESTS info-am info \ dvi-am dvi check check-am installcheck-am installcheck install-exec-am \ install-exec install-data-am install-data install-am install \ uninstall-am uninstall all-redirect all-am all installdirs \ mostlyclean-generic distclean-generic clean-generic \ maintainer-clean-generic clean mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: exonerate-2.4.0/src/sdp/sdp.test.c0000644000175000017500000000204211162714330013711 00000000000000/****************************************************************\ * * * C4 dynamic programming library - Seeded Dynamic Programming * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "sdp.h" int Argument_main(Argument *arg){ g_warning("[%s] does nothing", __FILE__); return 0; } exonerate-2.4.0/src/sdp/sdp.c0000644000175000017500000007571611222434423012754 00000000000000/****************************************************************\ * * * C4 dynamic programming library - Seeded Dynamic Programming * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. 
See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#include <stdlib.h> /* For qsort() */

#include "sdp.h"
#include "matrix.h"

/**/

SDP_ArgumentSet *SDP_ArgumentSet_create(Argument *arg){
    register ArgumentSet *as;
    static SDP_ArgumentSet sas = {30};
    if(arg){
        as = ArgumentSet_create("Seeded Dynamic Programming options");
        ArgumentSet_add_option(as, 'x', "extensionthreshold", NULL,
            "Gapped extension threshold", "50",
            Argument_parse_int, &sas.dropoff);
        ArgumentSet_add_option(as, 0, "singlepass", NULL,
            "Generate suboptimal alignment in a single pass", "TRUE",
            Argument_parse_boolean, &sas.single_pass_subopt);
        Argument_absorb_ArgumentSet(arg, as);
        }
    return &sas;
    }

/**/

typedef struct {
    GPtrArray *seed_list;
    gint seed_pos;
    gint seed_state_id;
    SDP_Pair *sdp_pair;
} Scheduler_Seed_List;

static Scheduler_Seed_List *Scheduler_Seed_List_create(SDP_Pair *sdp_pair,
                                                       gboolean is_forward){
    register Scheduler_Seed_List *seed_info_list
        = g_new(Scheduler_Seed_List, 1);
    register C4_Model *model = sdp_pair->sdp->model;
    seed_info_list->seed_list = sdp_pair->seed_list;
    seed_info_list->seed_pos = 0;
    if(is_forward){
        seed_info_list->seed_state_id = model->start_state->state->id;
    } else {
        seed_info_list->seed_state_id = model->end_state->state->id;
        }
    seed_info_list->sdp_pair = sdp_pair;
    return seed_info_list;
    }

static void Scheduler_Seed_List_destroy(
            Scheduler_Seed_List *seed_info_list){
    g_free(seed_info_list);
    return;
    }

static void Scheduler_Seed_List_init_forward(gpointer seed_data){
    register Scheduler_Seed_List *seed_info_list = seed_data;
    g_assert(seed_info_list);
    seed_info_list->seed_pos = 0;
    return;
    }

static void Scheduler_Seed_List_init_reverse(gpointer seed_data){
    register Scheduler_Seed_List *seed_info_list = seed_data;
    g_assert(seed_info_list);
    seed_info_list->seed_pos = seed_info_list->seed_list->len - 1;
    return;
    }

static void
Scheduler_Seed_List_next_forward(gpointer seed_data){ register Scheduler_Seed_List *seed_info_list = seed_data; g_assert(seed_info_list); g_assert(seed_info_list->seed_pos < seed_info_list->seed_list->len); seed_info_list->seed_pos++; return; } static void Scheduler_Seed_List_next_reverse(gpointer seed_data){ register Scheduler_Seed_List *seed_info_list = seed_data; g_assert(seed_info_list); g_assert(seed_info_list->seed_pos >= 0); seed_info_list->seed_pos--; return; } static gboolean Scheduler_Seed_List_get_forward(gpointer seed_data, Scheduler_Seed *seed_info){ register Scheduler_Seed_List *seed_info_list = seed_data; register SDP_Seed *seed; g_assert(seed_info_list); if(seed_info_list->seed_pos >= seed_info_list->seed_list->len) return FALSE; seed = seed_info_list->seed_list->pdata[seed_info_list->seed_pos]; seed_info->query_pos = HSP_query_cobs(seed->hsp); seed_info->target_pos = HSP_target_cobs(seed->hsp); seed_info->seed_id = seed->seed_id; seed_info->start_score = seed->max_start->score - (seed->hsp->score >> 1); return TRUE; } static gboolean Scheduler_Seed_List_get_reverse(gpointer seed_data, Scheduler_Seed *seed_info){ register Scheduler_Seed_List *seed_info_list = seed_data; register SDP_Seed *seed; g_assert(seed_info_list); if(seed_info_list->seed_pos < 0) return FALSE; seed = seed_info_list->seed_list->pdata[seed_info_list->seed_pos]; seed_info->query_pos = -HSP_query_cobs(seed->hsp); seed_info->target_pos = -HSP_target_cobs(seed->hsp); seed_info->seed_id = seed->seed_id; seed_info->start_score = (seed->hsp->score >> 1); return TRUE; } static void Scheduler_Seed_List_start_func(gpointer seed_data, gint seed_id, C4_Score score, gint start_query_pos, gint start_target_pos, STraceback_Cell *stcell){ register Scheduler_Seed_List *seed_info_list = seed_data; register SDP_Seed *seed; g_assert(seed_info_list); g_assert(seed_info_list->seed_list); g_assert(seed_id < seed_info_list->sdp_pair->seed_list->len); seed = 
seed_info_list->sdp_pair->seed_list->pdata[seed_id]; /* Is best start for this seed */ if(seed->max_start->score < score){ seed->max_start->score = score; seed->max_start->query_pos = start_query_pos; seed->max_start->target_pos = start_target_pos; if(stcell) seed->max_start->cell = STraceback_Cell_share(stcell); else seed->max_start->cell = NULL; } return; } static void Scheduler_Seed_List_end_func(gpointer seed_data, gint seed_id, C4_Score score, gint end_query_pos, gint end_target_pos, STraceback_Cell *stcell){ register Scheduler_Seed_List *seed_info_list = seed_data; register SDP_Seed *seed; g_assert(seed_info_list); g_assert(seed_info_list->seed_list); g_assert(seed_id < seed_info_list->sdp_pair->seed_list->len); seed = seed_info_list->sdp_pair->seed_list->pdata[seed_id]; /* Is best end for this seed */ if(seed->max_end->score < score){ seed->max_end->score = score; seed->max_end->query_pos = end_query_pos; seed->max_end->target_pos = end_target_pos; g_assert(stcell); seed->max_end->cell = STraceback_Cell_share(stcell); } return; } /**/ typedef struct { Boundary *boundary; gint curr_row; gint curr_interval; gint curr_pos; gboolean is_finished; gint seed_state_id; SDP_Pair *sdp_pair; } Scheduler_Seed_Boundary; static Scheduler_Seed_Boundary *Scheduler_Seed_Boundary_create( Boundary *boundary, SDP_Pair *sdp_pair){ register Scheduler_Seed_Boundary *seed_info_boundary = g_new(Scheduler_Seed_Boundary, 1); seed_info_boundary->boundary = Boundary_share(boundary); seed_info_boundary->curr_row = 0; seed_info_boundary->curr_interval = 0; seed_info_boundary->curr_pos = 0; seed_info_boundary->is_finished = FALSE; seed_info_boundary->seed_state_id = sdp_pair->sdp->model->start_state->state->id; seed_info_boundary->sdp_pair = sdp_pair; return seed_info_boundary; } static void Scheduler_Seed_Boundary_destroy( Scheduler_Seed_Boundary *seed_info_boundary){ Boundary_destroy(seed_info_boundary->boundary); g_free(seed_info_boundary); return; } static void 
Scheduler_Seed_Boundary_init_forward(gpointer seed_data){ register Scheduler_Seed_Boundary *seed_info_boundary = seed_data; g_assert(seed_info_boundary); seed_info_boundary->curr_row = 0; seed_info_boundary->curr_interval = 0; seed_info_boundary->curr_pos = 0; seed_info_boundary->is_finished = FALSE; return; } static void Scheduler_Seed_Boundary_next_forward(gpointer seed_data){ register Scheduler_Seed_Boundary *seed_info_boundary = seed_data; register Boundary_Row *boundary_row = seed_info_boundary->boundary->row_list->pdata [seed_info_boundary->curr_row]; register Boundary_Interval *interval = boundary_row->interval_list->pdata [seed_info_boundary->curr_interval]; /* If more seeds in interval, move curr_pos */ if(seed_info_boundary->curr_pos < (interval->length-1)){ seed_info_boundary->curr_pos++; return; } /* If more intervals in row, move curr_interval */ if(seed_info_boundary->curr_interval < (boundary_row->interval_list->len-1)){ seed_info_boundary->curr_interval++; seed_info_boundary->curr_pos = 0; return; } /* If more rows in boundary, move curr_row */ if(seed_info_boundary->curr_row < (seed_info_boundary->boundary->row_list->len-1)){ seed_info_boundary->curr_row++; seed_info_boundary->curr_interval = 0; seed_info_boundary->curr_pos = 0; return; } seed_info_boundary->is_finished = TRUE; return; } static gboolean Scheduler_Seed_Boundary_get_forward(gpointer seed_data, Scheduler_Seed *seed_info){ register Scheduler_Seed_Boundary *seed_info_boundary = seed_data; register Boundary_Row *boundary_row = seed_info_boundary->boundary->row_list->pdata [seed_info_boundary->curr_row]; /* FIXME: fails when boundary is empty */ register Boundary_Interval *interval = boundary_row->interval_list->pdata [seed_info_boundary->curr_interval]; register SDP_Seed *seed; if(seed_info_boundary->is_finished) return FALSE; g_assert(interval->seed_id < seed_info_boundary->sdp_pair->seed_list->len); seed = seed_info_boundary->sdp_pair->seed_list->pdata[interval->seed_id]; 
seed_info->query_pos = interval->query_pos + seed_info_boundary->curr_pos; seed_info->target_pos = boundary_row->target_pos; seed_info->seed_id = interval->seed_id; seed_info->start_score = 0; return TRUE; } static void Scheduler_Seed_Boundary_end_func(gpointer seed_data, gint seed_id, C4_Score score, gint end_query_pos, gint end_target_pos, STraceback_Cell *stcell){ register Scheduler_Seed_Boundary *seed_info_boundary = seed_data; register SDP_Seed *seed = seed_info_boundary->sdp_pair->seed_list->pdata[seed_id]; /* Is best end for this seed */ if(seed->max_end->score < score){ seed->max_end->score = score; seed->max_end->query_pos = end_query_pos; seed->max_end->target_pos = end_target_pos; g_assert(stcell); seed->max_end->cell = STraceback_Cell_share(stcell); /* FIXME: free old seed max_end->cell ?? */ } return; } /**/ SDP *SDP_create(C4_Model *model){ register SDP *sdp = g_new0(SDP, 1); register C4_Portal *portal; /**/ sdp->thread_ref = ThreadRef_create(); sdp->sas = SDP_ArgumentSet_create(NULL); g_assert(model); g_assert(!model->is_open); g_assert(Lookahead_Mask_WIDTH > model->max_query_advance); g_assert(Lookahead_Mask_WIDTH > model->max_target_advance); sdp->model = C4_Model_share(model); /* Perform bidirectional SDP only * when there is a single match state and no shadows or spans */ sdp->use_boundary = TRUE; if((model->shadow_list->len == 0) && (model->span_list->len == 0)){ if(model->portal_list->len == 1){ portal = model->portal_list->pdata[0]; g_assert(portal); if(portal->transition_list->len == 1) sdp->use_boundary = FALSE; } } /**/ if(sdp->use_boundary){ sdp->find_starts_scheduler = Scheduler_create(model, FALSE, FALSE, TRUE, Scheduler_Seed_List_init_reverse, Scheduler_Seed_List_next_reverse, Scheduler_Seed_List_get_reverse, NULL, NULL, sdp->sas->dropoff); sdp->find_ends_scheduler = Scheduler_create(model, TRUE, TRUE, TRUE, Scheduler_Seed_Boundary_init_forward, Scheduler_Seed_Boundary_next_forward, Scheduler_Seed_Boundary_get_forward, NULL, 
Scheduler_Seed_Boundary_end_func, sdp->sas->dropoff); } else { sdp->find_starts_scheduler = Scheduler_create(model, FALSE, TRUE, FALSE, Scheduler_Seed_List_init_reverse, Scheduler_Seed_List_next_reverse, Scheduler_Seed_List_get_reverse, Scheduler_Seed_List_start_func, NULL, sdp->sas->dropoff); sdp->find_ends_scheduler = Scheduler_create(model, TRUE, TRUE, FALSE, Scheduler_Seed_List_init_forward, Scheduler_Seed_List_next_forward, Scheduler_Seed_List_get_forward, NULL, Scheduler_Seed_List_end_func, sdp->sas->dropoff); } return sdp; } SDP *SDP_share(SDP *sdp){ g_assert(sdp); ThreadRef_share(sdp->thread_ref); return sdp; } void SDP_destroy(SDP *sdp){ g_assert(sdp); if(ThreadRef_destroy(sdp->thread_ref)) return; if(sdp->find_starts_scheduler) Scheduler_destroy(sdp->find_starts_scheduler); if(sdp->find_ends_scheduler) Scheduler_destroy(sdp->find_ends_scheduler); C4_Model_destroy(sdp->model); g_free(sdp); return; } /**/ static void SDP_add_codegen(Scheduler *scheduler, GPtrArray *codegen_list){ register Codegen *codegen; if(scheduler){ codegen = Scheduler_make_Codegen(scheduler); g_ptr_array_add(codegen_list, codegen); } return; } GPtrArray *SDP_get_codegen_list(SDP *sdp){ register GPtrArray *codegen_list = g_ptr_array_new(); SDP_add_codegen(sdp->find_starts_scheduler, codegen_list); SDP_add_codegen(sdp->find_ends_scheduler, codegen_list); g_assert(codegen_list->len); return codegen_list; } /**/ static SDP_Terminal *SDP_Terminal_create(void){ register SDP_Terminal *terminal = g_new(SDP_Terminal, 1); terminal->query_pos = 0; terminal->target_pos = 0; terminal->score = C4_IMPOSSIBLY_LOW_SCORE; terminal->cell = NULL; return terminal; } static void SDP_Terminal_destroy(SDP_Terminal *terminal, STraceback *straceback){ if(terminal->cell) STraceback_Cell_destroy(terminal->cell, straceback); g_free(terminal); return; } /**/ static SDP_Seed *SDP_Seed_create(HSP *hsp, gint id){ register SDP_Seed *seed = g_new(SDP_Seed, 1); seed->seed_id = id; seed->hsp = hsp; seed->max_start = 
SDP_Terminal_create(); seed->max_end = SDP_Terminal_create(); seed->pq_node = NULL; return seed; } /* FIXME: optimisation: use RecycleBins for SDP_Terminal allocations */ static void SDP_Seed_destroy(SDP_Seed *seed, STraceback *fwd_straceback, STraceback *rev_straceback){ SDP_Terminal_destroy(seed->max_start, rev_straceback); SDP_Terminal_destroy(seed->max_end, fwd_straceback); g_free(seed); return; } /**/ static int SDP_compare_HSP_cobs_in_forward_dp_order(const void *a, const void *b){ register HSP **hsp_a = (HSP**)a, **hsp_b = (HSP**)b; register gint target_diff = HSP_target_cobs(*hsp_a) - HSP_target_cobs(*hsp_b); if(!target_diff) return HSP_query_cobs(*hsp_a) - HSP_query_cobs(*hsp_b); return target_diff; } static GPtrArray *SDP_Pair_create_seed_list(Comparison *comparison){ register gint i; register SDP_Seed *seed; register HSP *hsp, *prev_hsp = NULL; register GPtrArray *hsp_list = g_ptr_array_new(), *seed_list = g_ptr_array_new(); g_assert(Comparison_has_hsps(comparison)); /* Build a combined HSP list */ if(comparison->dna_hspset) for(i = 0; i < comparison->dna_hspset->hsp_list->len; i++){ hsp = comparison->dna_hspset->hsp_list->pdata[i]; g_ptr_array_add(hsp_list, hsp); } if(comparison->protein_hspset){ for(i = 0; i < comparison->protein_hspset->hsp_list->len; i++){ hsp = comparison->protein_hspset->hsp_list->pdata[i]; g_ptr_array_add(hsp_list, hsp); } } if(comparison->codon_hspset) for(i = 0; i < comparison->codon_hspset->hsp_list->len; i++){ hsp = comparison->codon_hspset->hsp_list->pdata[i]; g_ptr_array_add(hsp_list, hsp); } /* Sort hsps on cobs point in DP order */ qsort(hsp_list->pdata, hsp_list->len, sizeof(gpointer), SDP_compare_HSP_cobs_in_forward_dp_order); /* Make a seed for each unique HSP */ g_assert(hsp_list->len); for(i = 0; i < hsp_list->len; i++){ hsp = hsp_list->pdata[i]; if((!prev_hsp) || (HSP_query_cobs(hsp) != HSP_query_cobs(prev_hsp)) || (HSP_target_cobs(hsp) != HSP_target_cobs(prev_hsp))){ seed = SDP_Seed_create(hsp, seed_list->len); 
g_ptr_array_add(seed_list, seed); } prev_hsp = hsp; } g_ptr_array_free(hsp_list, TRUE); g_assert(seed_list->len); return seed_list; } SDP_Pair *SDP_Pair_create(SDP *sdp, SubOpt *subopt, Comparison *comparison, gpointer user_data){ register SDP_Pair *sdp_pair = g_new(SDP_Pair, 1); g_assert(sdp); sdp_pair->sdp = SDP_share(sdp); g_assert(comparison); g_assert(Comparison_has_hsps(comparison)); sdp_pair->comparison = Comparison_share(comparison); sdp_pair->user_data = user_data; sdp_pair->alignment_count = 0; sdp_pair->subopt = SubOpt_share(subopt); sdp_pair->seed_list = SDP_Pair_create_seed_list(comparison); sdp_pair->seed_list_by_score = NULL; sdp_pair->boundary = NULL; sdp_pair->last_score = C4_IMPOSSIBLY_LOW_SCORE; sdp_pair->fwd_straceback = STraceback_create(sdp->model, TRUE); sdp_pair->rev_straceback = STraceback_create(sdp->model, FALSE); return sdp_pair; } void SDP_Pair_destroy(SDP_Pair *sdp_pair){ register gint i; register SDP_Seed *seed; g_assert(sdp_pair); Comparison_destroy(sdp_pair->comparison); SDP_destroy(sdp_pair->sdp); for(i = 0; i < sdp_pair->seed_list->len; i++){ seed = sdp_pair->seed_list->pdata[i]; SDP_Seed_destroy(seed, sdp_pair->fwd_straceback, sdp_pair->rev_straceback); } g_ptr_array_free(sdp_pair->seed_list, TRUE); if(sdp_pair->seed_list_by_score) g_free(sdp_pair->seed_list_by_score); SubOpt_destroy(sdp_pair->subopt); if(sdp_pair->boundary) Boundary_destroy(sdp_pair->boundary); STraceback_destroy(sdp_pair->fwd_straceback); STraceback_destroy(sdp_pair->rev_straceback); g_free(sdp_pair); return; } /**/ static Boundary *SDP_Pair_find_start_points(SDP_Pair *sdp_pair){ register Scheduler_Seed_List *seed_info_list = Scheduler_Seed_List_create(sdp_pair, FALSE); register Boundary *boundary = NULL; register Scheduler_Pair *spair; if(sdp_pair->sdp->use_boundary) boundary = Boundary_create(); spair = Scheduler_Pair_create(sdp_pair->sdp->find_starts_scheduler, sdp_pair->rev_straceback, sdp_pair->comparison->query->len, sdp_pair->comparison->target->len, 
sdp_pair->subopt, boundary, -1, seed_info_list, sdp_pair->user_data); Scheduler_Pair_calculate(spair); Scheduler_Pair_destroy(spair); Scheduler_Seed_List_destroy(seed_info_list); if(boundary) Boundary_reverse(boundary); return boundary; } static void SDP_Pair_find_end_points(SDP_Pair *sdp_pair){ register Scheduler_Pair *spair; register Scheduler_Seed_List *seed_info_list = NULL; register Scheduler_Seed_Boundary *seed_info_boundary = NULL; if(sdp_pair->boundary){ g_assert(sdp_pair->boundary->row_list->len); seed_info_boundary = Scheduler_Seed_Boundary_create(sdp_pair->boundary, sdp_pair); spair = Scheduler_Pair_create(sdp_pair->sdp->find_ends_scheduler, sdp_pair->fwd_straceback, sdp_pair->comparison->query->len, sdp_pair->comparison->target->len, sdp_pair->subopt, sdp_pair->boundary, -1, seed_info_boundary, sdp_pair->user_data); } else { seed_info_list = Scheduler_Seed_List_create(sdp_pair, TRUE); spair = Scheduler_Pair_create(sdp_pair->sdp->find_ends_scheduler, sdp_pair->fwd_straceback, sdp_pair->comparison->query->len, sdp_pair->comparison->target->len, sdp_pair->subopt, sdp_pair->boundary, -1, seed_info_list, sdp_pair->user_data); } Scheduler_Pair_calculate(spair); Scheduler_Pair_destroy(spair); if(sdp_pair->boundary){ g_assert(seed_info_boundary); Scheduler_Seed_Boundary_destroy(seed_info_boundary); } else { g_assert(seed_info_list); Scheduler_Seed_List_destroy(seed_info_list); } return; } /**/ static Boundary *SDP_Pair_update_starts(SDP_Pair *sdp_pair){ register gint i; register SDP_Seed *seed; g_assert(sdp_pair->seed_list->len); /* Set max start scores low */ for(i = 0; i < sdp_pair->seed_list->len; i++){ seed = sdp_pair->seed_list->pdata[i]; seed->max_start->score = C4_IMPOSSIBLY_LOW_SCORE; if(!sdp_pair->sdp->use_boundary){ g_assert(seed->max_start->cell); STraceback_Cell_destroy(seed->max_start->cell, sdp_pair->rev_straceback); seed->max_start->cell = NULL; } } return SDP_Pair_find_start_points(sdp_pair); } static void SDP_Pair_update_ends(SDP_Pair 
*sdp_pair){ register gint i; register SDP_Seed *seed; g_assert(sdp_pair->seed_list->len); /* Set max end scores low */ for(i = 0; i < sdp_pair->seed_list->len; i++){ seed = sdp_pair->seed_list->pdata[i]; seed->max_end->score = C4_IMPOSSIBLY_LOW_SCORE; if(seed->max_end->cell){ STraceback_Cell_destroy(seed->max_end->cell, sdp_pair->fwd_straceback); seed->max_end->cell = NULL; } } SDP_Pair_find_end_points(sdp_pair); return; } /**/ static void SDP_Pair_add_traceback(SDP_Pair *sdp_pair, SDP_Seed *best_seed, gboolean is_forward, Alignment *alignment){ register STraceback_List *stlist = NULL; register gint i; register STraceback_Operation *operation; register AlignmentOperation *last; register STraceback_Operation *first; g_assert(best_seed); if(is_forward){ g_assert(best_seed->max_end); g_assert(best_seed->max_end->cell); g_assert(best_seed->max_end->cell->transition->output == sdp_pair->sdp->model->end_state->state); stlist = STraceback_List_create(sdp_pair->fwd_straceback, best_seed->max_end->cell); if(!sdp_pair->sdp->use_boundary){ /* Check join is valid */ g_assert(alignment->operation_list->len); last = alignment->operation_list->pdata [alignment->operation_list->len-1]; g_assert(stlist->operation_list->len); first = stlist->operation_list->pdata[0]; g_assert(last->transition->output == first->transition->output); } /* Include 1st operation only when using boundary */ for(i = sdp_pair->sdp->use_boundary?0:1; i < stlist->operation_list->len; i++){ operation = stlist->operation_list->pdata[i]; Alignment_add(alignment, operation->transition, operation->length); } } else { g_assert(best_seed->max_start->cell); g_assert(best_seed->max_start->cell->transition->input == sdp_pair->sdp->model->start_state->state); stlist = STraceback_List_create(sdp_pair->rev_straceback, best_seed->max_start->cell); /* Don't include last operation (to end state) */ for(i = stlist->operation_list->len-1; i >= 1; i--){ operation = stlist->operation_list->pdata[i]; Alignment_add(alignment, 
operation->transition, operation->length); } } STraceback_List_destroy(stlist); return; } static void SDP_Seed_find_start(SDP_Seed *best_seed, C4_Model *model){ register STraceback_Cell *stcell; best_seed->max_start->query_pos = best_seed->max_end->query_pos; best_seed->max_start->target_pos = best_seed->max_end->target_pos; stcell = best_seed->max_end->cell; g_assert(stcell); do { best_seed->max_start->query_pos -= (stcell->transition->advance_query * stcell->length); best_seed->max_start->target_pos -= (stcell->transition->advance_target * stcell->length); stcell = stcell->prev; g_assert(stcell); } while(stcell->transition->input != model->start_state->state); return; } static Alignment *SDP_Pair_find_path(SDP_Pair *sdp_pair, SDP_Seed *best_seed, Boundary *boundary){ register Region *alignment_region; register Alignment *alignment; /**/ if(sdp_pair->sdp->use_boundary) SDP_Seed_find_start(best_seed, sdp_pair->sdp->model); alignment_region = Region_create(best_seed->max_start->query_pos, best_seed->max_start->target_pos, best_seed->max_end->query_pos -best_seed->max_start->query_pos, best_seed->max_end->target_pos -best_seed->max_start->target_pos); alignment = Alignment_create(sdp_pair->sdp->model, alignment_region, best_seed->max_end->score); if(sdp_pair->sdp->use_boundary){ SDP_Pair_add_traceback(sdp_pair, best_seed, TRUE, alignment); } else { SDP_Pair_add_traceback(sdp_pair, best_seed, FALSE, alignment); SDP_Pair_add_traceback(sdp_pair, best_seed, TRUE, alignment); } g_assert(Alignment_is_valid(alignment, alignment_region, sdp_pair->user_data)); Region_destroy(alignment_region); return alignment; } static int SDP_compare_SDP_Seed_by_score(const void *a, const void *b){ register SDP_Seed **seed_a = (SDP_Seed**)a, **seed_b = (SDP_Seed**)b; return (*seed_b)->max_end->score - (*seed_a)->max_end->score; } Alignment *SDP_Pair_next_path(SDP_Pair *sdp_pair, C4_Score threshold){ register SDP_Seed *seed, *best_seed = NULL; register Alignment *alignment = NULL; register 
gint i; /* Make the start and end points up to date */ if(sdp_pair->alignment_count){ if(!sdp_pair->sdp->sas->single_pass_subopt){ /* multipass */ if(sdp_pair->boundary) Boundary_destroy(sdp_pair->boundary); sdp_pair->boundary = SDP_Pair_update_starts(sdp_pair); SDP_Pair_update_ends(sdp_pair); } } else { g_assert(!sdp_pair->boundary); sdp_pair->boundary = SDP_Pair_find_start_points(sdp_pair); /* Boundary_print_gnuplot(sdp_pair->boundary, 1); */ SDP_Pair_find_end_points(sdp_pair); if(sdp_pair->sdp->sas->single_pass_subopt){ sdp_pair->seed_list_by_score = g_new(gpointer, sdp_pair->seed_list->len); for(i = 0; i < sdp_pair->seed_list->len; i++) sdp_pair->seed_list_by_score[i] = sdp_pair->seed_list->pdata[i]; qsort(sdp_pair->seed_list_by_score, sdp_pair->seed_list->len, sizeof(gpointer), SDP_compare_SDP_Seed_by_score); sdp_pair->single_pass_pos = 0; } } /**/ if(sdp_pair->sdp->sas->single_pass_subopt){ /* singlepass */ best_seed = NULL; while(sdp_pair->single_pass_pos < sdp_pair->seed_list->len){ best_seed = sdp_pair->seed_list_by_score [sdp_pair->single_pass_pos++]; if(best_seed->max_end->score < threshold) return NULL; alignment = SDP_Pair_find_path(sdp_pair, best_seed, sdp_pair->boundary); if(SubOpt_overlaps_alignment(sdp_pair->subopt, alignment)){ Alignment_destroy(alignment); best_seed = NULL; } else { break; } } if(!best_seed) return NULL; /**/ } else { /* multipass */ g_assert(sdp_pair->seed_list->len); best_seed = sdp_pair->seed_list->pdata[0]; for(i = 1; i < sdp_pair->seed_list->len; i++){ seed = sdp_pair->seed_list->pdata[i]; if(best_seed->max_end->score < seed->max_end->score) best_seed = seed; } if(best_seed->max_end->score < threshold) return NULL; /* Score below threshold */ alignment = SDP_Pair_find_path(sdp_pair, best_seed, sdp_pair->boundary); } g_assert(best_seed); g_assert(alignment); /* Check score is less than previous */ g_assert((sdp_pair->last_score < 0) || (best_seed->max_end->score <= sdp_pair->last_score)); sdp_pair->alignment_count++; 
    sdp_pair->last_score = best_seed->max_end->score;
    best_seed->max_end->score = C4_IMPOSSIBLY_LOW_SCORE;
    return alignment;
    }

/**/

/* File: exonerate-2.4.0/src/sdp/boundary.c */

/****************************************************************\
*                                                                *
*  C4 dynamic programming library - Boundary Object              *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#include "boundary.h"

/**/

static Boundary_Interval *Boundary_Interval_create(gint query_pos,
                                                   gint seed_id){
    register Boundary_Interval *interval = g_new(Boundary_Interval, 1);
    interval->query_pos = query_pos;
    interval->length = 1;
    interval->seed_id = seed_id;
    return interval;
    }

static void Boundary_Interval_destroy(Boundary_Interval *interval){
    g_free(interval);
    return;
    }

static Boundary_Interval *Boundary_Interval_copy(
                          Boundary_Interval *interval){
    register Boundary_Interval *new_interval
        = Boundary_Interval_create(interval->query_pos,
                                   interval->seed_id);
    new_interval->length = interval->length;
    return new_interval;
    }

#define Boundary_Interval_end(interval) \
            ((interval)->query_pos      \
           + (interval)->length)

/**/

static Boundary_Row *Boundary_Row_create(gint target_pos){
    register Boundary_Row *boundary_row = g_new(Boundary_Row, 1);
    boundary_row->target_pos = target_pos;
    boundary_row->interval_list = g_ptr_array_new();
    return boundary_row;
    }

static void Boundary_Row_destroy(Boundary_Row *boundary_row){
    register gint i;
    register Boundary_Interval *interval;
    for(i = 0; i < boundary_row->interval_list->len; i++){
        interval = boundary_row->interval_list->pdata[i];
        Boundary_Interval_destroy(interval);
        }
g_ptr_array_free(boundary_row->interval_list, TRUE); g_free(boundary_row); return; } static void Boundary_Row_print_gnuplot(Boundary_Row *boundary_row, gint colour){ register gint i; register Boundary_Interval *interval; for(i = 0; i < boundary_row->interval_list->len; i++){ interval = boundary_row->interval_list->pdata[i]; g_print("set arrow from %d,%d to %d,%d heads ls %d\n", interval->query_pos, boundary_row->target_pos, interval->query_pos+interval->length, boundary_row->target_pos, colour); } return; } static Boundary_Row *Boundary_Row_copy(Boundary_Row *boundary_row){ register Boundary_Row *new_row = Boundary_Row_create( boundary_row->target_pos); register gint i; register Boundary_Interval *interval; g_assert(boundary_row->interval_list->len); for(i = 0; i < boundary_row->interval_list->len; i++){ interval = boundary_row->interval_list->pdata[i]; g_ptr_array_add(new_row->interval_list, Boundary_Interval_copy(interval)); } return new_row; } static void Boundary_Row_reverse(Boundary_Row *boundary_row){ register gint a, z; register Boundary_Interval *swap; for(a = 0, z = boundary_row->interval_list->len-1; a < z; a++, z--){ swap = boundary_row->interval_list->pdata[a]; boundary_row->interval_list->pdata[a] = boundary_row->interval_list->pdata[z]; boundary_row->interval_list->pdata[z] = swap; } return; } static void Boundary_Row_add_interval(Boundary_Row *boundary_row, gint query_pos, gint length, gint seed_id){ register Boundary_Interval *interval = Boundary_Interval_create(query_pos, seed_id); interval->length = length; g_ptr_array_add(boundary_row->interval_list, interval); return; } static void Boundary_Row_select(Boundary_Row *boundary_row, Boundary_Row *sub_boundary_row, Region *region){ register gint i, query_start, query_end; register Boundary_Interval *interval; for(i = 0; i < boundary_row->interval_list->len; i++){ interval = boundary_row->interval_list->pdata[i]; if((interval->query_pos+interval->length) < region->query_start) continue; 
        if(interval->query_pos > Region_query_end(region))
            break;
        query_start = MAX(interval->query_pos, region->query_start);
        query_end = MIN(interval->query_pos+interval->length,
                        Region_query_end(region));
        if(query_end-query_start > 0)
            Boundary_Row_add_interval(sub_boundary_row, query_start,
                                      query_end-query_start,
                                      interval->seed_id);
        }
    return;
    }

static Boundary_Interval *Boundary_Row_get_last_interval(
                          Boundary_Row *boundary_row){
    register Boundary_Interval *interval;
    g_assert(boundary_row);
    g_assert(boundary_row->interval_list);
    if(!boundary_row->interval_list->len)
        return NULL;
    interval = boundary_row->interval_list->pdata
              [boundary_row->interval_list->len-1];
    g_assert(interval);
    return interval;
    }

void Boundary_Row_prepend(Boundary_Row *boundary_row,
                          gint query_pos, gint seed_id){
    register Boundary_Interval *interval
           = Boundary_Row_get_last_interval(boundary_row);
    if(interval
    && (interval->seed_id == seed_id)
    && ((interval->query_pos - 1) == query_pos)){
        interval->query_pos = query_pos;
        interval->length++;
        return;
        }
    Boundary_Row_add_interval(boundary_row, query_pos, 1, seed_id);
    return;
    }

static void Boundary_Row_append_interval(Boundary_Row *boundary_row,
                                         Boundary_Interval *interval){
    register Boundary_Interval *last_interval
           = Boundary_Row_get_last_interval(boundary_row);
    g_assert(interval->query_pos >= 0);
    g_assert(interval->seed_id >= 0);
    g_assert(interval->length > 0);
    if(last_interval
    && (last_interval->seed_id == interval->seed_id)
    && ((Boundary_Interval_end(last_interval) == interval->query_pos))){
        last_interval->length += interval->length;
        return;
        }
    Boundary_Row_add_interval(boundary_row, interval->query_pos,
                              interval->length, interval->seed_id);
    return;
    }

typedef enum {
    Boundary_State_NULL     = 0,
    Boundary_State_START    = (1<<0),
    Boundary_State_END      = (1<<1),
    Boundary_State_BOUNDARY = (1<<2),
    Boundary_State_INSERT   = (1<<3)
} Boundary_State;

#define Boundary_State_pair(prev_state, curr_state) \
        (((prev_state) << 4) | (curr_state))

static void
Boundary_Row_insert_append(Boundary_Row *boundary_row, Boundary_Interval *interval, Boundary_Interval *boundary_stored, Boundary_Interval *insert_stored, Boundary_State *prev_state, Boundary_State *curr_state){ Boundary_Interval temp_interval; switch(Boundary_State_pair(*prev_state, *curr_state)){ case Boundary_State_pair(Boundary_State_START, Boundary_State_INSERT): /* S->I: report I, store I */ Boundary_Row_append_interval(boundary_row, interval); insert_stored->query_pos = interval->query_pos; insert_stored->length = interval->length; insert_stored->seed_id = interval->seed_id; break; case Boundary_State_pair(Boundary_State_START, Boundary_State_BOUNDARY): /* S->B: store B */ boundary_stored->query_pos = interval->query_pos; boundary_stored->length = interval->length; boundary_stored->seed_id = interval->seed_id; break; case Boundary_State_pair(Boundary_State_INSERT, Boundary_State_BOUNDARY): case Boundary_State_pair(Boundary_State_BOUNDARY, Boundary_State_BOUNDARY): /* ON B: report any stored_B * truncate B (after last I) * store B if +ve len else clear */ if(boundary_stored->seed_id != -1){ Boundary_Row_append_interval(boundary_row, boundary_stored); } boundary_stored->query_pos = interval->query_pos; boundary_stored->length = interval->length; boundary_stored->seed_id = interval->seed_id; if(insert_stored->seed_id != -1){ if(interval->query_pos < Boundary_Interval_end(insert_stored)){ boundary_stored->length = (Boundary_Interval_end(interval) - Boundary_Interval_end(insert_stored)); boundary_stored->query_pos = Boundary_Interval_end(insert_stored); boundary_stored->seed_id = interval->seed_id; if(boundary_stored->length <= 0){ boundary_stored->query_pos = -1; boundary_stored->length = -1; boundary_stored->seed_id = -1; } } } break; case Boundary_State_pair(Boundary_State_INSERT, Boundary_State_INSERT): case Boundary_State_pair(Boundary_State_BOUNDARY, Boundary_State_INSERT): /* ON I: report any stored_B before I * store any remaining stored B after I * report 
I * store I */ /**/ if(boundary_stored->seed_id != -1){ if(interval->query_pos < Boundary_Interval_end(boundary_stored)){ temp_interval.query_pos = boundary_stored->query_pos; temp_interval.length = interval->query_pos - boundary_stored->query_pos; temp_interval.seed_id = boundary_stored->seed_id; if(temp_interval.length > 0) Boundary_Row_append_interval(boundary_row, &temp_interval); } else { Boundary_Row_append_interval(boundary_row, boundary_stored); } /**/ if(Boundary_Interval_end(interval) < Boundary_Interval_end(boundary_stored)){ boundary_stored->length = Boundary_Interval_end(boundary_stored) - Boundary_Interval_end(interval); boundary_stored->query_pos = Boundary_Interval_end(interval); /* (keeps seed_id) */ } else { boundary_stored->query_pos = -1; boundary_stored->length = -1; boundary_stored->seed_id = -1; } } /**/ Boundary_Row_append_interval(boundary_row, interval); insert_stored->query_pos = interval->query_pos; insert_stored->length = interval->length; insert_stored->seed_id = interval->seed_id; break; case Boundary_State_pair(Boundary_State_INSERT, Boundary_State_END): /*fallthrough*/ case Boundary_State_pair(Boundary_State_BOUNDARY, Boundary_State_END): /* [IB]->E: report stored B */ if(boundary_stored->seed_id != -1) Boundary_Row_append_interval(boundary_row, boundary_stored); break; default: g_error("Bad boundary state pair [%d,%d]", *prev_state, *curr_state); break; } return; } static gboolean Boundary_Row_is_valid(Boundary_Row *boundary_row){ register gint i; register Boundary_Interval *interval, *prev; g_assert(boundary_row->interval_list->len); prev = boundary_row->interval_list->pdata[0]; g_assert(prev->length > 0); g_assert(prev->seed_id >= 0); for(i = 1; i < boundary_row->interval_list->len; i++){ interval = boundary_row->interval_list->pdata[i]; g_assert(interval->length > 0); g_assert(interval->seed_id >= 0); g_assert(prev->query_pos < interval->query_pos); g_assert(Boundary_Interval_end(prev) <= interval->query_pos); prev = interval; } 
return TRUE; } static void Boundary_Row_insert(Boundary_Row *boundary_row, Boundary_Row *insert_row){ register gint i = 0, j = 0; register Boundary_Interval *boundary_interval, *insert_interval; register Boundary_Row *combined_row = Boundary_Row_create(boundary_row->target_pos); Boundary_Interval boundary_stored = {-1, -1, -1}, insert_stored = {-1, -1, -1}; Boundary_State curr_state = Boundary_State_START, prev_state = Boundary_State_NULL; g_assert(boundary_row->interval_list->len); g_assert(insert_row->interval_list->len); g_assert(boundary_row->target_pos == insert_row->target_pos); do { if(i == insert_row->interval_list->len) break; if(j == boundary_row->interval_list->len) break; insert_interval = insert_row->interval_list->pdata[i]; boundary_interval = boundary_row->interval_list->pdata[j]; if(insert_interval->query_pos < boundary_interval->query_pos){ prev_state = curr_state; curr_state = Boundary_State_INSERT; Boundary_Row_insert_append(combined_row, insert_interval, &boundary_stored, &insert_stored, &prev_state, &curr_state); i++; } else { prev_state = curr_state; curr_state = Boundary_State_BOUNDARY; Boundary_Row_insert_append(combined_row, boundary_interval, &boundary_stored, &insert_stored, &prev_state, &curr_state); j++; } } while(TRUE); while(i < insert_row->interval_list->len){ prev_state = curr_state; curr_state = Boundary_State_INSERT; insert_interval = insert_row->interval_list->pdata[i++]; Boundary_Row_insert_append(combined_row, insert_interval, &boundary_stored, &insert_stored, &prev_state, &curr_state); } while(j < boundary_row->interval_list->len){ prev_state = curr_state; curr_state = Boundary_State_BOUNDARY; boundary_interval = boundary_row->interval_list->pdata[j++]; Boundary_Row_insert_append(combined_row, boundary_interval, &boundary_stored, &insert_stored, &prev_state, &curr_state); } prev_state = curr_state; curr_state = Boundary_State_END; /* B -> E report stored */ Boundary_Row_insert_append(combined_row, NULL, &boundary_stored, 
&insert_stored, &prev_state, &curr_state); for(i = 0; i < boundary_row->interval_list->len; i++){ boundary_interval = boundary_row->interval_list->pdata[i]; Boundary_Interval_destroy(boundary_interval); } g_ptr_array_free(boundary_row->interval_list, TRUE); boundary_row->interval_list = combined_row->interval_list; g_free(combined_row); g_assert(boundary_row->interval_list->len); g_assert(Boundary_Row_is_valid(boundary_row)); return; } /**/ Boundary *Boundary_create(void){ register Boundary *boundary = g_new(Boundary, 1); boundary->row_list = g_ptr_array_new(); boundary->ref_count = 1; return boundary; } void Boundary_destroy(Boundary *boundary){ register gint i; register Boundary_Row *boundary_row; g_assert(boundary); if(--boundary->ref_count) return; for(i = 0; i < boundary->row_list->len; i++){ boundary_row = boundary->row_list->pdata[i]; Boundary_Row_destroy(boundary_row); } g_ptr_array_free(boundary->row_list, TRUE); g_free(boundary); return; } Boundary *Boundary_share(Boundary *boundary){ g_assert(boundary); boundary->ref_count++; return boundary; } /**/ Boundary_Row *Boundary_add_row(Boundary *boundary, gint target_pos){ register Boundary_Row *boundary_row; g_assert(boundary); g_assert(boundary->row_list); boundary_row = Boundary_Row_create(target_pos); g_ptr_array_add(boundary->row_list, boundary_row); return boundary_row; } Boundary_Row *Boundary_get_last_row(Boundary *boundary){ register Boundary_Row *boundary_row; g_assert(boundary); g_assert(boundary->row_list); if(!boundary->row_list->len) return NULL; boundary_row = boundary->row_list->pdata[boundary->row_list->len-1]; g_assert(boundary_row); return boundary_row; } void Boundary_remove_empty_last_row(Boundary *boundary){ register Boundary_Row *boundary_row = Boundary_get_last_row(boundary); if(!boundary_row) return; if(boundary_row->interval_list->len) return; /* Don't remove if not empty */ Boundary_Row_destroy(boundary_row); g_ptr_array_set_size(boundary->row_list, boundary->row_list->len-1); 
return; } /* FIXME: optimisation: use RecycleBin for boundary rows and cells */ /**/ void Boundary_reverse(Boundary *boundary){ register gint a, z; register Boundary_Row *swap; for(a = 0, z = boundary->row_list->len-1; a < z; a++, z--){ swap = boundary->row_list->pdata[a]; boundary->row_list->pdata[a] = boundary->row_list->pdata[z]; boundary->row_list->pdata[z] = swap; Boundary_Row_reverse(boundary->row_list->pdata[a]); Boundary_Row_reverse(boundary->row_list->pdata[z]); } if(boundary->row_list->len & 1){ /* Reverse central row */ a = boundary->row_list->len >> 1; Boundary_Row_reverse(boundary->row_list->pdata[a]); } return; } /* Reverse order of the rows and intervals */ Boundary *Boundary_select(Boundary *boundary, Region *region){ register Boundary *sub_boundary = Boundary_create(); register Boundary_Row *boundary_row, *sub_boundary_row; register gint i; g_assert(boundary); for(i = 0; i < boundary->row_list->len; i++){ boundary_row = boundary->row_list->pdata[i]; if(boundary_row->target_pos < region->target_start) continue; if(boundary_row->target_pos > Region_target_end(region)) break; sub_boundary_row = Boundary_add_row(sub_boundary, boundary_row->target_pos); Boundary_Row_select(boundary_row, sub_boundary_row, region); Boundary_remove_empty_last_row(sub_boundary); } return sub_boundary; } /* Returns only the boundary within region */ static gboolean Boundary_is_valid(Boundary *boundary){ register gint i; register Boundary_Row *prev_row, *row; g_assert(boundary); g_assert(boundary->row_list->len); prev_row = boundary->row_list->pdata[0]; g_assert(prev_row); g_assert(Boundary_Row_is_valid(prev_row)); for(i = 1; i < boundary->row_list->len; i++){ row = boundary->row_list->pdata[i]; g_assert(row); g_assert(prev_row->target_pos < row->target_pos); g_assert(Boundary_Row_is_valid(row)); prev_row = row; } return TRUE; } void Boundary_insert(Boundary *boundary, Boundary *insert){ register gint i = 0, j = 0; register Boundary_Row *boundary_row, *insert_row, 
*last_boundary_row; register GPtrArray *combined_row_list = g_ptr_array_new(); do { if(i == insert->row_list->len) break; if(j == boundary->row_list->len) break; insert_row = insert->row_list->pdata[i]; boundary_row = boundary->row_list->pdata[j]; if(insert_row->target_pos < boundary_row->target_pos){ i++; if(j){ last_boundary_row = boundary->row_list->pdata[j-1]; if(insert_row->target_pos == last_boundary_row->target_pos){ Boundary_Row_insert(last_boundary_row, insert_row); } else { g_ptr_array_add(combined_row_list, Boundary_Row_copy(insert_row)); } } else { g_ptr_array_add(combined_row_list, Boundary_Row_copy(insert_row)); } } else { j++; g_assert(boundary_row->interval_list->len); g_ptr_array_add(combined_row_list, boundary_row); } } while(TRUE); /* Handle repeated last row */ if((i < insert->row_list->len) && j){ last_boundary_row = boundary->row_list->pdata[j-1]; insert_row = insert->row_list->pdata[i]; if(insert_row->target_pos == last_boundary_row->target_pos){ Boundary_Row_insert(last_boundary_row, insert_row); i++; } } while(i < insert->row_list->len){ insert_row = insert->row_list->pdata[i++]; g_ptr_array_add(combined_row_list, Boundary_Row_copy(insert_row)); } while(j < boundary->row_list->len){ boundary_row = boundary->row_list->pdata[j++]; g_assert(boundary_row->interval_list->len); g_ptr_array_add(combined_row_list, boundary_row); } g_ptr_array_free(boundary->row_list, TRUE); boundary->row_list = combined_row_list; g_assert(Boundary_is_valid(boundary)); return; } /**/ static Region *Boundary_find_bounding_region(Boundary *boundary){ register Region *region = Region_create_blank(); register gint i, qmin, qmax; register Boundary_Row *boundary_row; register Boundary_Interval *interval; g_assert(boundary->row_list->len); /* take first row */ boundary_row = boundary->row_list->pdata[0]; region->target_start = boundary_row->target_pos; /* take 1st interval */ interval = boundary_row->interval_list->pdata[0]; qmin = interval->query_pos; /* take last 
interval */
    interval = boundary_row->interval_list->pdata
               [boundary_row->interval_list->len-1];
    qmax = interval->query_pos + interval->length;
    /* take last row */
    boundary_row = boundary->row_list->pdata[boundary->row_list->len-1];
    /* Length spans from the first row to the last row */
    region->target_length = boundary_row->target_pos
                          - region->target_start;
    for(i = 1; i < boundary->row_list->len; i++){
        boundary_row = boundary->row_list->pdata[i];
        g_assert(boundary_row->interval_list->len);
        /* take first interval */
        interval = boundary_row->interval_list->pdata[0];
        if(qmin > interval->query_pos)
            qmin = interval->query_pos;
        /* take last interval */
        interval = boundary_row->interval_list->pdata
                   [boundary_row->interval_list->len-1];
        if(qmax < (interval->query_pos + interval->length))
            qmax = interval->query_pos + interval->length;
    }
    region->query_start = qmin;
    region->query_length = qmax - qmin;
    return region;
}

void Boundary_print_gnuplot(Boundary *boundary, gint colour){
    register gint i;
    register Boundary_Row *boundary_row;
    /* Find bounding region */
    register Region *region = Boundary_find_bounding_region(boundary);
    g_print("# Begin gnuplot commands\n");
    g_print("set xrange [%d:%d]\n", region->query_start,
                                    Region_query_end(region));
    g_print("set yrange [%d:%d]\n", region->target_start,
                                    Region_target_end(region));
    g_print("set style line %d\n", colour);
    g_print("set grid\n");
    g_print("unset key\n");
    /**/
    for(i = 0; i < boundary->row_list->len; i++){
        boundary_row = boundary->row_list->pdata[i];
        Boundary_Row_print_gnuplot(boundary_row, colour);
    }
    g_print("plot 0\n");
    g_print("# End gnuplot commands\n");
    Region_destroy(region);
    return;
}

/**/

exonerate-2.4.0/src/sdp/scheduler.c

/****************************************************************\
*                                                                *
*  C4 dynamic programming library - DP Scheduler                 *
*                                                                *
*  Guy St.C. Slater..   mailto:guy@ebi.ac.uk                     *
*  Copyright (C) 2000-2009.  All Rights Reserved.
*
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#include <stdlib.h> /* For qsort() */

#include "scheduler.h"
#include "matrix.h"
#include "cgutil.h"

#include <string.h> /* For memset() */

static gboolean Scheduler_Pair_is_valid(Scheduler_Pair *spair);
static void Scheduler_Pair_align_rows(Scheduler_Pair *spair);

#ifdef USE_COMPILED_MODELS
#include "c4_model_archive.h"
#endif /* USE_COMPILED_MODELS */

static C4_Span **Scheduler_get_span_map(C4_Model *model){
    register gint i, j;
    register C4_Span *span, **span_map = g_new0(C4_Span*,
                                          model->state_list->len);
    register C4_State *state;
    for(i = 0; i < model->state_list->len; i++){
        state = model->state_list->pdata[i];
        for(j = 0; j < model->span_list->len; j++){
            span = model->span_list->pdata[j];
            if(state == span->span_state)
                span_map[state->id] = span;
        }
    }
    return span_map;
}

Scheduler *Scheduler_create(C4_Model *model,
                  gboolean is_forward, gboolean has_traceback,
                  gboolean use_boundary,
                  Scheduler_Seed_init_Func init_func,
                  Scheduler_Seed_next_Func next_func,
                  Scheduler_Seed_get_Func get_func,
                  Scheduler_start_Func start_func,
                  Scheduler_end_Func end_func,
                  C4_Score dropoff){
    register Scheduler *scheduler = g_new(Scheduler, 1);
    register gchar *raw_name;
    register Codegen_ArgumentSet *cas
           = Codegen_ArgumentSet_create(NULL);
    g_assert(!(is_forward && start_func));
    g_assert(!(use_boundary && start_func));
    g_assert(!((!is_forward) && end_func));
    scheduler->is_forward = is_forward;
    scheduler->has_traceback = has_traceback;
    scheduler->use_boundary = use_boundary;
    /**/
    scheduler->shadow_start = 3; /* score, max, seed */
    scheduler->cell_score_offset = sizeof(Scheduler_Cell);
    scheduler->cell_traceback_offset
        = scheduler->cell_score_offset
        + Matrix2d_size(model->state_list->len,
scheduler->shadow_start + model->shadow_list->len, sizeof(C4_Score)); scheduler->cell_size = scheduler->cell_traceback_offset; if(has_traceback) scheduler->cell_size += (sizeof(STraceback_Cell*) * model->state_list->len); scheduler->init_func = init_func; scheduler->next_func = next_func; scheduler->get_func = get_func; scheduler->start_func = start_func; scheduler->end_func = end_func; scheduler->model = C4_Model_share(model); scheduler->span_map = Scheduler_get_span_map(model); scheduler->dropoff = dropoff; /**/ raw_name = g_strdup_printf("Scheduler_Cell_process_%s:%s:%s:%s", model->name, is_forward?"forward":"reverse", has_traceback?"traceback":"notraceback", use_boundary?"boundary":"seeded"); scheduler->name = Codegen_clean_path_component(raw_name); g_free(raw_name); if(cas->use_compiled){ #ifdef USE_COMPILED_MODELS scheduler->cell_func = (Scheduler_DP_Func)Bootstrapper_lookup(scheduler->name); if(!scheduler->cell_func){ g_warning("Could not find compiled implementation of [%s]", scheduler->name); scheduler->cell_func = Scheduler_Cell_process; } #else /* USE_COMPILED_MODELS */ g_warning("No compiled models - will be very slow"); scheduler->cell_func = Scheduler_Cell_process; #endif /* USE_COMPILED_MODELS */ } else { scheduler->cell_func = Scheduler_Cell_process; } g_assert(scheduler->cell_func); return scheduler; } void Scheduler_destroy(Scheduler *scheduler){ C4_Model_destroy(scheduler->model); if(scheduler->span_map) g_free(scheduler->span_map); g_free(scheduler->name); g_free(scheduler); return; } static void Scheduler_end_transition(Codegen *codegen, gint transition_id){ register gint sorry = 'o'; Codegen_indent(codegen, 1); Codegen_printf(codegen, "%c%c%c%c end_of_transition_%d;\n", sorry-8, sorry, sorry+5, sorry, transition_id); Codegen_indent(codegen, -1); return; } static void Scheduler_implement_DP(Scheduler *scheduler, Codegen *codegen){ register gint i, j, input_pos, output_pos; register C4_Transition *transition; register C4_Span *span; register 
C4_Shadow *shadow; Codegen_printf(codegen, "void %s(Scheduler_Cell *cell,\n" " Scheduler_Row *row,\n" " Scheduler_Pair *spair){\n", scheduler->name); Codegen_indent(codegen, 1); Codegen_printf(codegen, "register gint seed_id = 0;\n"); Codegen_printf(codegen, "register gint src_query_pos,\n" " src_target_pos,\n" " dst_query_pos,\n" " dst_target_pos;\n"); Codegen_printf(codegen, "register C4_Score src_score, dst_score,\n" " transition_score, max_score;\n"); Codegen_printf(codegen, "register C4_Transition *transition;\n"); if(scheduler->model->span_list->len){ Codegen_printf(codegen, "Scheduler_SpanSeed span_seed;\n"); Codegen_printf(codegen, "register Scheduler_SpanData *span_data;\n"); } if(scheduler->model->shadow_list->len) Codegen_printf(codegen, "register C4_Shadow *shadow;\n"); Codegen_printf(codegen, "register Scheduler_Row *dst_row;\n"); Codegen_printf(codegen, "register Scheduler_Cell *dst_cell;\n"); Codegen_printf(codegen, "src_query_pos = %scell->query_pos;\n", scheduler->is_forward?"":"-"); Codegen_printf(codegen, "src_target_pos = %srow->target_pos;\n", scheduler->is_forward?"":"-"); /* Calculate transitions in reverse order for forward DP */ for(i = scheduler->model->transition_list->len-1; i >= 0; i--){ transition = scheduler->model->transition_list->pdata[i]; if(C4_Transition_is_span(transition)){ if(scheduler->is_forward && scheduler->use_boundary){ /* Copy output to span history */ span = scheduler->span_map[transition->output->id]; if(span){ input_pos = transition->input->id; Codegen_printf(codegen, "span_data = spair->span_history->span_data[%d];\n", span->id); Codegen_printf(codegen, "span_seed.score = cell->score[%d][0];\n", input_pos); Codegen_printf(codegen, "if(span_seed.score >= 0){\n"); Codegen_indent(codegen, 1); Codegen_printf(codegen, "span_seed.max = cell->score[%d][1];\n" "span_seed.seed_id = cell->score[%d][2];\n" "span_seed.cell = cell->traceback[%d];\n" "span_seed.query_entry = src_query_pos;\n" "span_seed.target_entry = 
src_target_pos;\n" "span_seed.shadow_data = &cell->score[%d][%d];\n", input_pos, input_pos, input_pos, input_pos, scheduler->shadow_start); /* FIXME: func call */ Codegen_printf(codegen, "Scheduler_SpanData_submit(span_data,\n" " &span_seed, spair->span_seed_cache, spair);\n"); /**/ Codegen_printf(codegen, "}"); Codegen_indent(codegen, -1); } } continue; } Codegen_printf(codegen, "transition = spair->scheduler->model" "->transition_list->pdata[%d];\n", i); if(scheduler->is_forward){ Codegen_printf(codegen, "dst_query_pos = src_query_pos + %d;\n", transition->advance_query); Codegen_printf(codegen, "dst_target_pos = src_target_pos + %d;\n", transition->advance_target); Codegen_printf(codegen, "if((dst_query_pos > Region_query_end(spair->region))\n" "|| (dst_target_pos > Region_target_end(spair->region)))\n"); Scheduler_end_transition(codegen, i); input_pos = transition->input->id; output_pos = transition->output->id; if(scheduler->use_boundary){ /* Copy curr best span to input */ span = scheduler->span_map[transition->input->id]; if(span){ Codegen_printf(codegen, "if(cell->permit_span_thaw){\n"); Codegen_indent(codegen, 1); Codegen_printf(codegen, "span_data =\n" " spair->span_history->span_data[%d];\n", span->id); /* FIXME: func call */ Codegen_printf(codegen, "Scheduler_SpanData_get_curr(\n" " span_data, spair->span_seed_cache,\n" " cell->query_pos, row->target_pos,\n" " spair->straceback);\n"); /* Protect good present score from overwriting */ Codegen_printf(codegen, "if(span_data->curr_span_seed\n" "&& (cell->score[%d][0]\n" " < span_data->curr_span_seed->score)){\n", input_pos); Codegen_indent(codegen, 1); Codegen_printf(codegen, "cell->score[%d][0] = %s;\n" "cell->score[%d][1] = %s;\n" "cell->score[%d][2] = %s;\n", input_pos, "span_data->curr_span_seed->score", input_pos, "span_data->curr_span_seed->max", input_pos, "span_data->curr_span_seed->seed_id"); /* FIXME: func call */ Codegen_printf(codegen, "cell->traceback[%d] = Scheduler_Cell_add_span(\n" " 
span_data->curr_span_seed->cell,\n" " spair->straceback,\n" " spair->scheduler->span_map[%d],\n" " src_query_pos\n" " -span_data->curr_span_seed->query_entry,\n" " src_target_pos\n" " -span_data->curr_span_seed->target_entry);\n", input_pos, transition->input->id); /* Thaw shadows */ for(j = 0; j < scheduler->model->total_shadow_designations; j++){ Codegen_printf(codegen, "cell->score[%d][%d] = span_data" "->curr_span_seed->shadow_data[%d];\n", input_pos, scheduler->shadow_start+j, j); } Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); } } /* End Shadows */ /* This replaces Scheduler_Cell_shadow_end() */ for(j = 0; j < transition->dst_shadow_list->len; j++){ shadow = transition->dst_shadow_list->pdata[j]; Codegen_printf(codegen, "shadow = transition->dst_shadow_list->pdata[%d];\n", j); Codegen_printf(codegen, "shadow->end_func(cell->score[%d][%d],\n" " dst_query_pos, dst_target_pos,\n" " spair->user_data);\n", input_pos, scheduler->shadow_start+shadow->designation); } /* FIXME: func call */ Codegen_printf(codegen, "transition_score = C4_Calc_score(transition->calc,\n" " src_query_pos, src_target_pos,\n" " spair->user_data);\n"); } else { /* reverse pass */ Codegen_printf(codegen, "dst_query_pos = src_query_pos - %d;\n" "dst_target_pos = src_target_pos - %d;\n", transition->advance_query, transition->advance_target); Codegen_printf(codegen, "if((dst_query_pos < spair->region->query_start)\n" "|| (dst_target_pos < spair->region->target_start))\n"); Scheduler_end_transition(codegen, i); input_pos = transition->output->id; output_pos = transition->input->id; /* Allow extension through shadowed transitions */ if(transition->dst_shadow_list->len) Codegen_printf(codegen, "transition_score = 0;\n"); else /* FIXME: func call */ Codegen_printf(codegen, "transition_score = C4_Calc_score(transition->calc,\n" " dst_query_pos, dst_target_pos,\n" " spair->user_data);\n"); } Codegen_printf(codegen, "src_score 
= cell->score[%d][0];\n" "max_score = cell->score[%d][1];\n" "seed_id = cell->score[%d][2];\n" "dst_score = src_score + transition_score;\n", input_pos, input_pos, input_pos); Codegen_printf(codegen, "if(dst_score < 0)\n"); Scheduler_end_transition(codegen, i); Codegen_printf(codegen, "if((max_score - dst_score)" " > spair->scheduler->dropoff)\n"); Scheduler_end_transition(codegen, i); /**/ if(C4_Transition_is_match(transition)){ Codegen_printf(codegen, "if(spair->subopt_index){\n"); Codegen_indent(codegen, 1); /* FIXME: func call */ Codegen_printf(codegen, "if(SubOpt_Index_is_blocked(\n" "spair->subopt_index,\n" "src_query_pos-spair->region->query_start))\n"); Scheduler_end_transition(codegen, i); Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); } /* Fetch/create and set the dst cell */ /* FIXME: func call */ Codegen_printf(codegen, "dst_row = Lookahead_get(spair->row_index, %d);\n", transition->advance_target); Codegen_printf(codegen, "if(!dst_row){\n"); Codegen_indent(codegen, 1); /* FIXME: func call */ Codegen_printf(codegen, "dst_row = Scheduler_Row_create(%sdst_target_pos, spair);\n", scheduler->is_forward?"":"-"); /* FIXME: func call */ Codegen_printf(codegen, "Lookahead_set(spair->row_index, %d, dst_row);\n", transition->advance_target); Codegen_printf(codegen, "Lookahead_move(dst_row->cell_index, cell->query_pos);\n"); Codegen_printf(codegen, "}\n"); Codegen_indent(codegen, -1); /* FIXME: func call */ Codegen_printf(codegen, "dst_cell = Lookahead_get(dst_row->cell_index, %d);\n", transition->advance_query); Codegen_printf(codegen, "if(dst_cell){\n"); Codegen_indent(codegen, 1); /* Keep score if higher (don't tie break on max_score) */ Codegen_printf(codegen, "if(dst_score <= dst_cell->score[%d][0])\n", output_pos); Scheduler_end_transition(codegen, i); Codegen_indent(codegen, -1); Codegen_printf(codegen, "} else {\n"); Codegen_indent(codegen, 1); /* New cell */ /* FIXME: func call */ Codegen_printf(codegen, "dst_cell = 
Scheduler_Cell_create(%sdst_query_pos, FALSE, spair);\n",
            scheduler->is_forward?"":"-");
        /* FIXME: func call */
        Codegen_printf(codegen,
            "Lookahead_set(dst_row->cell_index, %d, dst_cell);\n",
            transition->advance_query);
        Codegen_printf(codegen, "}\n");
        Codegen_indent(codegen, -1);
        /* FIXME: func call */
        Codegen_printf(codegen,
            "Scheduler_Cell_assign(spair,\n"
            " cell, %d,\n"
            " dst_cell, %d,\n"
            " dst_score, max_score,\n"
            " transition, seed_id,\n"
            " dst_query_pos, dst_target_pos);\n",
            input_pos, output_pos);
        Codegen_printf(codegen, "end_of_transition_%d:\n", i);
    }
    /**/
    Codegen_printf(codegen, "return;\n");
    Codegen_printf(codegen, "}\n\n");
    Codegen_indent(codegen, -1);
    return;
}
/* Equivalent to Scheduler_Cell_process() */

Codegen *Scheduler_make_Codegen(Scheduler *scheduler){
    register Codegen *codegen = Codegen_create(NULL, scheduler->name);
    CGUtil_print_header(codegen, scheduler->model);
    Codegen_printf(codegen,
        "/* Seeded Viterbi DP implementation. */\n"
        "\n"
        "#include \"scheduler.h\"\n"
        "\n");
    Scheduler_implement_DP(scheduler, codegen);
    CGUtil_print_footer(codegen);
    CGUtil_compile(codegen, scheduler->model);
    return codegen;
}
/* This only implements the Scheduler_Cell_process() function,
 * as it is the most time-consuming part of the SDP,
 * and that which is most likely to benefit
 * from the codegen optimisations.
*/ /**/ static gpointer Scheduler_Cache_get(SparseCache *cache, gint slot, gint pos){ register gpointer *cache_data = SparseCache_get(cache, pos); return cache_data[slot]; } static void Scheduler_Cache_set(SparseCache *cache, gint slot, gint pos, gpointer data){ register gpointer *cache_data = SparseCache_get(cache, pos); cache_data[slot] = data; return; } /**/ static Scheduler_SpanSeed *Scheduler_SpanSeed_create(C4_Score score, C4_Score max, gint seed_id, gint query_entry, gint target_entry, STraceback_Cell *cell, C4_Score *shadow_data, Scheduler *scheduler){ register Scheduler_SpanSeed *span_seed = g_new(Scheduler_SpanSeed, 1); register gint i; register C4_Model *model = scheduler->model; span_seed->score = score; span_seed->max = max; span_seed->seed_id = seed_id; span_seed->query_entry = query_entry; span_seed->target_entry = target_entry; span_seed->cell = STraceback_Cell_share(cell); if(scheduler->is_forward && model->total_shadow_designations){ span_seed->shadow_data = g_new0(C4_Score, model->total_shadow_designations); if(shadow_data) for(i = 0; i < model->total_shadow_designations; i++) span_seed->shadow_data[i] = shadow_data[i]; } else { span_seed->shadow_data = NULL; } return span_seed; } static void Scheduler_SpanSeed_destroy(Scheduler_SpanSeed *span_seed, STraceback *straceback){ if(span_seed->shadow_data) g_free(span_seed->shadow_data); STraceback_Cell_destroy(span_seed->cell, straceback); g_free(span_seed); return; } static void Scheduler_SpanSeed_copy(Scheduler_SpanSeed *src, Scheduler_SpanSeed *dst, Scheduler_Pair *spair){ register gint i; register C4_Model *model = spair->scheduler->model; g_assert(src); g_assert(dst); dst->score = src->score; dst->max = src->max; dst->seed_id = src->seed_id; dst->query_entry = src->query_entry; dst->target_entry = src->target_entry; STraceback_Cell_destroy(dst->cell, spair->straceback); dst->cell = STraceback_Cell_share(src->cell); /* Copy shadows */ if(spair->scheduler->is_forward) for(i = 0; i < 
model->total_shadow_designations; i++) dst->shadow_data[i] = src->shadow_data[i]; return; } static Scheduler_SpanSeed *Scheduler_SpanSeed_duplicate( Scheduler_SpanSeed *span_seed, Scheduler_Pair *spair){ return Scheduler_SpanSeed_create(span_seed->score, span_seed->max, span_seed->seed_id, span_seed->query_entry, span_seed->target_entry, span_seed->cell, span_seed->shadow_data, spair->scheduler); } /**/ static gpointer Scheduler_Cache_get_func(gint pos, gpointer page_data, gpointer user_data){ register Scheduler_SpanSeed **seed_matrix = page_data; return seed_matrix[pos]; } static SparseCache_Page *Scheduler_Cache_fill_func(gint start, gpointer user_data){ register Scheduler_Pair *spair = user_data; register Scheduler_SpanSeed ***seed_matrix = (Scheduler_SpanSeed***)Matrix2d_create( SparseCache_PAGE_SIZE, spair->scheduler->model->span_list->len, sizeof(Scheduler_SpanSeed*)); register SparseCache_Page *page = g_new(SparseCache_Page, 1); page->data = seed_matrix; page->data_size = Matrix2d_size(SparseCache_PAGE_SIZE, spair->scheduler->model->span_list->len, sizeof(Scheduler_SpanSeed*)); page->get_func = Scheduler_Cache_get_func; page->copy_func = NULL; return page; } static void Scheduler_Cache_empty_func(SparseCache_Page *page, gpointer user_data){ register Scheduler_Pair *spair = user_data; register Scheduler_SpanSeed *seed, ***seed_matrix = page->data; register gint i, j; for(i = 0; i < SparseCache_PAGE_SIZE; i++) for(j = 0; j < spair->scheduler->model->span_list->len; j++){ seed = seed_matrix[i][j]; if(seed) Scheduler_SpanSeed_destroy(seed, spair->straceback); } g_free(seed_matrix); g_free(page); return; } /**/ static Scheduler_SpanData *Scheduler_SpanData_create( SListSet *slist_set, C4_Span *span){ register Scheduler_SpanData *span_data = g_new(Scheduler_SpanData, 1); span_data->span = span; span_data->curr_span_seed = NULL; return span_data; } static void Scheduler_SpanData_destroy(Scheduler_SpanData *span_data){ g_free(span_data); return; } void 
Scheduler_SpanData_get_curr(Scheduler_SpanData *span_data, SparseCache *cache, gint query_pos, gint target_pos, STraceback *straceback){ register Scheduler_SpanSeed *stored_seed; g_assert(span_data); g_assert(cache); /* Remove curr if expired */ if(span_data->curr_span_seed){ if((span_data->curr_span_seed->query_entry > query_pos) || ((span_data->curr_span_seed->query_entry + span_data->span->max_query) < query_pos) || ((span_data->curr_span_seed->target_entry + span_data->span->max_target) < target_pos)){ span_data->curr_span_seed = NULL; } } /* Challenge curr with stored */ stored_seed = Scheduler_Cache_get(cache, span_data->span->id, query_pos); if(stored_seed){ g_assert(stored_seed->query_entry == query_pos); if((stored_seed->target_entry + span_data->span->max_target) >= target_pos){ /* Still valid */ if(span_data->curr_span_seed){ if(span_data->curr_span_seed->score < stored_seed->score){ span_data->curr_span_seed = stored_seed; } } else { span_data->curr_span_seed = stored_seed; } } else { /* Stored has expired */ Scheduler_SpanSeed_destroy(stored_seed, straceback); Scheduler_Cache_set(cache, span_data->span->id, query_pos, NULL); } } return; } /* Scheduler_SpanData_get_curr() * is called with transition from a span state * sets span_data->curr_span_seed to the best span_seed in scope */ /* Algorithm: * --------- * if(qy_span) challenging curr * if(ty_span) challenging cached * if(qy_span && ty_span) * challenging curr and stored */ void Scheduler_SpanData_submit(Scheduler_SpanData *span_data, Scheduler_SpanSeed *span_seed, SparseCache *cache, Scheduler_Pair *spair){ register Scheduler_SpanSeed *stored_seed; if(span_data->span->max_target){ /* Challenge cache */ stored_seed = Scheduler_Cache_get(cache, span_data->span->id, span_seed->query_entry); if(stored_seed){ g_assert(stored_seed->query_entry == span_seed->query_entry); if(stored_seed->score <= span_seed->score){ Scheduler_SpanSeed_copy(span_seed, stored_seed, spair); } } else { 
Scheduler_Cache_set(cache, span_data->span->id, span_seed->query_entry, Scheduler_SpanSeed_duplicate(span_seed, spair)); } } return; } /* Scheduler_SpanData_submit() * is called with transition to a span state */ /**/ static Scheduler_SpanHistory *Scheduler_SpanHistory_create( SListSet *slist_set, C4_Model *model){ register gint i; register Scheduler_SpanHistory *span_history = g_new(Scheduler_SpanHistory, 1); register Scheduler_SpanData *span_data; register C4_Span *span; span_history->model = C4_Model_share(model); span_history->span_data = g_new(Scheduler_SpanData*, model->span_list->len); for(i = 0; i < model->span_list->len; i++){ span = model->span_list->pdata[i]; span_data = Scheduler_SpanData_create(slist_set, span); span_history->span_data[i] = span_data; } return span_history; } static void Scheduler_SpanHistory_destroy( Scheduler_SpanHistory *span_history){ register gint i; register Scheduler_SpanData *span_data; for(i = 0; i < span_history->model->span_list->len; i++){ span_data = span_history->span_data[i]; Scheduler_SpanData_destroy(span_data); } C4_Model_destroy(span_history->model); g_free(span_history->span_data); g_free(span_history); return; } /**/ static void Scheduler_Cell_init(Scheduler_Cell *cell, gint query_pos, gboolean permit_span_thaw, Scheduler_Pair *spair){ register gint i; register C4_Model *model = spair->scheduler->model; memset(cell, 0, spair->scheduler->cell_size); cell->score = (C4_Score**) ((gchar*)cell + spair->scheduler->cell_score_offset); Matrix2d_init((gchar**)cell->score, model->state_list->len, spair->scheduler->shadow_start + model->shadow_list->len, sizeof(C4_Score)); /**/ if(spair->scheduler->has_traceback) cell->traceback = (STraceback_Cell**) ((gchar*)cell + spair->scheduler->cell_traceback_offset); else cell->traceback = NULL; if(spair->scheduler->is_forward){ g_assert(query_pos >= 0); } else { g_assert(query_pos <= 0); } cell->query_pos = query_pos; cell->permit_span_thaw = permit_span_thaw; for(i = 0; i < 
spair->scheduler->model->state_list->len; i++) cell->score[i][0] = C4_IMPOSSIBLY_LOW_SCORE; if(spair->scheduler->has_traceback) for(i = 0; i < spair->scheduler->model->state_list->len; i++) cell->traceback[i] = NULL; return; } Scheduler_Cell *Scheduler_Cell_create(gint query_pos, gboolean permit_span_thaw, Scheduler_Pair *spair){ register Scheduler_Cell *cell; cell = RecycleBin_alloc(spair->cell_recycle); Scheduler_Cell_init(cell, query_pos, permit_span_thaw, spair); return cell; } static void Scheduler_Cell_destroy(Scheduler_Cell *cell, Scheduler_Pair *spair){ RecycleBin_recycle(spair->cell_recycle, cell); return; } static void Scheduler_Cell_shadow_start(C4_Transition *transition, C4_Score *dst_cell, gint shadow_start, gint dst_query_pos, gint dst_target_pos, gpointer user_data){ register gint i; register C4_Shadow *shadow; for(i = 0; i < transition->input->src_shadow_list->len; i++){ shadow = transition->input->src_shadow_list->pdata[i]; dst_cell[shadow_start+shadow->designation] = shadow->start_func( dst_query_pos, dst_target_pos, user_data); } return; } /* FIXME: do in codegen */ static void Scheduler_Cell_shadow_end(C4_Transition *transition, C4_Score *src_cell, gint shadow_start, gint dst_query_pos, gint dst_target_pos, gpointer user_data){ register gint i; register C4_Shadow *shadow; for(i = 0; i < transition->dst_shadow_list->len; i++){ shadow = transition->dst_shadow_list->pdata[i]; shadow->end_func(src_cell[shadow_start+shadow->designation], dst_query_pos, dst_target_pos, user_data); } return; } void Scheduler_Cell_assign(Scheduler_Pair *spair, Scheduler_Cell *src_cell, gint input_pos, Scheduler_Cell *dst_cell, gint output_pos, C4_Score dst_score, C4_Score max_score, C4_Transition *transition, gint seed_id, gint dst_query_pos, gint dst_target_pos){ register C4_Model *model = spair->scheduler->model; register gint i; /* g_message("Assign (%d,%d)->(%d,%d) dst_score[%d] seed[%d] [%s]", dst_query_pos-transition->advance_query, 
dst_target_pos-transition->advance_target, dst_query_pos, dst_target_pos, dst_score, seed_id, transition->name); */ dst_cell->score[output_pos][0] = dst_score; /* Assign score */ dst_cell->score[output_pos][2] = seed_id; if(spair->scheduler->has_traceback){ if(dst_cell->traceback[output_pos]) STraceback_Cell_destroy(dst_cell->traceback[output_pos], spair->straceback); dst_cell->traceback[output_pos] = STraceback_add(spair->straceback, transition, 1, src_cell->traceback[input_pos]); /* FIXME: optimisation: overwrite prev cell directly ? */ } if(spair->scheduler->is_forward){ /* Start shadows */ Scheduler_Cell_shadow_start(transition, src_cell->score[input_pos], spair->scheduler->shadow_start, dst_query_pos-transition->advance_query, dst_target_pos-transition->advance_target, spair->user_data); /* Transport shadows */ for(i = 0; i < model->total_shadow_designations; i++){ dst_cell->score[output_pos][spair->scheduler->shadow_start+i] = src_cell->score[input_pos][spair->scheduler->shadow_start+i]; } /* FIXME: is this necessary ? or write shadows directly to dst ? 
*/ } if(dst_score < max_score){ dst_cell->score[output_pos][1] = max_score; /* g_message("MAX [%d] on (%d,%d) seed=%d opp=%d (%s)", dst_score, dst_query_pos-transition->advance_query, dst_target_pos-transition->advance_target, seed_id, output_pos, transition->name); */ } else { /* Is the best score on this path */ dst_cell->score[output_pos][1] = dst_score; if(spair->scheduler->start_func){ if(transition->input == model->start_state->state) spair->scheduler->start_func(spair->seed_data, seed_id, dst_score, dst_query_pos, dst_target_pos, spair->scheduler->has_traceback ?dst_cell->traceback[output_pos] :NULL); } if(spair->scheduler->end_func){ if(transition->output == model->end_state->state){ g_assert(dst_cell->traceback[output_pos]); spair->scheduler->end_func(spair->seed_data, seed_id, dst_score, dst_query_pos, dst_target_pos, dst_cell->traceback[output_pos]); } } } return; } /**/ STraceback_Cell *Scheduler_Cell_add_span(STraceback_Cell *prev, STraceback *straceback, C4_Span *span, gint query_len, gint target_len){ g_assert(span); g_assert(span->query_loop || (!query_len)); g_assert(span->target_loop || (!target_len)); if(query_len){ g_assert(query_len >= 0); g_assert(span->query_loop); prev = STraceback_add(straceback, span->query_loop, query_len, prev); } if(target_len){ g_assert(target_len >= 0); g_assert(span->target_loop); prev = STraceback_add(straceback, span->target_loop, target_len, prev); } return prev; } void Scheduler_Cell_process(Scheduler_Cell *cell, Scheduler_Row *row, Scheduler_Pair *spair){ register C4_Transition *transition; register C4_Model *model = spair->scheduler->model; register C4_Score src_score, dst_score, transition_score, max_score; register Scheduler_Row *dst_row; register Scheduler_Cell *dst_cell; register Scheduler_SpanData *span_data; register C4_Span *span; register gint i, j, seed_id = 0, src_query_pos, src_target_pos, dst_query_pos, dst_target_pos, rel_dst_query_pos, rel_dst_target_pos, input_pos, output_pos; Scheduler_SpanSeed 
span_seed; /* g_print("%s %s %d %d\n", __FUNCTION__, spair->scheduler->is_forward?"fwd":"rev", cell->query_pos, row->target_pos); */ if(spair->scheduler->is_forward){ src_query_pos = cell->query_pos; src_target_pos = row->target_pos; } else { src_query_pos = -cell->query_pos; src_target_pos = -row->target_pos; } /* Calculate transitions in reverse order for forward DP */ for(i = model->transition_list->len-1; i >= 0; i--){ transition = model->transition_list->pdata[i]; if(C4_Transition_is_span(transition)){ if(spair->scheduler->is_forward && spair->scheduler->use_boundary){ /* Copy output to span history */ span = spair->scheduler->span_map[transition->output->id]; if(span){ input_pos = transition->input->id; span_data = spair->span_history->span_data[span->id]; span_seed.score = cell->score[input_pos][0]; if(span_seed.score >= 0){ span_seed.max = cell->score[input_pos][1]; span_seed.seed_id = cell->score[input_pos][2]; span_seed.cell = cell->traceback[input_pos]; /* Ref count does not need to be taken here */ span_seed.query_entry = src_query_pos; span_seed.target_entry = src_target_pos; /* g_message("FREEZING [%d,%d] [%d] entry (%d,%d)", span_seed.score, span_seed.max, span_seed.seed_id, span_seed.query_entry, span_seed.target_entry); */ span_seed.shadow_data = &cell->score[input_pos] [spair->scheduler->shadow_start]; Scheduler_SpanData_submit(span_data, &span_seed, spair->span_seed_cache, spair); } } } continue; } if(spair->scheduler->is_forward){ dst_query_pos = src_query_pos + transition->advance_query; dst_target_pos = src_target_pos + transition->advance_target; if((dst_query_pos > Region_query_end(spair->region)) || (dst_target_pos > Region_target_end(spair->region))) continue; input_pos = transition->input->id; output_pos = transition->output->id; rel_dst_query_pos = dst_query_pos; rel_dst_target_pos = dst_target_pos; if(cell->permit_span_thaw){ /* Copy curr best span to input */ span = spair->scheduler ->span_map[transition->input->id]; if(span){ 
span_data = spair ->span_history->span_data[span->id]; Scheduler_SpanData_get_curr(span_data, spair->span_seed_cache, cell->query_pos, row->target_pos, spair->straceback); /* Protect good present score from overwriting */ if((span_data->curr_span_seed) && (cell->score[input_pos][0] < span_data->curr_span_seed->score)){ /* g_message("THAWING [%d,%d] [%d] entry(%d,%d) [%d,%d]", span_data->curr_span_seed->score, span_data->curr_span_seed->max, span_data->curr_span_seed->seed_id, span_data->curr_span_seed->query_entry, span_data->curr_span_seed->target_entry, src_query_pos, src_target_pos); */ /**/ cell->score[input_pos][0] = span_data->curr_span_seed->score; cell->score[input_pos][1] = span_data->curr_span_seed->max; cell->score[input_pos][2] = span_data->curr_span_seed->seed_id; g_assert(spair->scheduler->has_traceback); cell->traceback[input_pos] = Scheduler_Cell_add_span( span_data->curr_span_seed->cell, spair->straceback, span, src_query_pos -span_data->curr_span_seed->query_entry, src_target_pos -span_data->curr_span_seed->target_entry); /**/ /* Thaw shadows */ for(j = 0; j < model->total_shadow_designations; j++) cell->score[input_pos] [spair->scheduler->shadow_start+j] = span_data->curr_span_seed->shadow_data[j]; } } } /* End shadows */ Scheduler_Cell_shadow_end(transition, cell->score[input_pos], spair->scheduler->shadow_start, dst_query_pos, dst_target_pos, spair->user_data); transition_score = C4_Calc_score(transition->calc, src_query_pos, src_target_pos, spair->user_data); } else { /* reverse pass */ dst_query_pos = src_query_pos - transition->advance_query; dst_target_pos = src_target_pos - transition->advance_target; if((dst_query_pos < spair->region->query_start) || (dst_target_pos < spair->region->target_start)) continue; rel_dst_query_pos = -dst_query_pos; rel_dst_target_pos = -dst_target_pos; input_pos = transition->output->id; output_pos = transition->input->id; /* Allow extension through shadowed transitions */ if(transition->dst_shadow_list->len) 
transition_score = 0; else transition_score = C4_Calc_score(transition->calc, dst_query_pos, dst_target_pos, spair->user_data); } src_score = cell->score[input_pos][0]; max_score = cell->score[input_pos][1]; seed_id = cell->score[input_pos][2]; dst_score = src_score + transition_score; if(spair->scheduler->is_forward) if(dst_score < 0) continue; if((max_score - dst_score) > spair->scheduler->dropoff) continue; /**/ if(C4_Transition_is_match(transition)){ if(spair->subopt_index){ if(SubOpt_Index_is_blocked(spair->subopt_index, src_query_pos-spair->region->query_start)) continue; } } /**/ /* Fetch/create and set the dst cell */ dst_row = Lookahead_get(spair->row_index, transition->advance_target); if(!dst_row){ dst_row = Scheduler_Row_create(rel_dst_target_pos, spair); Lookahead_set(spair->row_index, transition->advance_target, dst_row); Lookahead_move(dst_row->cell_index, cell->query_pos); } g_assert(dst_row); g_assert(dst_row->target_pos == rel_dst_target_pos); g_assert(dst_row->cell_index->pos == cell->query_pos); dst_cell = Lookahead_get(dst_row->cell_index, transition->advance_query); if(dst_cell){ /* Keep score if higher (don't tie break on max_score) */ if(dst_score <= dst_cell->score[output_pos][0]) continue; g_assert(dst_cell->query_pos == rel_dst_query_pos); } else { /* New cell */ dst_cell = Scheduler_Cell_create(rel_dst_query_pos, FALSE, spair); Lookahead_set(dst_row->cell_index, transition->advance_query, dst_cell); } g_assert(dst_cell); Scheduler_Cell_assign(spair, cell, input_pos, dst_cell, output_pos, dst_score, max_score, transition, seed_id, dst_query_pos, dst_target_pos); } return; } /* FIXME: tidy and optimise */ static void Scheduler_Cell_seed(Scheduler_Cell *cell, Scheduler_Seed *seed, Scheduler_Pair *spair){ register gboolean seed_state_id = spair->scheduler->is_forward ? 
spair->scheduler->model->start_state->state->id : spair->scheduler->model->end_state->state->id; cell->score[seed_state_id][0] = seed->start_score; cell->score[seed_state_id][1] = seed->start_score; cell->score[seed_state_id][2] = seed->seed_id; if(cell->traceback) cell->traceback[seed_state_id] = NULL; return; } /**/ static void Scheduler_Row_Lookahead_free_func(gpointer data, gpointer user_data){ register Scheduler_Cell *cell = data; register Scheduler_Row *row = user_data; g_assert(cell); /* Do not destroy cells - just add to the used_list */ SList_queue(row->used_list, cell); return; } Scheduler_Row *Scheduler_Row_create(gint target_pos, Scheduler_Pair *spair){ register Scheduler_Row *row = g_new0(Scheduler_Row, 1); if(spair->scheduler->is_forward){ g_assert(target_pos >= 0); } else { g_assert(target_pos <= 0); } row->target_pos = target_pos; row->cell_index = Lookahead_create( spair->scheduler->is_forward ?0:-Region_query_end(spair->region), spair->scheduler->model->max_query_advance, Scheduler_Row_Lookahead_free_func, row); row->used_list = SList_create(spair->slist_set); row->unused_list = SList_create(spair->slist_set); return row; } static void Scheduler_Row_process(Scheduler_Row *row, Scheduler_Pair *spair){ register Scheduler_Cell *cell, *next_cell; register gint advance; g_assert(Scheduler_Pair_is_valid(spair)); if(spair->scheduler->is_forward) SubOpt_Index_set_row(spair->subopt_index, row->target_pos-spair->region->target_start); else SubOpt_Index_set_row(spair->subopt_index, (-row->target_pos)-spair->region->target_start); do { /* Add first cell if row index is empty */ if(!row->cell_index->mask){ if(SList_isempty(row->unused_list)) break; /* No more cells */ cell = SList_pop(row->unused_list); g_assert(cell); Lookahead_move(row->cell_index, cell->query_pos); Lookahead_set(row->cell_index, 0, cell); } /* Populate initial cell positions with seeds * (no more than max_query_advance columns) */ cell = Lookahead_get(row->cell_index, 0); g_assert(cell); do 
{ g_assert(row); g_assert(row->unused_list); next_cell = SList_first(row->unused_list); if(!next_cell) break; advance = next_cell->query_pos - cell->query_pos; g_assert(advance >= 0); if(advance > row->cell_index->max_advance) break; /* Cell cannot be already created */ g_assert(!Lookahead_get(row->cell_index, advance)); next_cell = SList_pop(row->unused_list); Lookahead_set(row->cell_index, advance, next_cell); } while(TRUE); /* Process cell */ Scheduler_Pair_align_rows(spair); /* Will use codegen when available ... */ spair->scheduler->cell_func(cell, row, spair); Lookahead_next(row->cell_index); } while(TRUE); return; } typedef struct { Scheduler_Pair *spair; gint target_pos; } Scheduler_TraverseData; static void Scheduler_Row_traverse_cell_destroy(gpointer data, gpointer user_data){ register Scheduler_Cell *cell = data; register Scheduler_TraverseData *std = user_data; register Scheduler_Pair *spair = std->spair; register gint i, state_id; register Boundary_Row *boundary_row; register C4_Model *model = spair->scheduler->model; register STraceback_Cell *stcell, *prev_stcell; register C4_Span *span; if(spair->straceback){ for(i = 0; i < model->state_list->len; i++){ stcell = cell->traceback[i]; if(stcell){ if((stcell->prev) && (stcell->prev->ref_count == 1) && (stcell->prev->transition == stcell->transition)){ prev_stcell = stcell->prev; stcell->prev = prev_stcell->prev ? 
STraceback_Cell_share(prev_stcell->prev) : NULL; stcell->length += prev_stcell->length; STraceback_Cell_destroy(prev_stcell, spair->straceback); } STraceback_Cell_destroy(cell->traceback[i], spair->straceback); } } } if((!spair->scheduler->is_forward) && spair->boundary){ /* Record new boundary */ boundary_row = Boundary_get_last_row(spair->boundary); g_assert(boundary_row); state_id = model->start_state->state->id; if(cell->score[state_id][0] >= 0){ Boundary_Row_prepend(boundary_row, -cell->query_pos, cell->score[state_id][2]); } else { for(i = 0; i < model->span_list->len; i++){ span = model->span_list->pdata[i]; state_id = span->span_state->id; if(cell->score[state_id][0] > 0){ Boundary_Row_prepend(boundary_row, -cell->query_pos, cell->score[state_id][2]); break; } } } } /**/ Scheduler_Cell_destroy(cell, spair); return; } static void Scheduler_Row_destroy(Scheduler_Row *row, Scheduler_Pair *spair){ register Boundary_Row *boundary_row = NULL; Scheduler_TraverseData std; std.target_pos = row->target_pos; std.spair = spair; if((!spair->scheduler->is_forward) && spair->boundary) boundary_row = Boundary_add_row(spair->boundary, -row->target_pos); Lookahead_destroy(row->cell_index); SList_destroy_with_data(row->used_list, Scheduler_Row_traverse_cell_destroy, &std); SList_destroy_with_data(row->unused_list, Scheduler_Row_traverse_cell_destroy, &std); if(boundary_row) Boundary_remove_empty_last_row(spair->boundary); g_free(row); return; } /* lookahead->pos uses -ve coordinates * terminal coordinates are always +ve * cell->query_pos * and row->target_pos use reversed coordinates. 
*/ static void Scheduler_Row_add_seed(Scheduler_Row *row, Scheduler_Pair *spair, Scheduler_Seed *seed){ register Scheduler_Cell *cell = Scheduler_Cell_create(seed->query_pos, (spair->scheduler->is_forward && spair->scheduler->use_boundary), spair); register Scheduler_Cell *first_cell; register gint advance; if(row->cell_index->mask){ /* Already have cells */ /* Add to lookahead when within advance range */ first_cell = Lookahead_get(row->cell_index, 0); g_assert(first_cell); advance = cell->query_pos - first_cell->query_pos; g_assert(advance >= 0); if(advance < row->cell_index->max_advance){ /* Row should not exist already */ g_assert(!Lookahead_get(row->cell_index, advance)); Lookahead_set(row->cell_index, advance, cell); } else { /* Otherwise add to the unused list */ SList_queue(row->unused_list, cell); } } else { /* Row is empty */ Lookahead_move(row->cell_index, cell->query_pos); Lookahead_set(row->cell_index, 0, cell); } /* FIXME: Optimisation: (FOR EFFICIENT SUBOPT DP) * Should not seed here unless the seed is part of the * update boundary (or the seed is an update seed) */ Scheduler_Cell_seed(cell, seed, spair); return; } static void Scheduler_Row_reset(Scheduler_Row *row){ register SList *joined_list; Lookahead_reset(row->cell_index); /* Prepend anything in used onto unused */ joined_list = SList_join(row->used_list, row->unused_list); row->used_list = row->unused_list; row->unused_list = joined_list; return; } static void Scheduler_Row_move(Scheduler_Row *row, gint query_pos){ register Scheduler_Cell *cell; register gint advance; Lookahead_move(row->cell_index, query_pos); /* Move anything in unused list before index onto used list */ do { cell = SList_first(row->unused_list); if(!cell) break; if(cell->query_pos >= query_pos) break; SList_queue(row->used_list, SList_pop(row->unused_list)); } while(TRUE); /* Move anything in unused list in index into index */ do { cell = SList_first(row->unused_list); if(!cell) break; advance = cell->query_pos - query_pos; 
g_assert(advance >= 0); if(advance > row->cell_index->max_advance) break; Lookahead_set(row->cell_index, advance, SList_pop(row->unused_list)); } while(TRUE); return; } /* FIXME: tidy */ static gboolean Scheduler_Row_print_traverse(gpointer data, gpointer user_data){ register Scheduler_Cell *cell = data; register gchar *message = user_data; g_message(" >[%s] CELL [%d](%p)", message, cell->query_pos, cell); return FALSE; } static void Scheduler_Row_print(Scheduler_Row *row){ register gint i; register Scheduler_Cell *cell; g_print("\n"); g_message("--["); g_message("ROW target_pos [%d] (mask %d) (ma=%d)", row->target_pos, row->cell_index->mask, row->cell_index->max_advance); g_message("index pos [%d]", row->cell_index->pos); for(i = 0; i <= row->cell_index->max_advance; i++){ cell = Lookahead_get(row->cell_index, i); if(cell){ g_message(" >lookahead (%d) CELL [%d]", i, cell->query_pos); } } SList_traverse(row->unused_list, Scheduler_Row_print_traverse, "unused"); SList_traverse(row->used_list, Scheduler_Row_print_traverse, "used"); g_message("]--\n"); return; } /**/ typedef struct { gint pos; gboolean is_valid; } Scheduler_Row_Validate; static gboolean Scheduler_Row_is_valid_traverse(gpointer data, gpointer user_data){ register Scheduler_Row_Validate *v = user_data; register Scheduler_Cell *cell = data; if(cell->query_pos < v->pos){ v->is_valid = FALSE; } v->pos = cell->query_pos; return FALSE; } static gboolean Scheduler_Row_is_valid(Scheduler_Row *row){ Scheduler_Row_Validate v; register gint i; register Scheduler_Cell *cell; v.pos = -987654321; v.is_valid = TRUE; SList_traverse(row->used_list, Scheduler_Row_is_valid_traverse, &v); for(i = 0; i <= row->cell_index->max_advance; i++){ cell = Lookahead_get(row->cell_index, i); if(cell){ if(cell->query_pos < v.pos) v.is_valid = FALSE; } } SList_traverse(row->unused_list, Scheduler_Row_is_valid_traverse, &v); if(!v.is_valid){ g_message("Found invalid row"); Scheduler_Row_print(row); g_error("bad row"); } return 
v.is_valid; } /**/ static void Scheduler_Pair_init(Scheduler_Pair *spair, Scheduler_Seed *seed){ /* Load first seed at start of index */ register Scheduler_Row *first_row; /* Make first row */ first_row = Scheduler_Row_create(seed->target_pos, spair); Scheduler_Row_add_seed(first_row, spair, seed); /* Set row index position */ Lookahead_move(spair->row_index, first_row->target_pos); Lookahead_set(spair->row_index, 0, first_row); return; } static void Scheduler_Pair_add_seed(Scheduler_Pair *spair, Scheduler_Seed *seed){ register Scheduler_Row *row, *first_row; register gint advance; first_row = Lookahead_get(spair->row_index, 0); g_assert(first_row); advance = seed->target_pos - first_row->target_pos; g_assert(advance >= 0); g_assert(advance <= spair->row_index->max_advance); row = Lookahead_get(spair->row_index, advance); if(!row){ row = Scheduler_Row_create(seed->target_pos, spair); Lookahead_set(spair->row_index, advance, row); } g_assert(row); g_assert(row->target_pos == seed->target_pos); Scheduler_Row_add_seed(row, spair, seed); return; } static void Scheduler_Pair_reset_rows(Scheduler_Pair *spair){ register gint i; register Scheduler_Row *row; for(i = 0; i <= spair->row_index->max_advance; i++){ row = Lookahead_get(spair->row_index, i); if(row){ g_assert(Scheduler_Row_is_valid(row)); Scheduler_Row_reset(row); } } return; } void Scheduler_Pair_calculate(Scheduler_Pair *spair){ Scheduler_Seed seed; register Scheduler_Row *row; register gint i; register C4_Calc *calc; register C4_Model *model = spair->scheduler->model; /* Call model init func */ if(model->init_func) model->init_func(spair->region, spair->user_data); /* Call model calc init funcs */ for(i = 0; i < model->calc_list->len; i++){ calc = model->calc_list->pdata[i]; if(calc && calc->init_func) calc->init_func(spair->region, spair->user_data); } /* Call spair init func */ spair->scheduler->init_func(spair->seed_data); /* Compute seeded DP rows while rows and seeds remain */ do { /* Add first point if 
row_index is empty */ if(!spair->row_index->mask){ if(!spair->scheduler->get_func(spair->seed_data, &seed)) break; Scheduler_Pair_init(spair, &seed); spair->scheduler->next_func(spair->seed_data); } /* Populate initial DP rows with seeds * (no more than max_target_advance rows) */ row = Lookahead_get(spair->row_index, 0); g_assert(row); while(spair->scheduler->get_func(spair->seed_data, &seed)){ if((seed.target_pos - row->target_pos) > model->max_target_advance) break; Scheduler_Pair_add_seed(spair, &seed); spair->scheduler->next_func(spair->seed_data); } Scheduler_Pair_reset_rows(spair); Scheduler_Row_process(row, spair); Lookahead_next(spair->row_index); } while(TRUE); /* Call model calc exit funcs */ for(i = 0; i < model->calc_list->len; i++){ calc = model->calc_list->pdata[i]; if(calc && calc->exit_func) calc->exit_func(spair->region, spair->user_data); } /* Call model exit funcs */ if(model->exit_func) model->exit_func(spair->region, spair->user_data); return; } /**/ static void Scheduler_Lookahead_free_func(gpointer data, gpointer user_data){ register Scheduler_Row *row = data; register Scheduler_Pair *spair = user_data; Scheduler_Row_destroy(row, spair); return; } Scheduler_Pair *Scheduler_Pair_create( Scheduler *scheduler, STraceback *straceback, gint query_len, gint target_len, SubOpt *subopt, Boundary *boundary, gint best_seed_id, gpointer seed_data, gpointer user_data){ register Scheduler_Pair *spair = g_new(Scheduler_Pair, 1); spair->scheduler = scheduler; spair->slist_set = SListSet_create(); spair->region = Region_create(0, 0, query_len, target_len); spair->seed_data = seed_data; spair->user_data = user_data; spair->best_seed_id = best_seed_id; if(scheduler->has_traceback){ g_assert(straceback); spair->straceback = STraceback_share(straceback); } else { spair->straceback = NULL; } spair->row_index = Lookahead_create( scheduler->is_forward?0:-target_len, scheduler->model->max_target_advance, Scheduler_Lookahead_free_func, spair); spair->cell_recycle = 
RecycleBin_create("Scheduler_Cell", scheduler->cell_size, 1024); if(scheduler->use_boundary){ g_assert(boundary); spair->boundary = Boundary_share(boundary); spair->span_seed_cache = SparseCache_create(query_len+1, Scheduler_Cache_fill_func, Scheduler_Cache_empty_func, NULL, spair); spair->span_history = Scheduler_SpanHistory_create(spair->slist_set, scheduler->model); } else { g_assert(!boundary); spair->boundary = NULL; spair->span_history = NULL; spair->span_seed_cache = NULL; } spair->subopt_index = SubOpt_Index_create(subopt, spair->region); return spair; } void Scheduler_Pair_destroy(Scheduler_Pair *spair){ if(spair->straceback) STraceback_destroy(spair->straceback); if(spair->span_history) Scheduler_SpanHistory_destroy(spair->span_history); if(spair->boundary) Boundary_destroy(spair->boundary); Region_destroy(spair->region); Lookahead_destroy(spair->row_index); RecycleBin_destroy(spair->cell_recycle); SListSet_destroy(spair->slist_set); if(spair->span_seed_cache) SparseCache_destroy(spair->span_seed_cache); if(spair->subopt_index) SubOpt_Index_destroy(spair->subopt_index); g_free(spair); return; } static void Scheduler_Pair_align_rows(Scheduler_Pair *spair){ register gint i; /* Set other rows in row_index to same position as first row */ register Scheduler_Row *first_row, *row; register Scheduler_Cell *first_cell; first_row = Lookahead_get(spair->row_index, 0); g_assert(first_row); first_cell = Lookahead_get(first_row->cell_index, 0); g_assert(first_cell); for(i = 1; i <= spair->row_index->max_advance; i++){ row = Lookahead_get(spair->row_index, i); if(row) Scheduler_Row_move(row, first_cell->query_pos); } /* FIXME: optimisation * can do this faster with Lookahead_next() */ return; } static gboolean Scheduler_Pair_is_valid(Scheduler_Pair *spair){ register gint i; register Scheduler_Row *row; for(i = 0; i <= spair->row_index->max_advance; i++){ row = Lookahead_get(spair->row_index, i); g_assert((!row) || Scheduler_Row_is_valid(row)); } return TRUE; } /**/ 
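The cell-processing loop in scheduler.c above filters each candidate extension before it is written to a destination cell: a forward pass drops negative scores, any pass drops scores that fall more than `dropoff` below the running maximum, and an existing cell is only overwritten by a strictly higher score. A minimal sketch of just those three checks follows, assuming plain `int` scores. The name `extension_is_kept` and its parameters are illustrative and not part of the library; traceback, span and suboptimal-alignment handling are left out.

```c
#include <assert.h>
#include <stdbool.h>

typedef int Score; /* stand-in for C4_Score (assumption) */

/* Decide whether a candidate extension should be stored.
 * existing_score is only consulted when has_existing is true.
 * On success the candidate score is written to *dst_score.
 */
static bool extension_is_kept(Score src_score, Score max_score,
                              Score transition_score, Score dropoff,
                              bool is_forward,
                              bool has_existing, Score existing_score,
                              Score *dst_score){
    Score candidate = src_score + transition_score;
    /* Forward passes never carry a negative score */
    if(is_forward && (candidate < 0))
        return false;
    /* Prune once the score has dropped too far below the best seen */
    if((max_score - candidate) > dropoff)
        return false;
    /* Keep an existing score on a tie (no tie-break on max_score) */
    if(has_existing && (candidate <= existing_score))
        return false;
    *dst_score = candidate;
    return true;
}
```

With a dropoff of 20, a path whose best score was 50 may continue at 30 but not at 25. A reverse pass may continue through a negative score as long as it stays within the dropoff.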
exonerate-2.4.0/src/sdp/lookahead.c0000644000175000017500000001143711162714320014103 00000000000000/****************************************************************\ * * * C4 dynamic programming library - SDP Lookahead Object * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "lookahead.h" /**/ #define Lookahead_index_pos(lookahead, advance) \ (((lookahead)->pos + (advance)) & (Lookahead_Mask_WIDTH-1)) #define Lookahead_pos_is_set(lookahead, advance) \ ((lookahead)->mask \ & (1 << Lookahead_index_pos((lookahead), (advance)))) Lookahead *Lookahead_create(gint reset_pos, gint max_advance, Lookahead_FreeFunc free_func, gpointer user_data){ register Lookahead *lookahead = g_new0(Lookahead, 1); g_assert(max_advance > 0); g_assert(max_advance <= Lookahead_Mask_WIDTH); lookahead->reset_pos = reset_pos; lookahead->pos = reset_pos; lookahead->max_advance = max_advance; lookahead->free_func = free_func; lookahead->user_data = user_data; return lookahead; } static Lookahead_Mask Lookahead_orientate_mask(Lookahead *lookahead){ register gint index_pos = Lookahead_index_pos(lookahead, 0); return (lookahead->mask >> index_pos) | (lookahead->mask << (Lookahead_Mask_WIDTH-index_pos)); } static gint Lookahead_next_pos(Lookahead *lookahead){ g_assert(lookahead->mask); /* Not currently called when empty */ return Lookahead_Mask_ffs(Lookahead_orientate_mask(lookahead)) - 1; } /* Returns advance of next occupied or -1 when nothing is set */ void Lookahead_destroy(Lookahead *lookahead){ if(lookahead->mask){ /* Not empty */ /* Move to first occupied pos */ Lookahead_move(lookahead, lookahead->pos + 
Lookahead_next_pos(lookahead)); while(lookahead->mask) /* Not empty */ Lookahead_next(lookahead); /* Move to next position */ } g_free(lookahead); return; } void Lookahead_reset(Lookahead *lookahead){ Lookahead_move(lookahead, lookahead->pos + Lookahead_Mask_WIDTH); g_assert(!lookahead->mask); lookahead->pos = lookahead->reset_pos; return; } gpointer Lookahead_get(Lookahead *lookahead, gint advance){ g_assert(advance >= 0); g_assert(advance <= lookahead->max_advance); return lookahead->index[Lookahead_index_pos(lookahead, advance)]; } void Lookahead_set(Lookahead *lookahead, gint advance, gpointer data){ register gint index_pos = Lookahead_index_pos(lookahead, advance); g_assert(advance >= 0); g_assert(advance <= lookahead->max_advance); g_assert(!Lookahead_get(lookahead, advance)); lookahead->index[index_pos] = data; lookahead->mask |= (1 << index_pos); return; } static void Lookahead_clear(Lookahead *lookahead, gint advance){ register gint index_pos = Lookahead_index_pos(lookahead, advance); g_assert(advance >= 0); g_assert(advance <= lookahead->max_advance); g_assert(lookahead->index[index_pos]); lookahead->free_func(lookahead->index[index_pos], lookahead->user_data); lookahead->index[index_pos] = NULL; lookahead->mask &= (~(1 << index_pos)); return; } gint Lookahead_next(Lookahead *lookahead){ g_assert(Lookahead_pos_is_set(lookahead, 0)); Lookahead_clear(lookahead, 0); if(lookahead->mask) /* not empty */ lookahead->pos += Lookahead_next_pos(lookahead); else lookahead->pos = lookahead->reset_pos; /* empty */ return lookahead->pos; } void Lookahead_move(Lookahead *lookahead, gint pos){ register gint next_advance, max_advance = pos - lookahead->pos; g_assert(pos >= lookahead->pos); g_assert(max_advance >= 0); while(lookahead->mask){ next_advance = Lookahead_next_pos(lookahead); g_assert(next_advance >= 0); g_assert(next_advance <= lookahead->max_advance); if(next_advance < max_advance){ Lookahead_clear(lookahead, next_advance); } else { break; } } lookahead->pos = 
pos; /* update the position */ return; } /**/ exonerate-2.4.0/src/sdp/straceback.c0000644000175000017500000001121311162714330014247 00000000000000/****************************************************************\ * * * C4 dynamic programming library - Scheduler Traceback * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "straceback.h" /**/ STraceback *STraceback_create(C4_Model *model, gboolean is_forward){ register STraceback *straceback = g_new(STraceback, 1); straceback->ref_count = 1; straceback->is_forward = is_forward; straceback->cell_recycle = RecycleBin_create( "STraceback_Cell", sizeof(STraceback_Cell), 1024); straceback->model = C4_Model_share(model); return straceback; } STraceback *STraceback_share(STraceback *straceback){ straceback->ref_count++; return straceback; } void STraceback_destroy(STraceback *straceback){ if(--straceback->ref_count) return; RecycleBin_destroy(straceback->cell_recycle); C4_Model_destroy(straceback->model); g_free(straceback); return; } STraceback_Cell *STraceback_add(STraceback *straceback, C4_Transition *transition, gint length, STraceback_Cell *prev){ register STraceback_Cell *cell = RecycleBin_alloc(straceback->cell_recycle); g_assert(transition); cell->ref_count = 1; cell->transition = transition; cell->length = length; if(prev){ if(straceback->is_forward){ g_assert(prev->transition->output == transition->input); } else { g_assert(prev->transition->input == transition->output); } cell->prev = STraceback_Cell_share(prev); } else { if(straceback->is_forward){ g_assert(transition->input == straceback->model->start_state->state); } else { 
g_assert(transition->output == straceback->model->end_state->state); } cell->prev = NULL; } return cell; } STraceback_Cell *STraceback_Cell_share(STraceback_Cell *cell){ cell->ref_count++; return cell; } void STraceback_Cell_destroy(STraceback_Cell *cell, STraceback *straceback){ if(--cell->ref_count) return; if(cell->prev) STraceback_Cell_destroy(cell->prev, straceback); RecycleBin_recycle(straceback->cell_recycle, cell); return; } static void straceback_g_ptr_array_reverse(GPtrArray *ptr_array){ register gint a, z; register gpointer swap; for(a = 0, z = ptr_array->len-1; a < z; a++, z--){ swap = ptr_array->pdata[a]; ptr_array->pdata[a] = ptr_array->pdata[z]; ptr_array->pdata[z] = swap; } return; } STraceback_List *STraceback_List_create(STraceback *straceback, STraceback_Cell *prev){ register STraceback_List *stlist = g_new(STraceback_List, 1); register STraceback_Cell *cell; register STraceback_Operation *operation; g_assert(prev); if(straceback->is_forward){ g_assert(prev->transition->output == straceback->model->end_state->state); } else { g_assert(prev->transition->input == straceback->model->start_state->state); } stlist->operation_list = g_ptr_array_new(); for(cell = prev; cell; cell = cell->prev){ operation = g_new(STraceback_Operation, 1); operation->transition = cell->transition; operation->length = cell->length; g_ptr_array_add(stlist->operation_list, operation); } straceback_g_ptr_array_reverse(stlist->operation_list); return stlist; } void STraceback_List_destroy(STraceback_List *stlist){ register gint i; register STraceback_Operation *operation; for(i = 0; i < stlist->operation_list->len; i++){ operation = stlist->operation_list->pdata[i]; g_free(operation); } g_ptr_array_free(stlist->operation_list, TRUE); g_free(stlist); return; } /**/ exonerate-2.4.0/src/sdp/lookahead.test.c0000644000175000017500000000526311162714330015062 00000000000000/****************************************************************\ * * * C4 dynamic programming library - SDP 
Lookahead Object * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "lookahead.h" static void test_free_func(gpointer data, gpointer user_data){ register gchar *word = data; g_assert(word); g_message("freeing [%s]", word); g_free(word); return; } int main(void){ register Lookahead *lookahead = Lookahead_create(0, 32, test_free_func, NULL); register gchar *word; word = Lookahead_get(lookahead, 0); g_assert(!word); /**/ Lookahead_set(lookahead, 0, g_strdup("zero")); word = Lookahead_get(lookahead, 0); g_message("Have zero [%s]", word); /**/ Lookahead_set(lookahead, 7, g_strdup("seven")); word = Lookahead_get(lookahead, 7); g_message("Have seven [%s]", word); /**/ Lookahead_set(lookahead, 12, g_strdup("twelve")); word = Lookahead_get(lookahead, 12); g_message("Have twelve [%s]", word); /**/ Lookahead_set(lookahead, 24, g_strdup("twentyfour")); word = Lookahead_get(lookahead, 24); g_message("Have twentyfour [%s]", word); /**/ Lookahead_next(lookahead); word = Lookahead_get(lookahead, 0); g_message("Moved to [%s]", word); /**/ Lookahead_next(lookahead); word = Lookahead_get(lookahead, 0); g_message("Moved to [%s]", word); /**/ Lookahead_get(lookahead, 12); word = Lookahead_get(lookahead, 12); g_message("Should have twentyfour [%s]", word); /**/ Lookahead_move(lookahead, 20); word = Lookahead_get(lookahead, 4); g_message("Should still have twentyfour [%s]", word); /**/ Lookahead_set(lookahead, 20, g_strdup("forty")); Lookahead_move(lookahead, 24); Lookahead_next(lookahead); word = Lookahead_get(lookahead, 0); g_message("Should now have forty [%s]", word); Lookahead_destroy(lookahead); return 0; } 
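The Lookahead object exercised by the test above is a small circular index: an absolute position is folded onto a fixed number of slots, and a bitmask records which slots are occupied so that the next occupied position can be found without scanning the pointers. The following is a minimal sketch of that idea without the glib dependency, assuming a 32-slot ring. The `Ring` type and `ring_*` functions are illustrative names, not the library API. Unlike Lookahead_next(), this sketch calls no free function and does not return to a reset position when the ring becomes empty. It also guards the mask rotation against a shift by the full word width, which C leaves undefined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_WIDTH 32u /* assumed mask width; must be a power of two */

typedef struct {
    int      pos;               /* absolute position of advance 0 */
    uint32_t mask;              /* bit i set => index[i] is occupied */
    void    *index[RING_WIDTH];
} Ring;

/* Fold (pos + advance) onto a physical slot; works for negative pos */
static unsigned ring_slot(const Ring *r, int advance){
    return ((unsigned)(r->pos + advance)) & (RING_WIDTH - 1u);
}

static void ring_set(Ring *r, int advance, void *data){
    unsigned slot = ring_slot(r, advance);
    assert((advance >= 0) && ((unsigned)advance < RING_WIDTH));
    assert(!(r->mask & (UINT32_C(1) << slot))); /* must be empty */
    r->index[slot] = data;
    r->mask |= UINT32_C(1) << slot;
}

static void *ring_get(const Ring *r, int advance){
    return r->index[ring_slot(r, advance)];
}

/* Rotate the mask so that bit 0 corresponds to the current position */
static uint32_t ring_oriented_mask(const Ring *r){
    unsigned s = ring_slot(r, 0);
    if(s == 0)
        return r->mask; /* avoid shifting by the full width */
    return (r->mask >> s) | (r->mask << (RING_WIDTH - s));
}

/* Advance of the nearest occupied slot, or -1 when the ring is empty */
static int ring_next_advance(const Ring *r){
    uint32_t m = ring_oriented_mask(r);
    int advance = 0;
    if(!m)
        return -1;
    while(!(m & 1u)){
        m >>= 1;
        advance++;
    }
    return advance;
}

/* Drop the current entry and jump to the next occupied position */
static int ring_next(Ring *r){
    unsigned s = ring_slot(r, 0);
    r->index[s] = NULL;
    r->mask &= ~(UINT32_C(1) << s);
    if(r->mask)
        r->pos += ring_next_advance(r);
    return r->pos;
}
```

Setting entries at advances 0, 7 and 24 and then calling `ring_next()` twice visits positions 7 and 24 in turn, mirroring the "Moved to" steps in the test program.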
exonerate-2.4.0/src/sdp/boundary.test.c0000644000175000017500000000207511162714320014753 00000000000000
/****************************************************************\
*                                                                *
*  C4 dynamic programming library - Boundary Object              *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#include "boundary.h"
#include "argument.h"

int Argument_main(Argument *arg){
    g_warning("[%s] does nothing", __FILE__);
    return 0;
    }

exonerate-2.4.0/src/sdp/scheduler.test.c0000644000175000017500000000205011162714330015100 00000000000000
/****************************************************************\
*                                                                *
*  C4 dynamic programming library - DP Scheduler                 *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#include "scheduler.h"

int Argument_main(Argument *arg){
    g_warning("[%s] does nothing", __FILE__);
    return 0;
    }

exonerate-2.4.0/src/sdp/straceback.test.c0000644000175000017500000000207711162714330015235 00000000000000
/****************************************************************\
*                                                                *
*  C4 dynamic programming library - Scheduler Traceback          *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#include "straceback.h"
#include "argument.h"

int Argument_main(Argument *arg){
    g_warning("[%s] does nothing", __FILE__);
    return 0;
    }

exonerate-2.4.0/src/sdp/scheduler.h0000644000175000017500000002026511162714330014137 00000000000000
/****************************************************************\
*                                                                *
*  C4 dynamic programming library - DP Scheduler                 *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#ifndef INCLUDED_SCHEDULER_H
#define INCLUDED_SCHEDULER_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <glib.h>

#include "c4.h"
#include "region.h"
#include "lookahead.h"
#include "boundary.h"
#include "slist.h"
#include "recyclebin.h"
#include "subopt.h"
#include "sparsecache.h"
#include "codegen.h"
#include "straceback.h"

/**/

typedef struct {
    gint query_pos;
    gint target_pos;
    gint seed_id;
    C4_Score start_score;
} Scheduler_Seed;

typedef void (*Scheduler_Seed_init_Func)(gpointer seed_data);
typedef void (*Scheduler_Seed_next_Func)(gpointer seed_data);
typedef gboolean (*Scheduler_Seed_get_Func)(gpointer seed_data,
                                            Scheduler_Seed *seed);
/*
typedef void(*Scheduler_obstruct_Func)(gpointer seed_data,
                 gint src_seed_id, gint dst_seed_id);
*/

typedef void(*Scheduler_start_Func)(gpointer seed_data,
                 gint seed_id, C4_Score score,
                 gint start_query_pos, gint start_target_pos,
                 STraceback_Cell *stcell);
typedef void(*Scheduler_end_Func)(gpointer seed_data,
                 gint seed_id, C4_Score score,
                 gint end_query_pos,
                 gint end_target_pos,
                 STraceback_Cell *stcell);

struct Scheduler_Pair; /* To avoid compiler warning */
struct Scheduler_Row;  /* To avoid compiler warning */
struct Scheduler_Cell; /* To avoid compiler warning */

typedef void (*Scheduler_DP_Func)(struct Scheduler_Cell *cell,
                                  struct Scheduler_Row *row,
                                  struct Scheduler_Pair *spair);
#define Scheduler_DP_Func_ARGS_STR        \
    "(struct Scheduler_Cell *cell, "      \
    "struct Scheduler_Row *row, "         \
    "struct Scheduler_Pair *pair)"

typedef struct {
    gboolean is_forward;
    gboolean has_traceback;
    gboolean use_boundary;
    gint shadow_start;
    gsize cell_score_offset;
    gsize cell_traceback_offset;
    gsize cell_size;
    Scheduler_Seed_init_Func init_func;
    Scheduler_Seed_next_Func next_func;
    Scheduler_Seed_get_Func get_func;
    /* Scheduler_obstruct_Func obstruct_func; */
    Scheduler_start_Func start_func;
    Scheduler_end_Func end_func;
    C4_Model *model;
    C4_Span **span_map;
    C4_Score dropoff;
    gchar *name;
    Scheduler_DP_Func cell_func;
} Scheduler;

Scheduler *Scheduler_create(C4_Model *model,
                  gboolean is_forward, gboolean has_traceback,
                  gboolean use_boundary,
                  Scheduler_Seed_init_Func init_func,
                  Scheduler_Seed_next_Func next_func,
                  Scheduler_Seed_get_Func get_func,
                  /* Scheduler_obstruct_Func obstruct_func, */
                  Scheduler_start_Func start_func,
                  Scheduler_end_Func end_func,
                  C4_Score dropoff);
void Scheduler_destroy(Scheduler *scheduler);
Codegen *Scheduler_make_Codegen(Scheduler *scheduler);

/**/

typedef struct {
    C4_Score score;
    C4_Score max;
    gint seed_id;
    gint query_entry;
    gint target_entry;
    STraceback_Cell *cell;
    C4_Score *shadow_data;
} Scheduler_SpanSeed;

typedef struct {
    C4_Span *span;
    Scheduler_SpanSeed *curr_span_seed;
} Scheduler_SpanData;

typedef struct {
    Scheduler_SpanData **span_data;
    C4_Model *model; /* FIXME: move to Scheduler ?
                      */
} Scheduler_SpanHistory;

/**/

typedef struct Scheduler_Pair {
    Scheduler *scheduler;
    SListSet *slist_set;
    Region *region;
    gint best_seed_id;
    gpointer user_data;
    gpointer seed_data;
    Lookahead *row_index;
    RecycleBin *cell_recycle;
    STraceback *straceback;
    /**/
    Boundary *boundary;
    SparseCache *span_seed_cache;
    Scheduler_SpanHistory *span_history;
    /**/
    SubOpt_Index *subopt_index;
} Scheduler_Pair;

Scheduler_Pair *Scheduler_Pair_create(
                    Scheduler *scheduler,
                    STraceback *straceback,
                    gint query_len, gint target_len,
                    SubOpt *subopt,
                    Boundary *boundary, gint best_seed_id,
                    gpointer seed_data, gpointer user_data);
void Scheduler_Pair_destroy(Scheduler_Pair *spair);
void Scheduler_Pair_calculate(Scheduler_Pair *spair);

/**/

typedef struct Scheduler_Cell {
    gint query_pos;
    gboolean permit_span_thaw; /* FIXME: remove */
    C4_Score **score;
    STraceback_Cell **traceback; /* Only when performing traceback */
} Scheduler_Cell;
/* score matrix[state->list->len ]
 *             [ 4 + shadow_total ]
 * all: 0:[score]
 * all: 1:[max]
 * all: 2:[seed_id]
 *      {other shadows ...}
 *
 * traceback[state->list->len]
 *
 */

/**/

typedef struct Scheduler_Row {
    gint target_pos;
    Lookahead *cell_index;
    SList *used_list;
    SList *unused_list;
} Scheduler_Row;

/* FIXME: BEGIN temp for codegen */

void Scheduler_Cell_process(Scheduler_Cell *cell,
                            Scheduler_Row *row,
                            Scheduler_Pair *spair);
Scheduler_Row *Scheduler_Row_create(gint target_pos,
                                    Scheduler_Pair *spair);
Scheduler_Cell *Scheduler_Cell_create(gint query_pos,
                                      gboolean permit_span_thaw,
                                      Scheduler_Pair *spair);
void Scheduler_Cell_assign(Scheduler_Pair *spair,
                           Scheduler_Cell *src_cell, gint input_pos,
                           Scheduler_Cell *dst_cell, gint output_pos,
                           C4_Score dst_score, C4_Score max_score,
                           C4_Transition *transition, gint seed_id,
                           gint dst_query_pos, gint dst_target_pos);
void Scheduler_SpanData_get_curr(Scheduler_SpanData *span_data,
                                 SparseCache *cache,
                                 gint query_pos, gint target_pos,
                                 STraceback *straceback);
void Scheduler_SpanData_submit(Scheduler_SpanData *span_data,
                               Scheduler_SpanSeed *span_seed,
                               SparseCache *cache,
                               Scheduler_Pair *spair);
STraceback_Cell *Scheduler_Cell_add_span(STraceback_Cell *prev,
                                         STraceback *straceback,
                                         C4_Span *span,
                                         gint query_len, gint target_len);

/* FIXME: END temp for codegen */

/**/

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_SCHEDULER_H */

exonerate-2.4.0/src/sdp/sdp.h0000644000175000017500000000616211222240154012742 00000000000000
/****************************************************************\
*                                                                *
*  C4 dynamic programming library - Seeded Dynamic Programming   *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#ifndef INCLUDED_SDP_H
#define INCLUDED_SDP_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <glib.h>

#include "c4.h"
#include "comparison.h"
#include "slist.h"
#include "recyclebin.h"
#include "alignment.h"
#include "lookahead.h"
#include "subopt.h"
#include "region.h"
#include "sparsecache.h"
#include "boundary.h"
#include "scheduler.h"
#include "threadref.h"

/**/

typedef struct {
    gint dropoff;
    gboolean single_pass_subopt;
} SDP_ArgumentSet;

SDP_ArgumentSet *SDP_ArgumentSet_create(Argument *arg);

/**/

typedef struct {
    gint query_pos;
    gint target_pos;
    C4_Score score;
    STraceback_Cell *cell;
} SDP_Terminal;

typedef struct SDP_Seed {
    gint seed_id;
    HSP *hsp;
    SDP_Terminal *max_start; /* Best start from this HSP */
    SDP_Terminal *max_end;   /* Best end from this HSP */
    PQueue_Node *pq_node;    /* To allow raising in PQ */
} SDP_Seed;
/* first_pre->*->last_pre->[INDEX]->first_post->*->last_post */

/**/

typedef struct {
    ThreadRef *thread_ref;
    SDP_ArgumentSet *sas;
    C4_Model *model;
    gboolean use_boundary;
    Scheduler
        *find_starts_scheduler;
    Scheduler *find_ends_scheduler;
} SDP;

SDP *SDP_create(C4_Model *model);
SDP *SDP_share(SDP *sdp);
void SDP_destroy(SDP *sdp);
GPtrArray *SDP_get_codegen_list(SDP *sdp);

/**/

typedef struct {
    SDP *sdp;
    Comparison *comparison;
    gint alignment_count; /* Number reported so far */
    gpointer user_data;
    SubOpt *subopt;
    GPtrArray *seed_list;
    gpointer *seed_list_by_score; /* Sorted in score order */
    gint single_pass_pos;
    Boundary *boundary;
    C4_Score last_score;
    STraceback *fwd_straceback;
    STraceback *rev_straceback;
} SDP_Pair;

SDP_Pair *SDP_Pair_create(SDP *sdp, SubOpt *subopt,
                          Comparison *comparison, gpointer user_data);
void SDP_Pair_destroy(SDP_Pair *sdp_pair);

/**/

Alignment *SDP_Pair_next_path(SDP_Pair *sdp_pair, C4_Score threshold);

/**/

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_SDP_H */

exonerate-2.4.0/src/sdp/lookahead.h0000644000175000017500000000531011162714327014110 00000000000000
/****************************************************************\
*                                                                *
*  C4 dynamic programming library - SDP Lookahead Object         *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#ifndef INCLUDED_LOOKAHEAD_H
#define INCLUDED_LOOKAHEAD_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <limits.h>  /* For CHAR_BIT */
#include <strings.h> /* For ffs() */
#include <glib.h>

#ifdef ffsll
typedef long long int Lookahead_Mask;
#define Lookahead_Mask_ffs ffsll
#else /* ffsll */
typedef int Lookahead_Mask;
#define Lookahead_Mask_ffs ffs
#endif /* ffsll */

#define Lookahead_Mask_WIDTH (sizeof(Lookahead_Mask)*(CHAR_BIT))

/* It is assumed that Lookahead_Mask_WIDTH is always greater
 * than the model max advance.
 */

/**/

typedef void (*Lookahead_FreeFunc)(gpointer data, gpointer user_data);

typedef struct {
    gint reset_pos;
    gint pos;
    gint max_advance;
    gpointer *index[Lookahead_Mask_WIDTH];
    Lookahead_Mask mask;
    Lookahead_FreeFunc free_func;
    gpointer user_data;
} Lookahead;

Lookahead *Lookahead_create(gint reset_pos, gint max_advance,
                            Lookahead_FreeFunc free_func,
                            gpointer user_data);
void Lookahead_destroy(Lookahead *lookahead);
void Lookahead_reset(Lookahead *lookahead);
gpointer Lookahead_get(Lookahead *lookahead, gint advance);
void Lookahead_set(Lookahead *lookahead, gint advance, gpointer data);
gint Lookahead_next(Lookahead *lookahead);
void Lookahead_move(Lookahead *lookahead, gint pos);
/* Lookahead_next() assumes that the current position is set,
 *                  clears it, and then returns the new position.
 * Lookahead_move() sets pos as the current position,
 *                  clearing any data in between.
 */

/* FIXME: optimisation : replace get/set with macros ? */

/**/

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_LOOKAHEAD_H */

exonerate-2.4.0/src/sdp/boundary.h0000644000175000017500000000415211162714320014000 00000000000000
/****************************************************************\
*                                                                *
*  C4 dynamic programming library - Boundary Object              *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.
*
*                                                                *
\****************************************************************/

#ifndef INCLUDED_BOUNDARY_H
#define INCLUDED_BOUNDARY_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <glib.h>

#include "region.h"

/**/

typedef struct {
    gint query_pos;
    gint length;
    gint seed_id;
} Boundary_Interval;

typedef struct {
    gint target_pos;
    GPtrArray *interval_list;
} Boundary_Row;

typedef struct {
    gint ref_count;
    GPtrArray *row_list;
} Boundary;

/**/

Boundary *Boundary_create(void);
void Boundary_destroy(Boundary *boundary);
Boundary *Boundary_share(Boundary *boundary);

/**/

Boundary_Row *Boundary_add_row(Boundary *boundary, gint target_pos);
Boundary_Row *Boundary_get_last_row(Boundary *boundary);
void Boundary_remove_empty_last_row(Boundary *boundary);

/**/

void Boundary_reverse(Boundary *boundary);
Boundary *Boundary_select(Boundary *boundary, Region *region);

/**/

void Boundary_Row_prepend(Boundary_Row *boundary_row,
                          gint query_pos, gint seed_id);
void Boundary_insert(Boundary *boundary, Boundary *insert);

/**/

void Boundary_print_gnuplot(Boundary *boundary, gint colour);

/**/

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_BOUNDARY_H */

exonerate-2.4.0/src/sdp/straceback.h0000644000175000017500000000472211162714330014263 00000000000000
/****************************************************************\
*                                                                *
*  C4 dynamic programming library - Scheduler Traceback          *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.
*
*                                                                *
\****************************************************************/

#ifndef INCLUDED_STRACEBACK_H
#define INCLUDED_STRACEBACK_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <glib.h>

#include "recyclebin.h"
#include "c4.h"
#include "region.h"

/**/

typedef struct STraceback_Cell {
    gint ref_count;
    C4_Transition *transition;
    gint length;
    struct STraceback_Cell *prev;
} STraceback_Cell;

typedef struct {
    gint ref_count;
    gboolean is_forward;
    RecycleBin *cell_recycle;
    C4_Model *model;
} STraceback;

STraceback *STraceback_create(C4_Model *model, gboolean is_forward);
STraceback *STraceback_share(STraceback *straceback);
void STraceback_destroy(STraceback *straceback);

STraceback_Cell *STraceback_add(STraceback *straceback,
                                C4_Transition *transition, gint length,
                                STraceback_Cell *prev);
STraceback_Cell *STraceback_Cell_share(STraceback_Cell *cell);
void STraceback_Cell_destroy(STraceback_Cell *cell,
                             STraceback *straceback);

typedef struct {
    C4_Transition *transition;
    gint length;
} STraceback_Operation;

typedef struct {
    GPtrArray *operation_list;
} STraceback_List;

STraceback_List *STraceback_List_create(STraceback *straceback,
                                        STraceback_Cell *source);
void STraceback_List_destroy(STraceback_List *stlist);

/**/

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_STRACEBACK_H */

exonerate-2.4.0/src/model/0000777000175000017500000000000011222703703012400 500000000000000
exonerate-2.4.0/src/model/Makefile.am0000644000175000017500000001446711222643665014376 00000000000000

TESTS = edit_distance.test affine.test protein2dna.test \
        ner.test est2genome.test ungapped.test intron.test \
        protein2genome.test coding2coding.test \
        frameshift.test coding2genome.test \
        cdna2genome.test genome2genome.test \
        phase.test modeltype.test

noinst_PROGRAMS = $(TESTS) bootstrapper

INCLUDES = -I$(top_srcdir)/src/c4 \
           -I$(top_srcdir)/src/bsdp \
           -I$(top_srcdir)/src/sdp \
           -I$(top_srcdir)/src/sequence \
           -I$(top_srcdir)/src/general \
           -I$(top_srcdir)/src/comparison \
           -I$(top_srcdir)/src/struct \
           -DSOURCE_ROOT_DIR="\"@source_root_dir@\"" \
           -DGLIB_CFLAGS="\"@glib_cflags@\""

SEQUENCE_OBJ = $(top_srcdir)/src/struct/sparsecache.o \
               $(top_srcdir)/src/struct/matrix.o \
               $(top_srcdir)/src/sequence/sequence.o \
               $(top_srcdir)/src/sequence/alphabet.o \
               $(top_srcdir)/src/sequence/splice.o \
               $(top_srcdir)/src/sequence/translate.o \
               $(top_srcdir)/src/general/argument.o \
               $(top_srcdir)/src/general/lineparse.o \
               $(top_srcdir)/src/general/threadref.o \
               -lm

ALIGNMENT_OBJ = $(top_srcdir)/src/comparison/match.o \
                $(top_srcdir)/src/sequence/codonsubmat.o \
                $(top_srcdir)/src/sequence/submat.o \
                $(SEQUENCE_OBJ)

C4_OBJECTS = $(top_srcdir)/src/struct/slist.o \
             $(top_srcdir)/src/struct/rangetree.o \
             $(top_srcdir)/src/struct/recyclebin.o \
             $(top_srcdir)/src/c4/optimal.o \
             $(top_srcdir)/src/c4/viterbi.o \
             $(top_srcdir)/src/c4/layout.o \
             $(top_srcdir)/src/c4/codegen.o \
             $(top_srcdir)/src/c4/cgutil.o \
             $(top_srcdir)/src/c4/alignment.o \
             $(top_srcdir)/src/c4/region.o \
             $(top_srcdir)/src/c4/c4.o \
             $(top_srcdir)/src/c4/subopt.o \
             $(ALIGNMENT_OBJ)

# -lm should be at end of argument list for IRIX

noinst_HEADERS = edit_distance.h affine.h protein2dna.h \
                 ner.h est2genome.h ungapped.h intron.h \
                 protein2genome.h coding2coding.h frameshift.h \
                 coding2genome.h genome2genome.h cdna2genome.h \
                 phase.h modeltype.h

bootstrapper_SOURCES = bootstrapper.c modeltype.c \
                       ungapped.c affine.c est2genome.c ner.c \
                       protein2dna.c protein2genome.c coding2coding.c \
                       coding2genome.c genome2genome.c cdna2genome.c \
                       frameshift.c intron.c phase.c
bootstrapper_LDADD = $(top_srcdir)/src/struct/pqueue.o \
                     $(top_srcdir)/src/comparison/comparison.o \
                     $(top_srcdir)/src/comparison/hspset.o \
                     $(top_srcdir)/src/comparison/wordhood.o \
                     $(top_srcdir)/src/bsdp/heuristic.o \
                     $(top_srcdir)/src/sdp/sdp.o \
                     $(top_srcdir)/src/sdp/scheduler.o \
                     $(top_srcdir)/src/sdp/straceback.o \
                     $(top_srcdir)/src/sdp/lookahead.o \
                     $(top_srcdir)/src/sdp/boundary.o \
                     $(C4_OBJECTS)

edit_distance_test_SOURCES = \
        edit_distance.test.c edit_distance.c
edit_distance_test_LDADD = $(C4_OBJECTS)

affine_test_SOURCES = affine.test.c affine.c ungapped.c
affine_test_LDADD = $(C4_OBJECTS)

protein2dna_test_SOURCES = protein2dna.test.c protein2dna.c \
                           affine.c ungapped.c frameshift.c
protein2dna_test_LDADD = $(C4_OBJECTS)

ner_test_SOURCES = ner.test.c ner.c affine.c ungapped.c
ner_test_LDADD = $(C4_OBJECTS)

est2genome_test_SOURCES = est2genome.test.c est2genome.c \
                          affine.c ungapped.c intron.c
est2genome_test_LDADD = $(C4_OBJECTS)

ungapped_test_SOURCES = ungapped.test.c ungapped.c
ungapped_test_LDADD = $(C4_OBJECTS)

intron_test_SOURCES = intron.test.c intron.c
intron_test_LDADD = $(C4_OBJECTS)

protein2genome_test_SOURCES = protein2genome.test.c protein2genome.c \
                              protein2dna.c ungapped.c intron.c \
                              affine.c frameshift.c phase.c
protein2genome_test_LDADD = $(C4_OBJECTS)

coding2coding_test_SOURCES = coding2coding.test.c coding2coding.c \
                             affine.c ungapped.c frameshift.c
coding2coding_test_LDADD = $(C4_OBJECTS)

frameshift_test_SOURCES = frameshift.test.c frameshift.c
frameshift_test_LDADD = $(C4_OBJECTS)

coding2genome_test_SOURCES = coding2genome.test.c coding2genome.c \
                             coding2coding.c affine.c ungapped.c \
                             frameshift.c intron.c phase.c
coding2genome_test_LDADD = $(C4_OBJECTS)

cdna2genome_test_SOURCES = cdna2genome.test.c cdna2genome.c \
                           affine.c ungapped.c \
                           frameshift.c intron.c phase.c \
                           coding2coding.c coding2genome.c \
                           est2genome.c
cdna2genome_test_LDADD = $(C4_OBJECTS)

genome2genome_test_SOURCES = genome2genome.test.c genome2genome.c \
                             coding2coding.c affine.c ungapped.c \
                             frameshift.c intron.c phase.c \
                             coding2genome.c cdna2genome.c \
                             est2genome.c
genome2genome_test_LDADD = $(C4_OBJECTS)

phase_test_SOURCES = phase.test.c phase.c intron.c
phase_test_LDADD = $(C4_OBJECTS)

modeltype_test_SOURCES = modeltype.test.c modeltype.c \
                         ungapped.c affine.c est2genome.c ner.c \
                         protein2dna.c protein2genome.c \
                         coding2coding.c coding2genome.c \
                         genome2genome.c cdna2genome.c \
                         frameshift.c \
                         phase.c intron.c
modeltype_test_LDADD = $(C4_OBJECTS)

# Files to clear away
MAINTAINERCLEANFILES = Makefile.in

exonerate-2.4.0/src/model/Makefile.in0000644000175000017500000010337611222703651014375 00000000000000
# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am

# Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc.
# This Makefile.in is free software; the Free Software Foundation
# gives unlimited permission to copy and/or distribute it,
# with or without modifications, as long as this notice is preserved.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY, to the extent permitted by law; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE.


SHELL = @SHELL@

srcdir = @srcdir@
top_srcdir = @top_srcdir@
VPATH = @srcdir@
prefix = @prefix@
exec_prefix = @exec_prefix@

bindir = @bindir@
sbindir = @sbindir@
libexecdir = @libexecdir@
datadir = @datadir@
sysconfdir = @sysconfdir@
sharedstatedir = @sharedstatedir@
localstatedir = @localstatedir@
libdir = @libdir@
infodir = @infodir@
mandir = @mandir@
includedir = @includedir@
oldincludedir = /usr/include

DESTDIR =

pkgdatadir = $(datadir)/@PACKAGE@
pkglibdir = $(libdir)/@PACKAGE@
pkgincludedir = $(includedir)/@PACKAGE@

top_builddir = ../..
ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ TESTS = edit_distance.test affine.test protein2dna.test ner.test est2genome.test ungapped.test intron.test protein2genome.test coding2coding.test frameshift.test coding2genome.test cdna2genome.test genome2genome.test phase.test modeltype.test noinst_PROGRAMS = $(TESTS) bootstrapper INCLUDES = -I$(top_srcdir)/src/c4 -I$(top_srcdir)/src/bsdp -I$(top_srcdir)/src/sdp -I$(top_srcdir)/src/sequence -I$(top_srcdir)/src/general -I$(top_srcdir)/src/comparison -I$(top_srcdir)/src/struct -DSOURCE_ROOT_DIR="\"@source_root_dir@\"" -DGLIB_CFLAGS="\"@glib_cflags@\"" SEQUENCE_OBJ = $(top_srcdir)/src/struct/sparsecache.o $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o $(top_srcdir)/src/sequence/alphabet.o $(top_srcdir)/src/sequence/splice.o $(top_srcdir)/src/sequence/translate.o $(top_srcdir)/src/general/argument.o $(top_srcdir)/src/general/lineparse.o $(top_srcdir)/src/general/threadref.o -lm ALIGNMENT_OBJ = $(top_srcdir)/src/comparison/match.o $(top_srcdir)/src/sequence/codonsubmat.o $(top_srcdir)/src/sequence/submat.o 
$(SEQUENCE_OBJ) C4_OBJECTS = $(top_srcdir)/src/struct/slist.o $(top_srcdir)/src/struct/rangetree.o $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/c4/optimal.o $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/layout.o $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/region.o $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/subopt.o $(ALIGNMENT_OBJ) # -lm should be at end of argument list for IRIX noinst_HEADERS = edit_distance.h affine.h protein2dna.h ner.h est2genome.h ungapped.h intron.h protein2genome.h coding2coding.h frameshift.h coding2genome.h genome2genome.h cdna2genome.h phase.h modeltype.h bootstrapper_SOURCES = bootstrapper.c modeltype.c ungapped.c affine.c est2genome.c ner.c protein2dna.c protein2genome.c coding2coding.c coding2genome.c genome2genome.c cdna2genome.c frameshift.c intron.c phase.c bootstrapper_LDADD = $(top_srcdir)/src/struct/pqueue.o $(top_srcdir)/src/comparison/comparison.o $(top_srcdir)/src/comparison/hspset.o $(top_srcdir)/src/comparison/wordhood.o $(top_srcdir)/src/bsdp/heuristic.o $(top_srcdir)/src/sdp/sdp.o $(top_srcdir)/src/sdp/scheduler.o $(top_srcdir)/src/sdp/straceback.o $(top_srcdir)/src/sdp/lookahead.o $(top_srcdir)/src/sdp/boundary.o $(C4_OBJECTS) edit_distance_test_SOURCES = edit_distance.test.c edit_distance.c edit_distance_test_LDADD = $(C4_OBJECTS) affine_test_SOURCES = affine.test.c affine.c ungapped.c affine_test_LDADD = $(C4_OBJECTS) protein2dna_test_SOURCES = protein2dna.test.c protein2dna.c affine.c ungapped.c frameshift.c protein2dna_test_LDADD = $(C4_OBJECTS) ner_test_SOURCES = ner.test.c ner.c affine.c ungapped.c ner_test_LDADD = $(C4_OBJECTS) est2genome_test_SOURCES = est2genome.test.c est2genome.c affine.c ungapped.c intron.c est2genome_test_LDADD = $(C4_OBJECTS) ungapped_test_SOURCES = ungapped.test.c ungapped.c ungapped_test_LDADD = $(C4_OBJECTS) intron_test_SOURCES = intron.test.c intron.c intron_test_LDADD = $(C4_OBJECTS) 
protein2genome_test_SOURCES = protein2genome.test.c protein2genome.c protein2dna.c ungapped.c intron.c affine.c frameshift.c phase.c protein2genome_test_LDADD = $(C4_OBJECTS) coding2coding_test_SOURCES = coding2coding.test.c coding2coding.c affine.c ungapped.c frameshift.c coding2coding_test_LDADD = $(C4_OBJECTS) frameshift_test_SOURCES = frameshift.test.c frameshift.c frameshift_test_LDADD = $(C4_OBJECTS) coding2genome_test_SOURCES = coding2genome.test.c coding2genome.c coding2coding.c affine.c ungapped.c frameshift.c intron.c phase.c coding2genome_test_LDADD = $(C4_OBJECTS) cdna2genome_test_SOURCES = cdna2genome.test.c cdna2genome.c affine.c ungapped.c frameshift.c intron.c phase.c coding2coding.c coding2genome.c est2genome.c cdna2genome_test_LDADD = $(C4_OBJECTS) genome2genome_test_SOURCES = genome2genome.test.c genome2genome.c coding2coding.c affine.c ungapped.c frameshift.c intron.c phase.c coding2genome.c cdna2genome.c est2genome.c genome2genome_test_LDADD = $(C4_OBJECTS) phase_test_SOURCES = phase.test.c phase.c intron.c phase_test_LDADD = $(C4_OBJECTS) modeltype_test_SOURCES = modeltype.test.c modeltype.c ungapped.c affine.c est2genome.c ner.c protein2dna.c protein2genome.c coding2coding.c coding2genome.c genome2genome.c cdna2genome.c frameshift.c phase.c intron.c modeltype_test_LDADD = $(C4_OBJECTS) # Files to clear away MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = PROGRAMS = $(noinst_PROGRAMS) DEFS = @DEFS@ -I. 
-I$(srcdir) CPPFLAGS = @CPPFLAGS@ LDFLAGS = @LDFLAGS@ LIBS = @LIBS@ edit_distance_test_OBJECTS = edit_distance.test.o edit_distance.o edit_distance_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/layout.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/subopt.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o edit_distance_test_LDFLAGS = affine_test_OBJECTS = affine.test.o affine.o ungapped.o affine_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/layout.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/subopt.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ 
$(top_srcdir)/src/general/threadref.o affine_test_LDFLAGS = protein2dna_test_OBJECTS = protein2dna.test.o protein2dna.o affine.o \ ungapped.o frameshift.o protein2dna_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/layout.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/subopt.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o protein2dna_test_LDFLAGS = ner_test_OBJECTS = ner.test.o ner.o affine.o ungapped.o ner_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/layout.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/subopt.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ 
$(top_srcdir)/src/general/threadref.o ner_test_LDFLAGS = est2genome_test_OBJECTS = est2genome.test.o est2genome.o affine.o \ ungapped.o intron.o est2genome_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/layout.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/subopt.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o est2genome_test_LDFLAGS = ungapped_test_OBJECTS = ungapped.test.o ungapped.o ungapped_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/layout.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/subopt.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ 
$(top_srcdir)/src/general/threadref.o ungapped_test_LDFLAGS = intron_test_OBJECTS = intron.test.o intron.o intron_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/layout.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/subopt.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o intron_test_LDFLAGS = protein2genome_test_OBJECTS = protein2genome.test.o protein2genome.o \ protein2dna.o ungapped.o intron.o affine.o frameshift.o phase.o protein2genome_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/layout.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/subopt.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ 
$(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o protein2genome_test_LDFLAGS = coding2coding_test_OBJECTS = coding2coding.test.o coding2coding.o \ affine.o ungapped.o frameshift.o coding2coding_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/layout.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/subopt.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o coding2coding_test_LDFLAGS = frameshift_test_OBJECTS = frameshift.test.o frameshift.o frameshift_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/layout.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/subopt.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ 
$(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o frameshift_test_LDFLAGS = coding2genome_test_OBJECTS = coding2genome.test.o coding2genome.o \ coding2coding.o affine.o ungapped.o frameshift.o intron.o phase.o coding2genome_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/layout.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/subopt.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o coding2genome_test_LDFLAGS = cdna2genome_test_OBJECTS = cdna2genome.test.o cdna2genome.o affine.o \ ungapped.o frameshift.o intron.o phase.o coding2coding.o \ coding2genome.o est2genome.o cdna2genome_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/layout.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/subopt.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o 
$(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o cdna2genome_test_LDFLAGS = genome2genome_test_OBJECTS = genome2genome.test.o genome2genome.o \ coding2coding.o affine.o ungapped.o frameshift.o intron.o phase.o \ coding2genome.o cdna2genome.o est2genome.o genome2genome_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/layout.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/subopt.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o genome2genome_test_LDFLAGS = phase_test_OBJECTS = phase.test.o phase.o intron.o phase_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/layout.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/subopt.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o 
\ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o phase_test_LDFLAGS = modeltype_test_OBJECTS = modeltype.test.o modeltype.o ungapped.o \ affine.o est2genome.o ner.o protein2dna.o protein2genome.o \ coding2coding.o coding2genome.o genome2genome.o cdna2genome.o \ frameshift.o phase.o intron.o modeltype_test_DEPENDENCIES = $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/layout.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/subopt.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o modeltype_test_LDFLAGS = bootstrapper_OBJECTS = bootstrapper.o modeltype.o ungapped.o affine.o \ est2genome.o ner.o protein2dna.o protein2genome.o coding2coding.o \ coding2genome.o genome2genome.o cdna2genome.o frameshift.o intron.o \ phase.o bootstrapper_DEPENDENCIES = $(top_srcdir)/src/struct/pqueue.o \ $(top_srcdir)/src/comparison/comparison.o \ $(top_srcdir)/src/comparison/hspset.o \ $(top_srcdir)/src/comparison/wordhood.o \ $(top_srcdir)/src/bsdp/heuristic.o $(top_srcdir)/src/sdp/sdp.o \ 
$(top_srcdir)/src/sdp/scheduler.o $(top_srcdir)/src/sdp/straceback.o \ $(top_srcdir)/src/sdp/lookahead.o $(top_srcdir)/src/sdp/boundary.o \ $(top_srcdir)/src/struct/slist.o $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/layout.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/subopt.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o bootstrapper_LDFLAGS = CFLAGS = @CFLAGS@ COMPILE = $(CC) $(DEFS) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) CCLD = $(CC) LINK = $(CCLD) $(AM_CFLAGS) $(CFLAGS) $(LDFLAGS) -o $@ HEADERS = $(noinst_HEADERS) DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best SOURCES = $(edit_distance_test_SOURCES) $(affine_test_SOURCES) $(protein2dna_test_SOURCES) $(ner_test_SOURCES) $(est2genome_test_SOURCES) $(ungapped_test_SOURCES) $(intron_test_SOURCES) $(protein2genome_test_SOURCES) $(coding2coding_test_SOURCES) $(frameshift_test_SOURCES) $(coding2genome_test_SOURCES) $(cdna2genome_test_SOURCES) $(genome2genome_test_SOURCES) $(phase_test_SOURCES) $(modeltype_test_SOURCES) $(bootstrapper_SOURCES) OBJECTS = $(edit_distance_test_OBJECTS) $(affine_test_OBJECTS) $(protein2dna_test_OBJECTS) $(ner_test_OBJECTS) $(est2genome_test_OBJECTS) $(ungapped_test_OBJECTS) $(intron_test_OBJECTS) 
$(protein2genome_test_OBJECTS) $(coding2coding_test_OBJECTS) $(frameshift_test_OBJECTS) $(coding2genome_test_OBJECTS) $(cdna2genome_test_OBJECTS) $(genome2genome_test_OBJECTS) $(phase_test_OBJECTS) $(modeltype_test_OBJECTS) $(bootstrapper_OBJECTS) all: all-redirect .SUFFIXES: .SUFFIXES: .S .c .o .s $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps src/model/Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status mostlyclean-noinstPROGRAMS: clean-noinstPROGRAMS: -test -z "$(noinst_PROGRAMS)" || rm -f $(noinst_PROGRAMS) distclean-noinstPROGRAMS: maintainer-clean-noinstPROGRAMS: .c.o: $(COMPILE) -c $< .s.o: $(COMPILE) -c $< .S.o: $(COMPILE) -c $< mostlyclean-compile: -rm -f *.o core *.core clean-compile: distclean-compile: -rm -f *.tab.c maintainer-clean-compile: edit_distance.test: $(edit_distance_test_OBJECTS) $(edit_distance_test_DEPENDENCIES) @rm -f edit_distance.test $(LINK) $(edit_distance_test_LDFLAGS) $(edit_distance_test_OBJECTS) $(edit_distance_test_LDADD) $(LIBS) affine.test: $(affine_test_OBJECTS) $(affine_test_DEPENDENCIES) @rm -f affine.test $(LINK) $(affine_test_LDFLAGS) $(affine_test_OBJECTS) $(affine_test_LDADD) $(LIBS) protein2dna.test: $(protein2dna_test_OBJECTS) $(protein2dna_test_DEPENDENCIES) @rm -f protein2dna.test $(LINK) $(protein2dna_test_LDFLAGS) $(protein2dna_test_OBJECTS) $(protein2dna_test_LDADD) $(LIBS) ner.test: $(ner_test_OBJECTS) $(ner_test_DEPENDENCIES) @rm -f ner.test $(LINK) $(ner_test_LDFLAGS) $(ner_test_OBJECTS) $(ner_test_LDADD) $(LIBS) est2genome.test: $(est2genome_test_OBJECTS) $(est2genome_test_DEPENDENCIES) @rm -f est2genome.test $(LINK) $(est2genome_test_LDFLAGS) $(est2genome_test_OBJECTS) $(est2genome_test_LDADD) $(LIBS) ungapped.test: $(ungapped_test_OBJECTS) $(ungapped_test_DEPENDENCIES) @rm -f ungapped.test $(LINK) 
$(ungapped_test_LDFLAGS) $(ungapped_test_OBJECTS) $(ungapped_test_LDADD) $(LIBS) intron.test: $(intron_test_OBJECTS) $(intron_test_DEPENDENCIES) @rm -f intron.test $(LINK) $(intron_test_LDFLAGS) $(intron_test_OBJECTS) $(intron_test_LDADD) $(LIBS) protein2genome.test: $(protein2genome_test_OBJECTS) $(protein2genome_test_DEPENDENCIES) @rm -f protein2genome.test $(LINK) $(protein2genome_test_LDFLAGS) $(protein2genome_test_OBJECTS) $(protein2genome_test_LDADD) $(LIBS) coding2coding.test: $(coding2coding_test_OBJECTS) $(coding2coding_test_DEPENDENCIES) @rm -f coding2coding.test $(LINK) $(coding2coding_test_LDFLAGS) $(coding2coding_test_OBJECTS) $(coding2coding_test_LDADD) $(LIBS) frameshift.test: $(frameshift_test_OBJECTS) $(frameshift_test_DEPENDENCIES) @rm -f frameshift.test $(LINK) $(frameshift_test_LDFLAGS) $(frameshift_test_OBJECTS) $(frameshift_test_LDADD) $(LIBS) coding2genome.test: $(coding2genome_test_OBJECTS) $(coding2genome_test_DEPENDENCIES) @rm -f coding2genome.test $(LINK) $(coding2genome_test_LDFLAGS) $(coding2genome_test_OBJECTS) $(coding2genome_test_LDADD) $(LIBS) cdna2genome.test: $(cdna2genome_test_OBJECTS) $(cdna2genome_test_DEPENDENCIES) @rm -f cdna2genome.test $(LINK) $(cdna2genome_test_LDFLAGS) $(cdna2genome_test_OBJECTS) $(cdna2genome_test_LDADD) $(LIBS) genome2genome.test: $(genome2genome_test_OBJECTS) $(genome2genome_test_DEPENDENCIES) @rm -f genome2genome.test $(LINK) $(genome2genome_test_LDFLAGS) $(genome2genome_test_OBJECTS) $(genome2genome_test_LDADD) $(LIBS) phase.test: $(phase_test_OBJECTS) $(phase_test_DEPENDENCIES) @rm -f phase.test $(LINK) $(phase_test_LDFLAGS) $(phase_test_OBJECTS) $(phase_test_LDADD) $(LIBS) modeltype.test: $(modeltype_test_OBJECTS) $(modeltype_test_DEPENDENCIES) @rm -f modeltype.test $(LINK) $(modeltype_test_LDFLAGS) $(modeltype_test_OBJECTS) $(modeltype_test_LDADD) $(LIBS) bootstrapper: $(bootstrapper_OBJECTS) $(bootstrapper_DEPENDENCIES) @rm -f bootstrapper $(LINK) $(bootstrapper_LDFLAGS) $(bootstrapper_OBJECTS) 
$(bootstrapper_LDADD) $(LIBS) tags: TAGS ID: $(HEADERS) $(SOURCES) $(LISP) list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ here=`pwd` && cd $(srcdir) \ && mkid -f$$here/ID $$unique $(LISP) TAGS: $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) $(LISP) tags=; \ here=`pwd`; \ list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(ETAGS_ARGS)$$unique$(LISP)$$tags" \ || (cd $(srcdir) && etags -o $$here/TAGS $(ETAGS_ARGS) $$tags $$unique $(LISP)) mostlyclean-tags: clean-tags: distclean-tags: -rm -f TAGS ID maintainer-clean-tags: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = src/model distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done check-TESTS: $(TESTS) @failed=0; all=0; \ srcdir=$(srcdir); export srcdir; \ for tst in $(TESTS); do \ if test -f $$tst; then dir=.; \ else dir="$(srcdir)"; fi; \ if $(TESTS_ENVIRONMENT) $$dir/$$tst; then \ all=`expr $$all + 1`; \ echo "PASS: $$tst"; \ elif test $$? 
-ne 77; then \ all=`expr $$all + 1`; \ failed=`expr $$failed + 1`; \ echo "FAIL: $$tst"; \ fi; \ done; \ if test "$$failed" -eq 0; then \ banner="All $$all tests passed"; \ else \ banner="$$failed of $$all tests failed"; \ fi; \ dashes=`echo "$$banner" | sed s/./=/g`; \ echo "$$dashes"; \ echo "$$banner"; \ echo "$$dashes"; \ test "$$failed" -eq 0 info-am: info: info-am dvi-am: dvi: dvi-am check-am: all-am $(MAKE) $(AM_MAKEFLAGS) check-TESTS check: check-am installcheck-am: installcheck: installcheck-am install-exec-am: install-exec: install-exec-am install-data-am: install-data: install-data-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-am uninstall-am: uninstall: uninstall-am all-am: Makefile $(PROGRAMS) $(HEADERS) all-redirect: all-am install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-noinstPROGRAMS mostlyclean-compile \ mostlyclean-tags mostlyclean-generic mostlyclean: mostlyclean-am clean-am: clean-noinstPROGRAMS clean-compile clean-tags clean-generic \ mostlyclean-am clean: clean-am distclean-am: distclean-noinstPROGRAMS distclean-compile distclean-tags \ distclean-generic clean-am distclean: distclean-am maintainer-clean-am: maintainer-clean-noinstPROGRAMS \ maintainer-clean-compile maintainer-clean-tags \ maintainer-clean-generic distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." 
maintainer-clean: maintainer-clean-am

.PHONY: mostlyclean-noinstPROGRAMS distclean-noinstPROGRAMS \
clean-noinstPROGRAMS maintainer-clean-noinstPROGRAMS \
mostlyclean-compile distclean-compile clean-compile \
maintainer-clean-compile tags mostlyclean-tags distclean-tags \
clean-tags maintainer-clean-tags distdir check-TESTS info-am info \
dvi-am dvi check check-am installcheck-am installcheck install-exec-am \
install-exec install-data-am install-data install-am install \
uninstall-am uninstall all-redirect all-am all installdirs \
mostlyclean-generic distclean-generic clean-generic \
maintainer-clean-generic clean mostlyclean distclean maintainer-clean

# Tell versions [3.59,3.63) of GNU make to not export all variables.
# Otherwise a system limit (for SysV at least) may be exceeded.
.NOEXPORT:

exonerate-2.4.0/src/model/edit_distance.test.c

/****************************************************************\
* *
* edit distance model *
* *
* Guy St.C. Slater.. mailto:guy@ebi.ac.uk *
* Copyright (C) 2000-2009. All Rights Reserved. *
* *
* This source code is distributed under the terms of the *
* GNU General Public License, version 3. See the file COPYING *
* or http://www.gnu.org/licenses/gpl.txt for details *
* *
* If you use this code, please keep this notice intact.
*
* *
\****************************************************************/

#include "edit_distance.h"
#include "alignment.h"
#include "optimal.h"
#include "sequence.h"

int Argument_main(Argument *arg){
    register C4_Model *edit_distance = EditDistance_create();
    register Alphabet *alphabet = Alphabet_create(Alphabet_Type_DNA, FALSE);
    register Sequence *query = Sequence_create("qy", NULL,
                                   "gtgcactacgtacgtnatcgtgcttnaacgcg"
                                   "tacgtgatngtgcttgaacgtacgtacgtgatcg"
                                   "tgcttga", 0,
                                   Sequence_Strand_UNKNOWN, alphabet),
                      *target = Sequence_create("tg", NULL,
                                   "actacgtacgtgatcgtgcaacgcactacg"
                                   "tacgtgancttgaacgcactacgtacgtgatcg"
                                   "tgcntgaacgn", 0,
                                   Sequence_Strand_UNKNOWN, alphabet);
    register gchar *qy = Sequence_get_str(query),
                   *tg = Sequence_get_str(target);
    register EditDistance_Data *edd = EditDistance_Data_create(qy, tg);
    /**/
    register C4_Score score;
    register Alignment *alignment;
    register Optimal *optimal = Optimal_create(edit_distance, NULL,
                                    Optimal_Type_SCORE
                                   |Optimal_Type_PATH, FALSE);
    /**/
    register Region *region = Region_create(0, 0, edd->query_len,
                                            edd->target_len);
    score = Optimal_find_score(optimal, region, edd, NULL);
    g_message("Score is [%d] (expect -23)", score);
    g_assert(score == -23);
    /**/
    alignment = Optimal_find_path(optimal, region, edd,
                                  C4_IMPOSSIBLY_LOW_SCORE, NULL);
    g_message("Alignment score is [%d] (expect -23)", alignment->score);
    Alignment_display(alignment, query, target, NULL, NULL, NULL, stdout);
    g_assert(score == alignment->score);
    /**/
    C4_Model_destroy(edit_distance);
    Alignment_destroy(alignment);
    Optimal_destroy(optimal);
    EditDistance_Data_destroy(edd);
    Sequence_destroy(query);
    Sequence_destroy(target);
    Alphabet_destroy(alphabet);
    Region_destroy(region);
    g_free(qy);
    g_free(tg);
    return 0;
    }

exonerate-2.4.0/src/model/edit_distance.c

/****************************************************************\
* *
* edit distance model *
* *
* Guy St.C. Slater..
mailto:guy@ebi.ac.uk *
* Copyright (C) 2000-2009. All Rights Reserved. *
* *
* This source code is distributed under the terms of the *
* GNU General Public License, version 3. See the file COPYING *
* or http://www.gnu.org/licenses/gpl.txt for details *
* *
* If you use this code, please keep this notice intact. *
* *
\****************************************************************/

#include <string.h> /* For strlen() */
#include "edit_distance.h"

EditDistance_Data *EditDistance_Data_create(gchar *query, gchar *target){
    register EditDistance_Data *edd = g_new(EditDistance_Data, 1);
    edd->query = g_strdup(query);
    edd->target = g_strdup(target);
    edd->query_len = strlen(query);
    edd->target_len = strlen(target);
    return edd;
    }

void EditDistance_Data_destroy(EditDistance_Data *edd){
    g_free(edd->query);
    g_free(edd->target);
    g_free(edd);
    return;
    }

static C4_Score edit_match_calc_func(gint query_pos, gint target_pos,
                                     gpointer user_data){
    register EditDistance_Data *edd = (EditDistance_Data*)user_data;
    g_assert(query_pos >= 0);
    g_assert(target_pos >= 0);
    g_assert(query_pos < edd->query_len);
    g_assert(target_pos < edd->target_len);
    if(edd->query[query_pos] == edd->target[target_pos])
        return 0;
    else
        return -1;
    }

C4_Model *EditDistance_create(void){
    register C4_Model *edit_distance = C4_Model_create("edit distance");
    register C4_State *main_state = C4_Model_add_state(edit_distance, "main");
    register C4_Calc *indel_calc = C4_Model_add_calc(edit_distance, "indel", -1,
                                       NULL, NULL, NULL, NULL, NULL, NULL,
                                       C4_Protect_NONE),
                     *match_calc = C4_Model_add_calc(edit_distance, "match", 0,
                                       edit_match_calc_func, NULL, NULL, NULL,
                                       NULL, NULL, C4_Protect_NONE);
    /**/
    C4_Model_configure_start_state(edit_distance, C4_Scope_CORNER, NULL, NULL);
    C4_Model_configure_end_state(edit_distance, C4_Scope_CORNER, NULL, NULL);
    /**/
    C4_Model_add_transition(edit_distance, "start to main",
                            NULL, main_state, 0, 0, NULL, C4_Label_NONE, NULL);
    C4_Model_add_transition(edit_distance, "main to end",
                            main_state, NULL, 0, 0,
                            NULL, C4_Label_NONE, NULL);
    C4_Model_add_transition(edit_distance, "match",
                            main_state, main_state, 1, 1,
                            match_calc, C4_Label_MATCH, NULL);
    C4_Model_add_transition(edit_distance, "query insert",
                            main_state, main_state, 1, 0,
                            indel_calc, C4_Label_GAP, NULL);
    C4_Model_add_transition(edit_distance, "target insert",
                            main_state, main_state, 0, 1,
                            indel_calc, C4_Label_GAP, NULL);
    /**/
    C4_Model_add_portal(edit_distance, "match portal", match_calc, 1, 1);
    /**/
    C4_Model_close(edit_distance);
    return edit_distance;
    }

exonerate-2.4.0/src/model/affine.test.c

/****************************************************************\
* *
* Module for various affine gapped models *
* *
* Guy St.C. Slater.. mailto:guy@ebi.ac.uk *
* Copyright (C) 2000-2009. All Rights Reserved. *
* *
* This source code is distributed under the terms of the *
* GNU General Public License, version 3. See the file COPYING *
* or http://www.gnu.org/licenses/gpl.txt for details *
* *
* If you use this code, please keep this notice intact.
*
* *
\****************************************************************/

#include "affine.h"
#include "alignment.h"
#include "optimal.h"

static void test_model(Affine_Model_Type type, C4_Score expected_score){
    register C4_Model *affine = Affine_create(type, Alphabet_Type_PROTEIN,
                                              Alphabet_Type_PROTEIN, FALSE);
    /**/
    register C4_Score score;
    register Alignment *alignment;
    register gchar *name = NULL;
    register Optimal *optimal;
    register Alphabet *alphabet = Alphabet_create(Alphabet_Type_PROTEIN, FALSE);
    register Sequence *query = Sequence_create("qy", NULL,
                                   "MEEPQSDPSVEPPLSQETFSDLWKLL", 0,
                                   Sequence_Strand_UNKNOWN, alphabet),
                      *target = Sequence_create("tg", NULL,
                                   "PENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP"
                                   "EHSCETFDIWKWCPIECDFLNVISEPNEPIPSQ", 0,
                                   Sequence_Strand_UNKNOWN, alphabet);
    register Affine_Data *ad = Affine_Data_create(query, target, FALSE);
    /*
    "TVPEPIVTEPTTITEPEVPEKEEPKAEVEKTKKAKGSKPKKASKPRNPA"
    "SHPTYEEMIKDAIVSLKEKNGSSQYAIA",
    "VVEEQAAPETVKDEANPPAKSGKAKKETKAKKPAAPRKRSATPTHPPYF"
    "EMIKDAIVTLKERTGSSQHAIT",
    */
    /* Paracelsus challenge sequences: */
    /*
    "MTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE",
    "MTKKAILALNTAKFLRTQAAVLAAKLEKLGAQEANDNAVDLEDTADDLYKTLLVLA",
    */
    Region region;
    /**/
    switch(type){
        case Affine_Model_Type_GLOBAL:
            name = "global affine";
            break;
        case Affine_Model_Type_BESTFIT:
            name = "bestfit affine";
            break;
        case Affine_Model_Type_LOCAL:
            name = "local affine";
            break;
        case Affine_Model_Type_OVERLAP:
            name = "overlap affine";
            break;
        default:
            g_error("Model Type not supported [%d]", type);
            break;
        }
    optimal = Optimal_create(affine, NULL,
                             Optimal_Type_SCORE|Optimal_Type_PATH, FALSE);
    /**/
    Region_init_static(&region, 0, 0, ad->ud.query->len, ad->ud.target->len);
    score = Optimal_find_score(optimal, &region, ad, NULL);
    g_message("Score for [%s] [%d]", name, score);
    g_assert(score == expected_score);
    /**/
    alignment = Optimal_find_path(optimal, &region, ad,
                                  C4_IMPOSSIBLY_LOW_SCORE, NULL);
    g_message("Alignment score is [%d]", alignment->score);
    Alignment_display(alignment,
                      query, target,
                      Affine_Data_get_dna_submat(ad),
                      Affine_Data_get_protein_submat(ad), NULL, stdout);
    Alignment_display_vulgar(alignment, query, target, stdout);
    g_assert(score == alignment->score);
    /**/
    C4_Model_destroy(affine);
    Alignment_destroy(alignment);
    Optimal_destroy(optimal);
    Affine_Data_destroy(ad);
    Sequence_destroy(query);
    Sequence_destroy(target);
    Alphabet_destroy(alphabet);
    return;
    }

int Argument_main(Argument *arg){
    Match_ArgumentSet_create(arg);
    Affine_ArgumentSet_create(arg);
    Argument_process(arg, "affine.test", NULL, NULL);
    test_model(Affine_Model_Type_GLOBAL, -151);
    test_model(Affine_Model_Type_BESTFIT, 18);
    test_model(Affine_Model_Type_LOCAL, 32);
    test_model(Affine_Model_Type_OVERLAP, 18);
    return 0;
    }

exonerate-2.4.0/src/model/affine.c

/****************************************************************\
* *
* Module for various affine gapped models *
* *
* Guy St.C. Slater.. mailto:guy@ebi.ac.uk *
* Copyright (C) 2000-2009. All Rights Reserved. *
* *
* This source code is distributed under the terms of the *
* GNU General Public License, version 3. See the file COPYING *
* or http://www.gnu.org/licenses/gpl.txt for details *
* *
* If you use this code, please keep this notice intact.
*
* *
\****************************************************************/

#include <string.h> /* For strlen() */
#include "affine.h"

Affine_ArgumentSet *Affine_ArgumentSet_create(Argument *arg){
    register ArgumentSet *as;
    static Affine_ArgumentSet aas;
    if(arg){
        as = ArgumentSet_create("Affine Model Options");
        ArgumentSet_add_option(as, 'o', "gapopen", "penalty",
            "Affine gap open penalty", "-12",
            Argument_parse_int, &aas.gap_open);
        ArgumentSet_add_option(as, 'e', "gapextend", "penalty",
            "Affine gap extend penalty", "-4",
            Argument_parse_int, &aas.gap_extend);
        ArgumentSet_add_option(as, '\0', "codongapopen", "penalty",
            "Codon affine gap open penalty", "-18",
            Argument_parse_int, &aas.codon_gap_open);
        ArgumentSet_add_option(as, '\0', "codongapextend", "penalty",
            "Codon affine gap extend penalty", "-8",
            Argument_parse_int, &aas.codon_gap_extend);
        Argument_absorb_ArgumentSet(arg, as);
        if(aas.gap_open > 0)
            g_warning("Gap open penalty [%d] should be negative",
                      aas.gap_open);
        if(aas.gap_extend > 0)
            g_warning("Gap extend penalty [%d] should be negative",
                      aas.gap_extend);
        if(aas.codon_gap_open > 0)
            g_warning("Codon gap open penalty [%d] should be negative",
                      aas.codon_gap_open);
        if(aas.codon_gap_extend > 0)
            g_warning("Codon gap extend penalty [%d] should be negative",
                      aas.codon_gap_extend);
        }
    return &aas;
    }

/**/

void Affine_Data_init(Affine_Data *ad, Sequence *query, Sequence *target,
                      gboolean translate_both){
    register Match_Type match_type = Match_Type_find(query->alphabet->type,
                                                     target->alphabet->type,
                                                     translate_both);
    Ungapped_Data_init(&ad->ud, query, target, match_type);
    if(!ad->aas)
        ad->aas = Affine_ArgumentSet_create(NULL);
    return;
    }

Affine_Data *Affine_Data_create(Sequence *query, Sequence *target,
                                gboolean translate_both){
    register Affine_Data *ad = g_new0(Affine_Data, 1);
    Affine_Data_init(ad, query, target, translate_both);
    return ad;
    }

void Affine_Data_clear(Affine_Data *ad){
    Ungapped_Data_clear(&ad->ud);
    return;
    }

void Affine_Data_destroy(Affine_Data *ad){
    Affine_Data_clear(ad);
g_free(ad); return; } /**/ static C4_Score affine_gap_open_calc_func(gint query_pos, gint target_pos, gpointer user_data){ register Affine_Data *ad = (Affine_Data*)user_data; return ad->aas->gap_open; } static gchar *affine_gap_open_calc_macro = "(ad->aas->gap_open)"; static C4_Score affine_gap_extend_calc_func(gint query_pos, gint target_pos, gpointer user_data){ register Affine_Data *ad = (Affine_Data*)user_data; return ad->aas->gap_extend; } static gchar *affine_gap_extend_calc_macro = "(ad->aas->gap_extend)"; /**/ static C4_Score affine_codon_gap_open_calc_func(gint query_pos, gint target_pos, gpointer user_data){ register Affine_Data *ad = (Affine_Data*)user_data; return ad->aas->codon_gap_open; } static gchar *affine_codon_gap_open_calc_macro = "(ad->aas->codon_gap_open)"; static C4_Score affine_codon_gap_extend_calc_func(gint query_pos, gint target_pos, gpointer user_data){ register Affine_Data *ad = (Affine_Data*)user_data; return ad->aas->codon_gap_extend; } static gchar *affine_codon_gap_extend_calc_macro = "(ad->aas->codon_gap_extend)"; /**/ gchar *Affine_Model_Type_get_name(Affine_Model_Type type){ register gchar *model_name = NULL; switch(type){ case Affine_Model_Type_GLOBAL: model_name = "global"; break; case Affine_Model_Type_BESTFIT: model_name = "bestfit"; break; case Affine_Model_Type_LOCAL: model_name = "local"; break; case Affine_Model_Type_OVERLAP: model_name = "overlap"; break; default: g_error("Unknown affine model type [%d]", type); break; } return model_name; } C4_Model *Affine_create(Affine_Model_Type type, Alphabet_Type query_type, Alphabet_Type target_type, gboolean translate_both){ register C4_Scope scope = C4_Scope_ANYWHERE; register C4_Model *affine; register C4_State *insert_state, *delete_state; register C4_Transition *match_transition; register C4_Calc *gap_open_calc, *gap_extend_calc; register Affine_ArgumentSet *aas = Affine_ArgumentSet_create(NULL); register Match_Type match_type = Match_Type_find(query_type, target_type, 
translate_both); register C4_CalcFunc gap_open_calc_func = NULL, gap_extend_calc_func = NULL; register gchar *gap_open_calc_macro = NULL, *gap_extend_calc_macro = NULL; register gchar *model_name = Affine_Model_Type_get_name(type); affine = Ungapped_create(match_type); switch(type){ case Affine_Model_Type_GLOBAL: scope = C4_Scope_CORNER; break; case Affine_Model_Type_BESTFIT: scope = C4_Scope_QUERY; break; case Affine_Model_Type_LOCAL: scope = C4_Scope_ANYWHERE; break; case Affine_Model_Type_OVERLAP: scope = C4_Scope_EDGE; break; default: g_error("Unknown affine model type [%d]", type); break; } g_assert(model_name); model_name = g_strdup_printf("affine:%s:%s", model_name, Match_Type_get_name(match_type)); C4_Model_rename(affine, model_name); g_free(model_name); C4_Model_configure_start_state(affine, scope, NULL, NULL); C4_Model_configure_end_state(affine, scope, NULL, NULL); C4_Model_open(affine); insert_state = C4_Model_add_state(affine, "insert"); delete_state = C4_Model_add_state(affine, "delete"); match_transition = C4_Model_select_single_transition(affine, C4_Label_MATCH); g_assert(match_transition); if(MAX(match_transition->advance_query, match_transition->advance_target) == 3){ gap_open_calc_func = affine_codon_gap_open_calc_func; gap_extend_calc_func = affine_codon_gap_extend_calc_func; gap_open_calc_macro = affine_codon_gap_open_calc_macro; gap_extend_calc_macro = affine_codon_gap_extend_calc_macro; } else { gap_open_calc_func = affine_gap_open_calc_func; gap_extend_calc_func = affine_gap_extend_calc_func; gap_open_calc_macro = affine_gap_open_calc_macro; gap_extend_calc_macro = affine_gap_extend_calc_macro; } gap_open_calc = C4_Model_add_calc(affine, "gap open", aas->gap_open, gap_open_calc_func, gap_open_calc_macro, NULL, NULL, NULL, NULL, C4_Protect_NONE); gap_extend_calc = C4_Model_add_calc(affine, "gap extend", aas->gap_extend, gap_extend_calc_func, gap_extend_calc_macro, NULL, NULL, NULL, NULL, C4_Protect_NONE); /**/ /* Transitions from match */ 
    C4_Model_add_transition(affine, "match to insert",
        match_transition->input, insert_state,
        match_transition->advance_query, 0,
        gap_open_calc, C4_Label_GAP, NULL);
    C4_Model_add_transition(affine, "match to delete",
        match_transition->input, delete_state,
        0, match_transition->advance_target,
        gap_open_calc, C4_Label_GAP, NULL);
    /* Transitions from insert */
    C4_Model_add_transition(affine, "insert",
        insert_state, insert_state,
        match_transition->advance_query, 0,
        gap_extend_calc, C4_Label_GAP, NULL);
    C4_Model_add_transition(affine, "insert to match",
        insert_state, match_transition->output,
        0, 0, NULL, C4_Label_NONE, NULL);
    /* Transitions from delete */
    C4_Model_add_transition(affine, "delete",
        delete_state, delete_state,
        0, match_transition->advance_target,
        gap_extend_calc, C4_Label_GAP, NULL);
    C4_Model_add_transition(affine, "delete to match",
        delete_state, match_transition->output,
        0, 0, NULL, C4_Label_NONE, NULL);
    /**/
    C4_Model_add_portal(affine, "match portal", match_transition->calc,
        match_transition->advance_query,
        match_transition->advance_target);
    /**/
    C4_Model_append_codegen(affine,
        "#include \"affine.h\"\n",
        "register Affine_Data *ad = (Affine_Data*)user_data;\n",
        NULL);
    C4_Model_close(affine);
    return affine;
    }

exonerate-2.4.0/src/model/ungapped.c

/****************************************************************\
* *
* Module for various ungapped alignment models *
* *
* Guy St.C. Slater.. mailto:guy@ebi.ac.uk *
* Copyright (C) 2000-2009. All Rights Reserved. *
* *
* This source code is distributed under the terms of the *
* GNU General Public License, version 3. See the file COPYING *
* or http://www.gnu.org/licenses/gpl.txt for details *
* *
* If you use this code, please keep this notice intact.
* * * \****************************************************************/ #include "ungapped.h" void Ungapped_Data_init(Ungapped_Data *ud, Sequence *query, Sequence *target, Match_Type match_type){ g_assert(query); g_assert(target); if(!ud->query) ud->query = Sequence_share(query); if(!ud->target) ud->target = Sequence_share(target); if(!ud->mas) ud->mas = Match_ArgumentSet_create(NULL); if(!ud->match_list[match_type]) ud->match_list[match_type] = Match_find(match_type); return; } Ungapped_Data *Ungapped_Data_create(Sequence *query, Sequence *target, Match_Type match_type){ register Ungapped_Data *ud = g_new0(Ungapped_Data, 1); Ungapped_Data_init(ud, query, target, match_type); return ud; } void Ungapped_Data_clear(Ungapped_Data *ud){ register gint i; if(ud->query){ Sequence_destroy(ud->query); ud->query = NULL; } if(ud->target){ Sequence_destroy(ud->target); ud->target = NULL; } for(i = 0; i < Match_Type_TOTAL; i++) if(ud->match_list[i]) ud->match_list[i] = NULL; return; } void Ungapped_Data_destroy(Ungapped_Data *ud){ Ungapped_Data_clear(ud); g_free(ud); return; } /**/ #define Ungapped_CalcFunc(type) \ static C4_Score Ungapped_calc_func_##type(gint query_pos, \ gint target_pos, \ gpointer user_data){ \ register Ungapped_Data *ud = (Ungapped_Data*)user_data; \ register Match *match = ud->match_list[Match_Type_##type]; \ g_assert(ud); \ g_assert(query_pos >= 0); \ g_assert(target_pos >= 0); \ g_assert(query_pos < ud->query->len); \ g_assert(target_pos < ud->target->len); \ g_assert(match); \ return match->score_func(match, ud->query, ud->target, \ query_pos, target_pos); \ } #if 0 #define Ungapped_CalcMacro(type) \ static gchar *Ungapped_calc_macro_##type = \ "ud->match_list[Match_Type_"#type"]->score_func(\n" \ " ud->match_list[Match_Type_"#type"],\n" \ " ud->query, ud->target, %QP, %TP)"; /* FIXME: implement Ungapped_get_calc_macro() */ /* FIXME: add proper macro support for Match * ie. 
without calling match->score_func() * by calling Match_macro_create() etc */ /* Ungapped_CalcMacro(type) */ #endif /* 0 */ #define Ungapped_CalcElements(type) \ Ungapped_CalcFunc(type) Ungapped_CalcElements(DNA2DNA) Ungapped_CalcElements(PROTEIN2PROTEIN) Ungapped_CalcElements(DNA2PROTEIN) Ungapped_CalcElements(PROTEIN2DNA) Ungapped_CalcElements(CODON2CODON) /**/ C4_Model *Ungapped_create(Match_Type match_type){ register gchar *model_name; register C4_Model *ungapped; register C4_State *match_state; register C4_Calc *match_calc; register C4_CalcFunc calc_func = NULL; register gchar *calc_macro = Match_Type_get_score_macro(match_type); register Match *match = Match_find(match_type); model_name = g_strdup_printf("ungapped:%s", Match_Type_get_name(match->type)); ungapped = C4_Model_create(model_name); g_free(model_name); match_state = C4_Model_add_state(ungapped, "match"); switch(match_type){ case Match_Type_DNA2DNA: calc_func = Ungapped_calc_func_DNA2DNA; break; case Match_Type_PROTEIN2PROTEIN: calc_func = Ungapped_calc_func_PROTEIN2PROTEIN; break; case Match_Type_DNA2PROTEIN: calc_func = Ungapped_calc_func_DNA2PROTEIN; break; case Match_Type_PROTEIN2DNA: calc_func = Ungapped_calc_func_PROTEIN2DNA; break; case Match_Type_CODON2CODON: calc_func = Ungapped_calc_func_CODON2CODON; break; default: g_error("Unknown Match_Type [%d]", match_type); break; } match_calc = C4_Model_add_calc(ungapped, "match", Match_max_score(match), calc_func, calc_macro, NULL, NULL, NULL, NULL, C4_Protect_NONE); /**/ C4_Model_add_transition(ungapped, "start to match", NULL, match_state, 0, 0, NULL, C4_Label_NONE, NULL); C4_Model_add_transition(ungapped, "match to end", match_state, NULL, 0, 0, NULL, C4_Label_NONE, NULL); /* Transitions from match */ C4_Model_add_transition(ungapped, "match", match_state, match_state, match->query->advance, match->target->advance, match_calc, C4_Label_MATCH, match); /* FIXME: need to keep match allocated */ /**/ C4_Model_append_codegen(ungapped, "#include 
\"ungapped.h\"\n" "#include \"submat.h\"\n", "register Ungapped_Data *ud = (Ungapped_Data*)user_data;\n", " -I" SOURCE_ROOT_DIR "/src/model" " -I" SOURCE_ROOT_DIR "/src/comparison" " -I" SOURCE_ROOT_DIR "/src/c4" " -I" SOURCE_ROOT_DIR "/src/bsdp" " -I" SOURCE_ROOT_DIR "/src/sdp"); C4_Model_close(ungapped); return ungapped; } Alignment *Ungapped_Alignment_create(C4_Model *model, Ungapped_Data *ud, HSP *hsp){ register gint i; register C4_Transition *transition, *start2match = NULL, *match2match = NULL, *match2end = NULL; register Region *region = Region_create(hsp->query_start, hsp->target_start, HSP_query_end(hsp)-hsp->query_start, HSP_target_end(hsp)-hsp->target_start); register Alignment *alignment = Alignment_create(model, region, hsp->score); g_assert(model->transition_list->len == 3); for(i = 0; i < model->transition_list->len; i++){ transition = model->transition_list->pdata[i]; if(transition->input == model->start_state->state) start2match = transition; else if(transition->output == model->end_state->state) match2end = transition; else match2match = transition; } g_assert(start2match); g_assert(match2match); g_assert(match2end); Alignment_add(alignment, start2match, 1); Alignment_add(alignment, match2match, hsp->length); Alignment_add(alignment, match2end, 1); g_assert(Alignment_is_valid(alignment, region, ud)); Region_destroy(region); return alignment; } exonerate-2.4.0/src/model/protein2dna.test.c0000644000175000017500000000733611162714317015702 00000000000000/****************************************************************\ * * * Protein <-> DNA comparison model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/

#include "protein2dna.h"
#include "alignment.h"
#include "optimal.h"
#include "frameshift.h"

static void test_alignment(C4_Model *model,
                           Sequence *query, Sequence *target,
                           Protein2DNA_Data *p2dd){
    register C4_Score score;
    register Alignment *alignment;
    register Optimal *optimal = Optimal_create(
            model, NULL, Optimal_Type_SCORE|Optimal_Type_PATH, FALSE);
    Region region;
    Region_init_static(&region, 0, 0, query->len, target->len);
    score = Optimal_find_score(optimal, &region, p2dd, NULL);
    g_message("Score is [%d]", score);
    g_assert(score == 134);
    alignment = Optimal_find_path(optimal, &region, p2dd,
                                  C4_IMPOSSIBLY_LOW_SCORE, NULL);
    g_message("Alignment score is [%d]", alignment->score);
    Alignment_display(alignment, query, target, NULL,
                      Protein2DNA_Data_get_submat(p2dd),
                      Protein2DNA_Data_get_translate(p2dd), stdout);
    g_assert(score == alignment->score);
    Alignment_destroy(alignment);
    Optimal_destroy(optimal);
    return;
    }

static void test_protein2dna(Sequence *query, Sequence *target){
    register C4_Model *protein2dna =
        Protein2DNA_create(Affine_Model_Type_LOCAL);
    register Protein2DNA_Data *p2dd
        = Protein2DNA_Data_create(query, target);
    test_alignment(protein2dna, query, target, p2dd);
    Protein2DNA_Data_destroy(p2dd);
    C4_Model_destroy(protein2dna);
    return;
    }

int Argument_main(Argument *arg){
    register Alphabet
        *dna_alphabet = Alphabet_create(Alphabet_Type_DNA, FALSE),
        *protein_alphabet = Alphabet_create(Alphabet_Type_PROTEIN, FALSE);
    register Sequence
        *dna = Sequence_create("dna", NULL,
                   "ATGGCTGACCAGCTGACTGAGGAGCAGATT"
                   "GCAGAGTTCNAAGGAGGCCTTCTCCCTCTTT"
                   "GACAAGGATGGA"
                   "NNACTGTCCATAATTGC"
                   "TGGTACTTCAGCGGTCGATGG"
                   "GATGGCACTCTGACCACC",
                   0, Sequence_Strand_UNKNOWN, dna_alphabet),
        *protein = Sequence_create("protein", NULL,
                   "NNNNNNMADQLTEQIAEFKEAFSLFDKDG"
                   "TVHNC"
                   "X"
                   "WYFSGRW"
                   "DGTITT",
                   0, Sequence_Strand_UNKNOWN, protein_alphabet);
    Match_ArgumentSet_create(arg);
    Affine_ArgumentSet_create(arg);
    Frameshift_ArgumentSet_create(arg);
    Argument_process(arg, "dna2protein.test", NULL, NULL);
    g_message("Testing protein2dna");
    test_protein2dna(protein, dna);
    Sequence_destroy(dna);
    Sequence_destroy(protein);
    Alphabet_destroy(dna_alphabet);
    Alphabet_destroy(protein_alphabet);
    return 0;
    }
exonerate-2.4.0/src/model/protein2dna.c0000644000175000017500000000561311162714317014720 00000000000000
/****************************************************************\
*                                                                *
*  Protein <-> DNA comparison model                              *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#include <string.h> /* For strlen() */

#include "protein2dna.h"
#include "frameshift.h"

void Protein2DNA_Data_init(Protein2DNA_Data *p2dd,
                           Sequence *query, Sequence *target){
    g_assert(query->alphabet->type == Alphabet_Type_PROTEIN);
    g_assert(target->alphabet->type == Alphabet_Type_DNA);
    Affine_Data_init(&p2dd->ad, query, target, FALSE);
    if(!Protein2DNA_Data_get_Frameshift_Data(p2dd))
        Protein2DNA_Data_get_Frameshift_Data(p2dd)
            = Frameshift_Data_create();
    return;
    }

Protein2DNA_Data *Protein2DNA_Data_create(
                  Sequence *query, Sequence *target){
    register Protein2DNA_Data *p2dd = g_new0(Protein2DNA_Data, 1);
    Protein2DNA_Data_init(p2dd, query, target);
    return p2dd;
    }

void Protein2DNA_Data_clear(Protein2DNA_Data *p2dd){
    Affine_Data_clear(&p2dd->ad);
    if(Protein2DNA_Data_get_Frameshift_Data(p2dd)){
        Frameshift_Data_destroy(
            Protein2DNA_Data_get_Frameshift_Data(p2dd));
        Protein2DNA_Data_get_Frameshift_Data(p2dd) = NULL;
        }
    return;
    }

void Protein2DNA_Data_destroy(Protein2DNA_Data *p2dd){
    Protein2DNA_Data_clear(p2dd);
    g_free(p2dd);
    return;
    }

C4_Model *Protein2DNA_create(Affine_Model_Type type){
register gchar *name = g_strdup_printf("protein2dna:%s", Affine_Model_Type_get_name(type)); register C4_Model *model = NULL; register C4_Transition *match_transition; model = Affine_create(type, Alphabet_Type_PROTEIN, Alphabet_Type_DNA, FALSE); g_assert(model); C4_Model_rename(model, name); g_free(name); C4_Model_open(model); match_transition = C4_Model_select_single_transition(model, C4_Label_MATCH); g_assert(match_transition); Frameshift_add(model, match_transition->input, "p2d", FALSE); C4_Model_close(model); return model; } exonerate-2.4.0/src/model/frameshift.c0000644000175000017500000001163411162714316014622 00000000000000/****************************************************************\ * * * Module for frameshift modelling * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "frameshift.h" #include "ungapped.h" Frameshift_ArgumentSet *Frameshift_ArgumentSet_create(Argument *arg){ register ArgumentSet *as; static Frameshift_ArgumentSet fas; if(arg){ as = ArgumentSet_create("Frameshift Options"); ArgumentSet_add_option(as, 'f', "frameshift", "penalty", "Frameshift creation penalty", "-28", Argument_parse_int, &fas.frameshift_penalty); Argument_absorb_ArgumentSet(arg, as); if(fas.frameshift_penalty > 0) g_warning("Frameshift penalty should be negative [%d]", fas.frameshift_penalty); } return &fas; } /**/ Frameshift_Data *Frameshift_Data_create(void){ register Frameshift_Data *fd = g_new0(Frameshift_Data, 1); fd->fas = Frameshift_ArgumentSet_create(NULL); return fd; } void Frameshift_Data_destroy(Frameshift_Data *fd){ g_free(fd); return; } /**/ static C4_Score frameshift_penalty_calc_func(gint query_pos, gint target_pos, gpointer user_data){ register Ungapped_Data *ud = user_data; register Frameshift_Data *fd = Ungapped_Data_get_Frameshift_Data(ud); return fd->fas->frameshift_penalty; } static gchar *frameshift_penalty_calc_macro = "(fd->fas->frameshift_penalty)"; /**/ static C4_Calc *Frameshift_find_existing_calc(C4_Model *model){ register gint i; register C4_Calc *calc; for(i = 0; i < model->calc_list->len; i++){ calc = model->calc_list->pdata[i]; if(calc->calc_func == frameshift_penalty_calc_func) return calc; } return NULL; } void Frameshift_add(C4_Model *model, C4_State *match_state, gchar *suffix, gboolean apply_to_query){ register gchar *frameshift_name = g_strconcat("frameshift ", suffix, NULL); register C4_State *frameshift_state = C4_Model_add_state(model, frameshift_name); register C4_Calc *frameshift_calc = Frameshift_find_existing_calc( model); register Frameshift_ArgumentSet *fas = Frameshift_ArgumentSet_create(NULL); register gchar *name; if(!frameshift_calc) frameshift_calc = C4_Model_add_calc(model, "frameshift", fas->frameshift_penalty, 
                              frameshift_penalty_calc_func,
                              frameshift_penalty_calc_macro,
                              NULL, NULL, NULL, NULL, C4_Protect_NONE);
    /* Transitions from match */
    name = g_strconcat("frameshift open 1 ", suffix, NULL);
    C4_Model_add_transition(model, name,
                            match_state, frameshift_state,
                            apply_to_query?1:0, apply_to_query?0:1,
                            frameshift_calc, C4_Label_FRAMESHIFT, NULL);
    g_free(name);
    name = g_strconcat("frameshift open 2 ", suffix, NULL);
    C4_Model_add_transition(model, name,
                            match_state, frameshift_state,
                            apply_to_query?2:0, apply_to_query?0:2,
                            frameshift_calc, C4_Label_FRAMESHIFT, NULL);
    g_free(name);
    /* Transition from frameshift */
    name = g_strconcat("frameshift close 0 ", suffix, NULL);
    C4_Model_add_transition(model, name,
                            frameshift_state, match_state,
                            0, 0, NULL, C4_Label_NONE, NULL);
    g_free(name);
    name = g_strconcat("frameshift close 3 ", suffix, NULL);
    C4_Model_add_transition(model, name,
                            frameshift_state, match_state,
                            apply_to_query?3:0, apply_to_query?0:3,
                            NULL, C4_Label_FRAMESHIFT, NULL);
    g_free(name);
    C4_Model_append_codegen(model, NULL,
        "register Frameshift_Data *fd = ud->frameshift_data;\n", NULL);
    g_free(frameshift_name);
    return;
    }
/* A frameshift state is used, rather than 0/4 and 0/5
 * transitions, to keep max_advance for most models down to 3,
 * as this is more efficient with the current viterbi implementation.
 */
exonerate-2.4.0/src/model/ner.test.c0000644000175000017500000000612511162714317014234 00000000000000
/****************************************************************\
*                                                                *
*  Model for alignments with non-equivalenced regions            *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.
* * * \****************************************************************/

#include "ner.h"
#include "alignment.h"
#include "optimal.h"

int Argument_main(Argument *arg){
    register C4_Score score;
    register C4_Model *ner;
    register Alignment *alignment;
    register Optimal *optimal;
    register Alphabet *dna_alphabet
        = Alphabet_create(Alphabet_Type_DNA, FALSE);
    register Sequence
        *query = Sequence_create("qy", NULL,
                 "TTTTATCTTCCCAAGAGNCCCCATNNNGCGA"
                 "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
                 "AAAAAAAAAAAAAA"
                 "GTGATTGAAATGTGGATGAAACATTTC",
                 0, Sequence_Strand_UNKNOWN, dna_alphabet),
        *target = Sequence_create("tg", NULL,
                 "TTTTATCTTCCCAAGAGCCCCATGAGGCGA"
                 "TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT"
                 "TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT"
                 "TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT"
                 "GTGANTGAAATGTGGATGAACATTTC",
                 0, Sequence_Strand_UNKNOWN, dna_alphabet);
    register NER_Data *nd;
    /**/
    Region region;
    Match_ArgumentSet_create(arg);
    Affine_ArgumentSet_create(arg);
    NER_ArgumentSet_create(arg);
    Argument_process(arg, "ner.test", NULL, NULL);
    ner = NER_create(Alphabet_Type_DNA, Alphabet_Type_DNA);
    nd = NER_Data_create(query, target);
    optimal = Optimal_create(ner, NULL,
                  Optimal_Type_SCORE|Optimal_Type_PATH, FALSE);
    /**/
    Region_init_static(&region, 0, 0, query->len, target->len);
    score = Optimal_find_score(optimal, &region, nd, NULL);
    g_message("Score is [%d]", score);
    g_assert(score == 208);
    /**/
    alignment = Optimal_find_path(optimal, &region, nd,
                                  C4_IMPOSSIBLY_LOW_SCORE, NULL);
    g_message("Alignment score is [%d]", alignment->score);
    Alignment_display(alignment, query, target,
                      NER_Data_get_dna_submat(nd),
                      NER_Data_get_protein_submat(nd), NULL, stdout);
    g_assert(score == alignment->score);
    /**/
    Alignment_destroy(alignment);
    Optimal_destroy(optimal);
    NER_Data_destroy(nd);
    /**/
    C4_Model_destroy(ner);
    Alphabet_destroy(dna_alphabet);
    Sequence_destroy(query);
    Sequence_destroy(target);
    return 0;
    }
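/* Illustrative sketch only, not part of the exonerate sources.
 * The test above asserts that the score from the score-only pass
 * (Optimal_find_score) equals the score of the path returned by the
 * traceback pass (Optimal_find_path). The fragment below shows the
 * same invariant on a toy local alignment with a linear gap penalty.
 * find_score(), find_path_score() and the scoring constants are
 * hypothetical names and values chosen for this example; nothing
 * here uses the C4 API.
 */

```c
#include <assert.h>
#include <string.h>

/* Hypothetical scoring scheme, chosen only for this sketch */
enum { MATCH = 5, MISMATCH = -4, GAP = -6, MAXLEN = 64 };

static int max2(int a, int b){ return (a > b) ? a : b; }
static int subst(char a, char b){ return (a == b) ? MATCH : MISMATCH; }

/* Score-only pass: two rolling rows, no traceback information kept */
static int find_score(const char *q, const char *t){
    int prev[MAXLEN+1] = {0}, curr[MAXLEN+1] = {0};
    size_t qlen = strlen(q), tlen = strlen(t), i, j;
    int best = 0, s;
    assert(qlen <= MAXLEN && tlen <= MAXLEN);
    for(i = 1; i <= qlen; i++){
        for(j = 1; j <= tlen; j++){
            s = prev[j-1] + subst(q[i-1], t[j-1]);
            s = max2(s, prev[j] + GAP);
            s = max2(s, curr[j-1] + GAP);
            curr[j] = max2(s, 0);       /* local alignment floor */
            best = max2(best, curr[j]);
        }
        memcpy(prev, curr, sizeof(prev));
    }
    return best;
}

/* Path pass: fill the full matrix, then re-sum the transition
 * scores along the traceback from the best cell */
static int find_path_score(const char *q, const char *t){
    static int m[MAXLEN+1][MAXLEN+1];
    size_t qlen = strlen(q), tlen = strlen(t), i, j, bi = 0, bj = 0;
    int total = 0, s;
    assert(qlen <= MAXLEN && tlen <= MAXLEN);
    memset(m, 0, sizeof(m));
    for(i = 1; i <= qlen; i++){
        for(j = 1; j <= tlen; j++){
            s = m[i-1][j-1] + subst(q[i-1], t[j-1]);
            s = max2(s, m[i-1][j] + GAP);
            s = max2(s, m[i][j-1] + GAP);
            m[i][j] = max2(s, 0);
            if(m[i][j] > m[bi][bj]){ bi = i; bj = j; }
        }
    }
    i = bi; j = bj;
    while((i > 0) && (j > 0) && (m[i][j] > 0)){
        if(m[i][j] == m[i-1][j-1] + subst(q[i-1], t[j-1])){
            total += subst(q[i-1], t[j-1]); i--; j--;  /* match */
        } else if(m[i][j] == m[i-1][j] + GAP){
            total += GAP; i--;                         /* query gap */
        } else {
            total += GAP; j--;                         /* target gap */
        }
    }
    return total;
}
```

/* Calling both functions on the same pair of strings and asserting
 * that the results agree mirrors the g_assert(score == alignment->score)
 * check used by the model tests in this directory.
 */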
exonerate-2.4.0/src/model/ner.c0000644000175000017500000001064611162714317013261 00000000000000
/****************************************************************\
*                                                                *
*  Model for alignments with non-equivalenced regions            *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#include <string.h> /* For strlen() */

#include "ner.h"

NER_ArgumentSet *NER_ArgumentSet_create(Argument *arg){
    register ArgumentSet *as;
    static NER_ArgumentSet nas;
    if(arg){
        as = ArgumentSet_create("NER Model Options");
        ArgumentSet_add_option(as, 0, "minner", "length",
            "Minimum NER length", "10",
            Argument_parse_int, &nas.min_ner);
        ArgumentSet_add_option(as, 0, "maxner", "length",
            "Maximum NER length", "50000",
            Argument_parse_int, &nas.max_ner);
        ArgumentSet_add_option(as, 0, "neropen", "score",
            "NER open penalty", "-20",
            Argument_parse_int, &nas.ner_open_penalty);
        Argument_absorb_ArgumentSet(arg, as);
        }
    return &nas;
    }

NER_Data *NER_Data_create(Sequence *query, Sequence *target){
    register NER_Data *nd = g_new0(NER_Data, 1);
    Affine_Data_init(&nd->ad, query, target, FALSE);
    nd->nas = NER_ArgumentSet_create(NULL);
    return nd;
    }

void NER_Data_destroy(NER_Data *nd){
    Affine_Data_clear(&nd->ad);
    g_free(nd);
    return;
    }

/* Calc functions */

static C4_Score ner_ner_open_calc_func(gint query_pos, gint target_pos,
                                       gpointer user_data){
    register NER_Data *nd = (NER_Data*)user_data;
    return nd->nas->ner_open_penalty;
    }

static gchar *ner_ner_open_calc_macro = "(nd->nas->ner_open_penalty)\n";

/**/

C4_Model *NER_create(Alphabet_Type query_type, Alphabet_Type target_type){
    register C4_Model *ner = Affine_create(Affine_Model_Type_LOCAL,
                                           query_type, target_type, FALSE);
    register
C4_State *ner_state; register C4_Calc *ner_open_calc; register NER_ArgumentSet *nas = NER_ArgumentSet_create(NULL); register C4_Transition *match_transition; register gchar *model_name = g_strdup_printf("NER:%s", ner->name); C4_Model_rename(ner, model_name); g_free(model_name); C4_Model_open(ner); match_transition = C4_Model_select_single_transition(ner, C4_Label_MATCH); g_assert(match_transition); ner_state = C4_Model_add_state(ner, "ner"); ner_open_calc = C4_Model_add_calc(ner, "ner open", nas->ner_open_penalty, ner_ner_open_calc_func, ner_ner_open_calc_macro, NULL, NULL, NULL, NULL, C4_Protect_NONE); /**/ /* Transitions from match */ C4_Model_add_transition(ner, "match to ner", match_transition->input, ner_state, 1, 1, ner_open_calc, C4_Label_NER, NULL); /* Transitions from ner */ C4_Model_add_transition(ner, "ner to match", ner_state, match_transition->input, 0, 0, NULL, C4_Label_NONE, NULL); C4_Model_add_transition(ner, "ner loop insert", ner_state, ner_state, 1, 0, NULL, C4_Label_NER, NULL); C4_Model_add_transition(ner, "ner loop delete", ner_state, ner_state, 0, 1, NULL, C4_Label_NER, NULL); /**/ C4_Model_add_span(ner, "ner span", ner_state, nas->min_ner, nas->max_ner, nas->min_ner, nas->max_ner); /**/ C4_Model_append_codegen(ner, "#include \"ner.h\"\n", "register NER_Data *nd = (NER_Data*)user_data;\n", NULL); C4_Model_close(ner); return ner; } exonerate-2.4.0/src/model/est2genome.test.c0000644000175000017500000000721711162714316015522 00000000000000/****************************************************************\ * * * Module for EST <-> genome alignments * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "est2genome.h" #include "alignment.h" #include "optimal.h" int Argument_main(Argument *arg){ register C4_Model *est2genome; register C4_Score score; register Alignment *alignment; register Optimal *optimal; register gchar *query_seq = "CGATCGATCGNATCGATCGATC" "CATCTATCTAGCGAGCGATCTA", *target_seq = "CGATCGATCGATCGATCGATC" "GTNNNNNNNNNNNNNNNNNNNN" "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN" "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN" "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN" "NNNNNNNNNNNNNNNNNNNNNNNNNNNAG" /* "CTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAC" */ "CATCTATCTANNNGCGAGCGATCTA"; register Alphabet *dna_alphabet = Alphabet_create( Alphabet_Type_DNA, FALSE); register Sequence *query = Sequence_create("query", NULL, query_seq, 0, Sequence_Strand_UNKNOWN, dna_alphabet), *target = Sequence_create("target", NULL, target_seq, 0, Sequence_Strand_UNKNOWN, dna_alphabet); /**/ register Region *region = Region_create(0, 0, query->len, target->len); register EST2Genome_Data *e2gd; Match_ArgumentSet_create(arg); Affine_ArgumentSet_create(arg); Intron_ArgumentSet_create(arg); Viterbi_ArgumentSet_create(arg); Argument_process(arg, "est2genome.test", NULL, NULL); est2genome = EST2Genome_create(); g_message("Has [%d] shadows", est2genome->shadow_list->len); optimal = Optimal_create(est2genome, NULL, Optimal_Type_SCORE|Optimal_Type_PATH, FALSE); e2gd = EST2Genome_Data_create(query, target); score = Optimal_find_score(optimal, region, e2gd, NULL); g_message("Score is [%d]", score); g_assert(score == 157); /**/ alignment = Optimal_find_path(optimal, region, e2gd, C4_IMPOSSIBLY_LOW_SCORE, NULL); g_message("Alignment score is [%d]", alignment->score); Alignment_display(alignment, query, target, EST2Genome_Data_get_submat(e2gd), EST2Genome_Data_get_submat(e2gd), NULL, stdout); g_assert(score == alignment->score); /**/ Alphabet_destroy(dna_alphabet); C4_Model_destroy(est2genome); 
    Alignment_destroy(alignment);
    Optimal_destroy(optimal);
    EST2Genome_Data_destroy(e2gd);
    Sequence_destroy(query);
    Sequence_destroy(target);
    Region_destroy(region);
    return 0;
    }
exonerate-2.4.0/src/model/est2genome.c0000644000175000017500000000744511162714316014547 00000000000000
/****************************************************************\
*                                                                *
*  Module for EST <-> genome alignments                          *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#include <ctype.h>  /* For tolower() */
#include <string.h> /* For strlen() */

#include "est2genome.h"

void EST2Genome_Data_init(EST2Genome_Data *e2gd,
                          Sequence *query, Sequence *target){
    g_assert(e2gd);
    g_assert(query);
    g_assert(target);
    Affine_Data_init(&e2gd->ad, query, target, FALSE);
    if(!EST2Genome_Data_get_Intron_Data(e2gd))
        EST2Genome_Data_get_Intron_Data(e2gd) = Intron_Data_create();
    return;
    }

EST2Genome_Data *EST2Genome_Data_create(Sequence *query,
                                        Sequence *target){
    register EST2Genome_Data *e2gd = g_new0(EST2Genome_Data, 1);
    EST2Genome_Data_init(e2gd, query, target);
    return e2gd;
    }

void EST2Genome_Data_clear(EST2Genome_Data *e2gd){
    g_assert(e2gd);
    if(EST2Genome_Data_get_Intron_Data(e2gd)){
        Intron_Data_destroy(EST2Genome_Data_get_Intron_Data(e2gd));
        EST2Genome_Data_get_Intron_Data(e2gd) = NULL;
        }
    Affine_Data_clear(&e2gd->ad);
    return;
    }

void EST2Genome_Data_destroy(EST2Genome_Data *e2gd){
    g_assert(e2gd);
    EST2Genome_Data_clear(e2gd);
    g_free(e2gd);
    return;
    }

/**/

C4_Model *EST2Genome_create(void){
    register C4_Model *est2genome = Affine_create(
                                    Affine_Model_Type_LOCAL,
                                    Alphabet_Type_DNA,
                                    Alphabet_Type_DNA, FALSE);
    register C4_Model *intron_forward_model, *intron_reverse_model;
    register
C4_Transition *match_forward_transition, *match_reverse_transition; register GPtrArray *match_transition_list; C4_Model_rename(est2genome, "est2genome"); C4_Model_open(est2genome); C4_Model_make_stereo(est2genome, "forward", "reverse"); match_transition_list = C4_Model_select_transitions(est2genome, C4_Label_MATCH); g_assert(match_transition_list); g_assert(match_transition_list->len == 2); match_forward_transition = match_transition_list->pdata[0]; match_reverse_transition = match_transition_list->pdata[1]; g_ptr_array_free(match_transition_list, TRUE); intron_forward_model = Intron_create("forward", FALSE, TRUE, TRUE); intron_reverse_model = Intron_create("reverse", FALSE, TRUE, FALSE); C4_Model_insert(est2genome, intron_forward_model, match_forward_transition->input, match_forward_transition->input); C4_Model_insert(est2genome, intron_reverse_model, match_reverse_transition->input, match_reverse_transition->input); C4_Model_destroy(intron_forward_model); C4_Model_destroy(intron_reverse_model); C4_Model_append_codegen(est2genome, "#include \"est2genome.h\"\n", NULL, NULL); C4_Model_close(est2genome); return est2genome; } exonerate-2.4.0/src/model/intron.c0000644000175000017500000006745111162714317014014 00000000000000/****************************************************************\ * * * Module of splice site and intron modelling * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "intron.h" #include "ungapped.h" Intron_ArgumentSet *Intron_ArgumentSet_create(Argument *arg){ register ArgumentSet *as; static Intron_ArgumentSet ias; if(arg){ as = ArgumentSet_create("Intron Modelling Options"); ArgumentSet_add_option(as, 0, "minintron", "length", "Minimum intron length", "30", Argument_parse_int, &ias.min_intron); ArgumentSet_add_option(as, 0, "maxintron", "length", "Maximum intron length", "200000", Argument_parse_int, &ias.max_intron); ArgumentSet_add_option(as, 'i', "intronpenalty", "score", "Intron Opening penalty", "-30", Argument_parse_int, &ias.intron_open_penalty); Argument_absorb_ArgumentSet(arg, as); ias.sps = NULL; if(ias.intron_open_penalty > 0) g_warning("Intron open penalty should be negative [%d]", ias.intron_open_penalty); } else { if(!ias.sps) ias.sps = SplicePredictorSet_create(); /* FIXME: this should be freed somewhere */ } return &ias; } /**/ static gchar *Intron_ChainData_get_seq_func(gint start, gint len, gpointer user_data){ register Sequence *s = user_data; return Sequence_get_substr(s, start, len); } void Intron_ChainData_init_splice_prediction(Intron_ChainData *icd, Sequence *s, SplicePredictorSet *sps, SpliceType type, gboolean use_single){ if(!icd->sps) icd->sps = g_new0(SplicePrediction_Set, 1); icd->sps = SplicePrediction_Set_add(icd->sps, sps, type, s->len, Intron_ChainData_get_seq_func, use_single, s); return; } static void Intron_ChainData_init(Intron_ChainData *icd){ icd->curr_intron_start = 0; return; } static Intron_ChainData *Intron_ChainData_create(void){ register Intron_ChainData *icd = g_new0(Intron_ChainData, 1); Intron_ChainData_init(icd); return icd; } static void Intron_ChainData_destroy(Intron_ChainData *icd){ if(icd->sps) SplicePrediction_Set_destroy(icd->sps); g_free(icd); return; } /**/ Intron_Data *Intron_Data_create(void){ register Intron_Data *id = g_new0(Intron_Data, 1); id->ias = Intron_ArgumentSet_create(NULL); 
id->query_data = Intron_ChainData_create(); id->target_data = Intron_ChainData_create(); return id; } void Intron_Data_destroy(Intron_Data *id){ Intron_ChainData_destroy(id->query_data); Intron_ChainData_destroy(id->target_data); g_free(id); return; } /**/ static void intron_init_func(Region *region, gpointer user_data){ register Ungapped_Data *ud = user_data; register Intron_Data *id = Ungapped_Data_get_Intron_Data(ud); if(!id->curr_region){ id->curr_region = Region_share(region); Intron_ChainData_init(id->query_data); Intron_ChainData_init(id->target_data); } return; } static gchar *intron_init_macro = "if(!id->curr_region){\n" " id->curr_region = Region_share(region);\n" " Intron_ChainData_init(id->query_data);\n" " Intron_ChainData_init(id->target_data);\n" " }\n"; static void intron_exit_func(Region *region, gpointer user_data){ register Ungapped_Data *ud = user_data; register Intron_Data *id = Ungapped_Data_get_Intron_Data(ud); if(id->curr_region){ Region_destroy(id->curr_region); id->curr_region = NULL; } return; } static gchar *intron_exit_macro = "if(id->curr_region){\n" " Region_destroy(id->curr_region);\n" " id->curr_region = NULL;\n" " }\n"; /**/ #define Intron_CalcFunc(site, chain, is_pre) \ static C4_Score Intron_calc_##site##_##chain( \ gint query_pos, gint target_pos, gpointer user_data){ \ register Ungapped_Data *ud = user_data; \ register Intron_Data *id \ = Ungapped_Data_get_Intron_Data(ud); \ register C4_Score score = 0; \ register gint intron_length; \ g_assert(id->chain##_data->sps->site); \ g_assert(id->curr_region); \ if(is_pre){ \ score = id->ias->intron_open_penalty; \ } else { \ intron_length = chain##_pos \ - id->chain##_data->curr_intron_start \ + 2; \ if((intron_length < id->ias->min_intron) \ || (intron_length > id->ias->max_intron)) \ return C4_IMPOSSIBLY_LOW_SCORE; \ } \ score += SplicePrediction_get(id->chain##_data->sps->site, \ chain##_pos); \ return score; \ } #define Intron_JointCalcFunc(site, is_pre) \ static C4_Score 
Intron_joint_calc_##site( \ gint query_pos, gint target_pos, gpointer user_data){ \ register Ungapped_Data *ud = user_data; \ register Intron_Data *id \ = Ungapped_Data_get_Intron_Data(ud); \ register C4_Score score = 0; \ register gint intron_length; \ g_assert(id->query##_data->sps->site); \ g_assert(id->target##_data->sps->site); \ g_assert(id->curr_region); \ if(is_pre){ \ score = id->ias->intron_open_penalty; \ } else { \ intron_length = query_pos \ - id->query_data->curr_intron_start \ + 2; \ if((intron_length < id->ias->min_intron) \ || (intron_length > id->ias->max_intron)) \ return C4_IMPOSSIBLY_LOW_SCORE; \ intron_length = target_pos \ - id->target_data->curr_intron_start \ + 2; \ if((intron_length < id->ias->min_intron) \ || (intron_length > id->ias->max_intron)) \ return C4_IMPOSSIBLY_LOW_SCORE; \ } \ score += SplicePrediction_get(id->query##_data->sps->site, \ query_pos); \ score += SplicePrediction_get(id->target##_data->sps->site, \ target_pos); \ return score; \ } static gchar *Intron_get_calc_macro_splice(gboolean on_query, gchar *site){ register gchar *chain = on_query?"query":"target"; register gchar *chain_macro = on_query?"QP":"TP"; return g_strdup_printf("SplicePrediction_get(id->%s_data->sps->%s, %%%s)", chain, site, chain_macro); } static gchar *Intron_get_calc_macro_length(gboolean on_query, gchar *input_macro){ register gchar *chain = on_query?"query":"target"; register gchar *chain_macro = on_query?"QP":"TP"; register gchar *intron_length = g_strdup_printf( "(%%%s - id->%s_data->curr_intron_start + 2)", chain_macro, chain); register gchar *macro = g_strdup_printf( "(((%s < id->ias->min_intron)" "||(%s > id->ias->max_intron))" "?C4_IMPOSSIBLY_LOW_SCORE:%s)", intron_length, intron_length, input_macro); g_free(intron_length); return macro; } static gchar *Intron_get_calc_macro(gboolean is_5_prime, gboolean is_forward, gboolean on_query, gboolean on_target, gboolean is_pre){ register gchar *macro = NULL; register gchar *site = 
g_strdup_printf("ss%d_%s", is_5_prime?5:3, is_forward?"forward":"reverse"); register gchar *splice_calc; register gchar *a, *b; if(on_query && on_target){ /* joint */ a = Intron_get_calc_macro_splice(TRUE, site); b = Intron_get_calc_macro_splice(FALSE, site); splice_calc = g_strdup_printf("(%s + %s)", a, b); g_free(a); g_free(b); } else { /* single */ splice_calc = Intron_get_calc_macro_splice(on_query, site); } /**/ if(is_pre){ macro = g_strdup_printf("(%s + %s)", "id->ias->intron_open_penalty", splice_calc); } else { if(on_query && on_target){ /* joint */ a = Intron_get_calc_macro_length(TRUE, splice_calc); macro = Intron_get_calc_macro_length(FALSE, a); g_free(a); } else { /* single */ macro = Intron_get_calc_macro_length(on_query, splice_calc); } } g_free(splice_calc); g_free(site); g_assert(macro); return macro; } #define Intron_InitFunc(site, chain) \ static void Intron_init_##site##_##chain( \ Region *region, gpointer user_data){ \ register Ungapped_Data *ud = user_data; \ register Intron_Data *id = Ungapped_Data_get_Intron_Data(ud); \ Intron_ChainData_init_splice_prediction(id->chain##_data, ud->chain, \ id->ias->sps, \ SpliceType_##site, \ FALSE); \ return; \ } #define Intron_JointInitFunc(site) \ static void Intron_joint_init_##site( \ Region *region, gpointer user_data){ \ Intron_init_##site##_query(region, user_data); \ Intron_init_##site##_target(region, user_data); \ return; \ } static gchar *Intron_get_strand_init_macro(gboolean is_5_prime, gboolean is_forward, gboolean on_query){ register gchar *chain = on_query?"query":"target"; register gchar *site = g_strdup_printf("ss%d_%s", is_5_prime?5:3, is_forward?"forward":"reverse"); register gchar *macro = g_strdup_printf( "Intron_ChainData_init_splice_prediction(id->%s_data, ud->%s, id->ias->sps," " SpliceType_%s," " ((id->curr_region->%s_start == 0)" " && (id->curr_region->%s_length == ud->%s->len)));", chain, chain, site, chain, chain, chain); g_free(site); return macro; } static gchar 
*Intron_get_init_macro(gboolean is_5_prime, gboolean is_forward, gboolean on_query, gboolean on_target){ register gchar *qy_macro, *tg_macro, *macro; if(on_query && on_target){ qy_macro = Intron_get_strand_init_macro(is_5_prime, is_forward, TRUE); tg_macro = Intron_get_strand_init_macro(is_5_prime, is_forward, FALSE); macro = g_strdup_printf("%s;%s", qy_macro, tg_macro); g_free(qy_macro); g_free(tg_macro); } else { macro = Intron_get_strand_init_macro(is_5_prime, is_forward, on_query); } return macro; } #if 0 #define Intron_InitMacro(site, chain, chainmacro) \ static gchar *intron_## site ##_## chain ##_init_macro = \ "if(!id->"#chain"_data->"#site"){\n" \ " id->"#chain"_data->"#site" = SplicePrediction_create(" \ " id->ias->sps->"# site ",\n" \ " ud->"#chain"->seq,\n" \ " ud->"#chain"->len,\n" \ " %"#chainmacro"S, %"#chainmacro"L);\n" \ " }\n"; /* FIXME: do not give len when doing DP over unknown region * or give hint, eg: computing_full_dp */ #endif /* 0 */ #if 0 #define Intron_ExitFunc(site, chain) \ static void Intron_exit_##site##_##chain( \ Region *region, gpointer user_data){ \ register Ungapped_Data *ud = user_data; \ register Intron_Data *id = Ungapped_Data_get_Intron_Data(ud); \ if(id->chain##_data->site){ \ SplicePrediction_destroy(id->chain##_data->site); \ id->chain##_data->site = NULL; \ } \ return; \ } #define Intron_JointExitFunc(site) \ static void Intron_joint_exit_##site( \ Region *region, gpointer user_data){ \ Intron_exit_##site##_query(region, user_data); \ Intron_exit_##site##_target(region, user_data); \ return; \ } #endif /* 0 */ #if 0 #define Intron_ExitMacro(site, chain) \ static gchar *intron_##site##_##chain##_exit_macro = \ "if(id->"#chain"_data->"#site"){\n" \ " SplicePrediction_destroy(id->"#chain"_data->"#site");\n" \ " id->"#chain"_data->"#site" = NULL;" \ " }\n"; #endif /* 0 */ #if 0 static gchar *Intron_get_strand_exit_macro(gboolean is_5_prime, gboolean is_forward, gboolean on_query){ register gchar *chain = 
                            on_query?"query":"target";
    register gchar *site = g_strdup_printf("ss%d_%s",
            is_5_prime?5:3, is_forward?"forward":"reverse");
    return g_strdup_printf(
            "if(id->%s_data->%s){\n"
            " SplicePrediction_destroy(id->%s_data->%s);\n"
            " id->%s_data->%s = NULL;"
            " }\n",
            chain, site, chain, site, chain, site);
    }

static gchar *Intron_get_exit_macro(gboolean is_5_prime,
                                    gboolean is_forward,
                                    gboolean on_query,
                                    gboolean on_target){
    register gchar *qy_macro, *tg_macro, *macro;
    if(on_query && on_target){
        qy_macro = Intron_get_strand_exit_macro(is_5_prime, is_forward,
                                                TRUE);
        tg_macro = Intron_get_strand_exit_macro(is_5_prime, is_forward,
                                                FALSE);
        macro = g_strdup_printf("%s;%s", qy_macro, tg_macro);
        g_free(qy_macro);
        g_free(tg_macro);
    } else {
        macro = Intron_get_strand_exit_macro(is_5_prime, is_forward,
                                             on_query);
        }
    return macro;
    }
#endif /* 0 */

/**/

#define Intron_CalcElements(site, chain, chainmacro, is_pre) \
    Intron_CalcFunc(site, chain, is_pre)                     \
    Intron_InitFunc(site, chain)
/*
    Intron_ExitFunc(site, chain)
*/

#define Intron_JointCalcElements(site, is_pre) \
    Intron_JointCalcFunc(site, is_pre)         \
    Intron_JointInitFunc(site)
/*
    Intron_JointExitFunc(site)
*/

/* pre */
Intron_CalcElements(ss5_forward, query, Q, TRUE)
Intron_CalcElements(ss5_forward, target, T, TRUE)
Intron_JointCalcElements(ss5_forward, TRUE)
/* post */
Intron_CalcElements(ss3_forward, query, Q, FALSE)
Intron_CalcElements(ss3_forward, target, T, FALSE)
Intron_JointCalcElements(ss3_forward, FALSE)

/* pre */
Intron_CalcElements(ss3_reverse, query, Q, TRUE)
Intron_CalcElements(ss3_reverse, target, T, TRUE)
Intron_JointCalcElements(ss3_reverse, TRUE)
/* post */
Intron_CalcElements(ss5_reverse, query, Q, FALSE)
Intron_CalcElements(ss5_reverse, target, T, FALSE)
Intron_JointCalcElements(ss5_reverse, FALSE)

#define Intron_assign_calc_elements(site, chain) \
    calc_func = Intron_calc_##site##_##chain;    \
    init_func = Intron_init_##site##_##chain;
/*
    exit_func = Intron_exit_##site##_##chain;
*/

/**/

#define
Intron_assign_joint_calc_elements(site) \ calc_func = Intron_joint_calc_##site; \ init_func = Intron_joint_init_##site; /* exit_func = Intron_joint_exit_##site; */ /**/ /* Functions and macros for intron length shadow: Intron_{query,target}_{start,end}_{func,macro} */ #define Intron_StartFunc(chain) \ static C4_Score Intron_##chain##_start_func( \ gint query_pos, gint target_pos, gpointer user_data){ \ return (C4_Score)chain##_pos; \ } #define Intron_StartMacro(chain, chainmacro) \ static gchar *Intron_##chain##_start_macro \ = "(C4_Score)(%"#chainmacro"P)"; #define Intron_StartElements(chain, chainmacro) \ Intron_StartFunc(chain) \ Intron_StartMacro(chain, chainmacro) #define Intron_EndFunc(chain) \ static void Intron_##chain##_end_func(C4_Score score, \ gint query_pos, gint target_pos, gpointer user_data){ \ register Ungapped_Data *ud = user_data; \ register Intron_Data *id = Ungapped_Data_get_Intron_Data(ud); \ g_assert(id); \ id->chain##_data->curr_intron_start = (gint)score; \ return; \ } #define Intron_EndMacro(chain) \ static gchar *Intron_##chain##_end_macro \ = "id->"#chain"_data->curr_intron_start = (gint)%SS"; /**/ #define Intron_EndElements(chain) \ Intron_EndFunc(chain) \ Intron_EndMacro(chain) #define Intron_ShadowElements(chain, chainmacro) \ Intron_StartElements(chain, chainmacro) \ Intron_EndElements(chain) Intron_ShadowElements(query, Q) Intron_ShadowElements(target, T) /**/ static C4_Calc *Intron_add_calc(C4_Model *model, gchar *prefix, gchar *suffix, gboolean use_pre, gboolean is_forward, gboolean on_query, gboolean on_target){ register gchar *name = g_strconcat(prefix, " ", suffix, NULL); register Intron_ArgumentSet *ias = Intron_ArgumentSet_create(NULL); register SplicePredictor *sp; register C4_Calc *calc; register C4_CalcFunc calc_func; register C4_PrepFunc init_func; register gchar *calc_macro, *init_macro; register C4_Score bound = 0; register gboolean is_5_prime = is_forward?(use_pre?TRUE:FALSE) :(use_pre?FALSE:TRUE); if(is_forward){ 
if(use_pre){ sp = ias->sps->ss5_forward; bound = ias->intron_open_penalty; if(on_query){ if(on_target){ Intron_assign_joint_calc_elements(ss5_forward); } else { Intron_assign_calc_elements(ss5_forward, query); } } else { Intron_assign_calc_elements(ss5_forward, target); } } else { sp = ias->sps->ss3_forward; if(on_query){ if(on_target){ Intron_assign_joint_calc_elements(ss3_forward); } else { Intron_assign_calc_elements(ss3_forward, query); } } else { Intron_assign_calc_elements(ss3_forward, target); } } } else { if(use_pre){ sp = ias->sps->ss3_reverse; bound = ias->intron_open_penalty; if(on_query){ if(on_target){ Intron_assign_joint_calc_elements(ss3_reverse); } else { Intron_assign_calc_elements(ss3_reverse, query); } } else { Intron_assign_calc_elements(ss3_reverse, target); } } else { sp = ias->sps->ss5_reverse; if(on_query){ if(on_target){ Intron_assign_joint_calc_elements(ss5_reverse); } else { Intron_assign_calc_elements(ss5_reverse, query); } } else { Intron_assign_calc_elements(ss5_reverse, target); } } } bound += SplicePredictor_get_max_score(sp); if(on_query && on_target) /* Double for joint intron */ bound += SplicePredictor_get_max_score(sp); calc_macro = Intron_get_calc_macro(is_5_prime, is_forward, on_query, on_target, use_pre); init_macro = Intron_get_init_macro(is_5_prime, is_forward, on_query, on_target); /* exit_macro = Intron_get_exit_macro(is_5_prime, is_forward, on_query, on_target); */ calc = C4_Model_add_calc(model, name, bound, calc_func, calc_macro, init_func, init_macro, NULL, NULL, C4_Protect_UNDERFLOW); g_assert(calc_macro); g_assert(init_macro); g_free(calc_macro); g_free(init_macro); g_free(name); return calc; } C4_Model *Intron_create(gchar *suffix, gboolean on_query, gboolean on_target, gboolean is_forward){ register gchar *name, *pre_name, *post_name; register C4_Model *model; register C4_State *intron_state; register gint qy_splice_advance, tg_splice_advance; register C4_Calc *pre_calc, *post_calc; register C4_Label pre_label, 
post_label; register Intron_ArgumentSet *ias = Intron_ArgumentSet_create(NULL); g_assert(on_query || on_target); name = g_strconcat("intron ", suffix, NULL); model = C4_Model_create(name); g_free(name); if(is_forward){ pre_name = "5'ss forward"; post_name = "3'ss forward"; pre_label = C4_Label_5SS; post_label = C4_Label_3SS; } else { pre_name = "3'ss reverse"; post_name = "5'ss reverse"; pre_label = C4_Label_3SS; post_label = C4_Label_5SS; } if(on_query){ if(on_target){ /* Joint intron */ qy_splice_advance = 2; tg_splice_advance = 2; } else { qy_splice_advance = 2; tg_splice_advance = 0; } } else { /* on target */ qy_splice_advance = 0; tg_splice_advance = 2; } /* Add calcs */ pre_calc = Intron_add_calc(model, pre_name, suffix, TRUE, is_forward, on_query, on_target); post_calc = Intron_add_calc(model, post_name, suffix, FALSE, is_forward, on_query, on_target); /* Add intron state */ name = g_strconcat("intron ", suffix, NULL); intron_state = C4_Model_add_state(model, name); g_free(name); /* Add transition START->intron */ name = g_strconcat("(START) to ", intron_state->name, NULL); C4_Model_add_transition(model, name, NULL, intron_state, qy_splice_advance, tg_splice_advance, pre_calc, pre_label, NULL); g_free(name); /* Add transition intron->intron */ if(on_query){ name = g_strconcat("query intron loop ", suffix, NULL); C4_Model_add_transition(model, name, intron_state, intron_state, 1, 0, NULL, C4_Label_INTRON, NULL); g_free(name); } if(on_target){ name = g_strconcat("target intron loop ", suffix, NULL); C4_Model_add_transition(model, name, intron_state, intron_state, 0, 1, NULL, C4_Label_INTRON, NULL); g_free(name); } /* Add transition intron->(END)*/ name = g_strconcat(intron_state->name, " to (END)", NULL); C4_Model_add_transition(model, name, intron_state, NULL, qy_splice_advance, tg_splice_advance, post_calc, post_label, NULL); g_free(name); /* Add span state */ name = g_strconcat("intron span", suffix, NULL); if(on_query){ if(on_target){ 
C4_Model_add_span(model, name, intron_state, ias->min_intron, ias->max_intron, ias->min_intron, ias->max_intron); } else { C4_Model_add_span(model, name, intron_state, ias->min_intron, ias->max_intron, 0, 0); } } else { C4_Model_add_span(model, name, intron_state, 0, 0, ias->min_intron, ias->max_intron); } g_free(name); /* Add shadows */ if(on_query){ name = g_strconcat("query intron ", suffix, NULL); C4_Model_add_shadow(model, name, NULL, NULL, Intron_query_start_func, Intron_query_start_macro, Intron_query_end_func, Intron_query_end_macro); g_free(name); } if(on_target){ name = g_strconcat("target intron ", suffix, NULL); C4_Model_add_shadow(model, name, NULL, NULL, Intron_target_start_func, Intron_target_start_macro, Intron_target_end_func, Intron_target_end_macro); g_free(name); } /* Add DP init/exit funcs/macros */ C4_Model_configure_extra(model, intron_init_func, intron_init_macro, intron_exit_func, intron_exit_macro); C4_Model_append_codegen(model, NULL, "register Intron_Data *id = ud->intron_data;\n", NULL); C4_Model_close(model); return model; } exonerate-2.4.0/src/model/ungapped.test.c0000644000175000017500000001022411162714320015240 00000000000000/****************************************************************\ * * * Module for various ungapped alignment models * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
*
*                                                                *
\****************************************************************/

#include "ungapped.h"
#include "alignment.h"
#include "argument.h"
#include "submat.h"
#include "optimal.h"
#include "alignment.h"

static void run_ungapped_test(Sequence *query, Sequence *target,
                              gboolean translate_both, gint crib){
    register Match_Type match_type
        = Match_Type_find(query->alphabet->type,
                          target->alphabet->type, translate_both);
    register Ungapped_Data *ud = Ungapped_Data_create(query, target,
                                                      match_type);
    register C4_Model *ungapped = Ungapped_create(match_type);
    register C4_Score score;
    register Alignment *alignment;
    register Optimal *optimal = Optimal_create(ungapped, NULL,
                                               Optimal_Type_SCORE
                                              |Optimal_Type_PATH,
                                               FALSE);
    g_message("Using Match_Type [%s]", Match_Type_get_name(match_type));
    Region region;
    Region_init_static(&region, 0, 0, query->len, target->len);
    score = Optimal_find_score(optimal, &region, ud, NULL);
    g_message("Score is [%d] crib [%d]", score, crib);
    g_assert(score == crib);
    alignment = Optimal_find_path(optimal, &region, ud,
                                  C4_IMPOSSIBLY_LOW_SCORE, NULL);
    g_message("Alignment score is [%d] expect [%d]",
              alignment->score, score);
    g_assert(score == alignment->score);
    g_message("Using model [%s]", ungapped->name);
    Alignment_display(alignment, query, target,
                      Ungapped_Data_get_dna_submat(ud),
                      Ungapped_Data_get_protein_submat(ud),
                      Ungapped_Data_get_translate(ud), stdout);
    Alignment_destroy(alignment);
    Optimal_destroy(optimal);
    C4_Model_destroy(ungapped);
    Ungapped_Data_destroy(ud);
    return;
    }

int Argument_main(Argument *arg){
    register Alphabet *dna_alphabet
             = Alphabet_create(Alphabet_Type_DNA, FALSE),
         *protein_alphabet
             = Alphabet_create(Alphabet_Type_PROTEIN, FALSE);
    register Sequence *dna1 = Sequence_create("dna1", NULL,
                     "CGATCAGCTAGCTAGCTACGATCGATCGAT", 0,
                     Sequence_Strand_UNKNOWN, dna_alphabet);
    register Sequence *dna2 = Sequence_create("dna2", NULL,
                     "CGATACGATCGCTCTGAGATCTCGACTCAG", 0,
                     Sequence_Strand_UNKNOWN, dna_alphabet);
    register Sequence *protein1 =
Sequence_create("protein1", NULL, "DFCPIECFLNHILCIPEF", 0, Sequence_Strand_UNKNOWN, protein_alphabet); register Sequence *protein2 = Sequence_create("protein2", NULL, "DFHLISCPIRYLICIEFP", 0, Sequence_Strand_UNKNOWN, protein_alphabet); Match_ArgumentSet_create(arg); Argument_process(arg, "ungapped.test", NULL, NULL); run_ungapped_test(dna1, dna2, FALSE, 47); run_ungapped_test(protein1, protein2, FALSE, 36); run_ungapped_test(dna1, protein1, FALSE, 7); run_ungapped_test(protein2, dna2, FALSE, 12); run_ungapped_test(dna1, dna2, TRUE, 22); Sequence_destroy(dna1); Sequence_destroy(dna2); Sequence_destroy(protein1); Sequence_destroy(protein2); Alphabet_destroy(dna_alphabet); Alphabet_destroy(protein_alphabet); return 0; } exonerate-2.4.0/src/model/intron.test.c0000644000175000017500000000226111162714317014756 00000000000000/****************************************************************\ * * * Module for splice site and intron modelling * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "intron.h" int Argument_main(Argument *arg){ register C4_Model *intron = Intron_create("test", TRUE, FALSE, TRUE); g_warning("test does nothing"); C4_Model_destroy(intron); return 0; } exonerate-2.4.0/src/model/protein2genome.test.c0000644000175000017500000001003211162714320016367 00000000000000/****************************************************************\ * * * Protein <-> Genome comparison model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. 
See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "protein2genome.h" #include "alignment.h" #include "optimal.h" static void test_alignment(C4_Model *model, Sequence *query, Sequence *target, Protein2Genome_Data *p2gd){ register C4_Score score; register Alignment *alignment; register Optimal *optimal = Optimal_create(model, NULL, Optimal_Type_SCORE |Optimal_Type_PATH, FALSE); register Region *region = Region_create(0, 0, query->len, target->len); score = Optimal_find_score(optimal, region, p2gd, NULL); g_message("Score is [%d] (expect 125)", score); g_assert(score == 125); alignment = Optimal_find_path(optimal, region, p2gd, C4_IMPOSSIBLY_LOW_SCORE, NULL); g_message("Alignment score is [%d]", alignment->score); Alignment_display(alignment, query, target, NULL, Protein2Genome_Data_get_submat(p2gd), Protein2Genome_Data_get_translate(p2gd), stdout); g_assert(score == alignment->score); Alignment_destroy(alignment); Optimal_destroy(optimal); Region_destroy(region); return; } static void test_protein2genome(Sequence *query, Sequence *target){ register C4_Model *protein2genome = Protein2Genome_create(Affine_Model_Type_LOCAL); register Protein2Genome_Data *p2gd = Protein2Genome_Data_create(query, target); test_alignment(protein2genome, query, target, p2gd); Protein2Genome_Data_destroy(p2gd); C4_Model_destroy(protein2genome); return; } int Argument_main(Argument *arg){ register Alphabet *dna_alphabet = Alphabet_create(Alphabet_Type_DNA, FALSE), *protein_alphabet = Alphabet_create(Alphabet_Type_PROTEIN, FALSE); register Sequence *protein = Sequence_create("protein", NULL, "MADQLTEQIAEFKEAFSLFDKDGDGTITT", 0, Sequence_Strand_UNKNOWN, protein_alphabet), *genome = Sequence_create("genome", NULL, "ATGGCTGACCAGCTGACTGAGCAGATT" /* "GCAGAGTTCAAG" */ "GCAGAGTTCAA" "GTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAG" /* 
"GAGGCCTTCTCCCTCTTT" */ "GGAGGCCTTCTCCCTCTTT" "GACAAGGATGGAGATGGCACTATTACCACC", 0, Sequence_Strand_UNKNOWN, dna_alphabet); Match_ArgumentSet_create(arg); Affine_ArgumentSet_create(arg); Frameshift_ArgumentSet_create(arg); Intron_ArgumentSet_create(arg); Viterbi_ArgumentSet_create(arg); Argument_process(arg, "protein2genome.test", NULL, NULL); g_message("Testing protein2genome"); test_protein2genome(protein, genome); Sequence_destroy(protein); Sequence_destroy(genome); Alphabet_destroy(dna_alphabet); Alphabet_destroy(protein_alphabet); return 0; } exonerate-2.4.0/src/model/protein2genome.c0000644000175000017500000000551111162714320015417 00000000000000/****************************************************************\ * * * Protein <-> Genome comparison model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
*
*                                                                *
\****************************************************************/

#include <string.h> /* For strlen() */

#include "protein2genome.h"
#include "phase.h"

Protein2Genome_Data *Protein2Genome_Data_create(
                         Sequence *query, Sequence *target){
    register Protein2Genome_Data *p2gd = g_new0(Protein2Genome_Data, 1);
    g_assert(query->alphabet->type == Alphabet_Type_PROTEIN);
    g_assert(target->alphabet->type == Alphabet_Type_DNA);
    Protein2DNA_Data_init(&p2gd->p2dd, query, target);
    if(!Protein2Genome_Data_get_Intron_Data(p2gd))
        Protein2Genome_Data_get_Intron_Data(p2gd) = Intron_Data_create();
    return p2gd;
    }

void Protein2Genome_Data_destroy(Protein2Genome_Data *p2gd){
    if(Protein2Genome_Data_get_Intron_Data(p2gd)){
        Intron_Data_destroy(Protein2Genome_Data_get_Intron_Data(p2gd));
        Protein2Genome_Data_get_Intron_Data(p2gd) = NULL;
        }
    Protein2DNA_Data_clear(&p2gd->p2dd);
    g_free(p2gd);
    return;
    }

/**/

C4_Model *Protein2Genome_create(Affine_Model_Type type){
    register C4_Model *model = Protein2DNA_create(type);
    register C4_Transition *match_transition;
    register C4_Model *phase_model;
    register Match *match;
    register gchar *name = g_strdup_printf("protein2genome:%s",
                                   Affine_Model_Type_get_name(type));
    g_assert(model);
    C4_Model_rename(model, name);
    g_free(name);
    C4_Model_open(model);
    match_transition = C4_Model_select_single_transition(model,
                                                   C4_Label_MATCH);
    g_assert(match_transition);
    /* Add phased intron model */
    match = match_transition->label_data;
    phase_model = Phase_create(NULL, match, FALSE, TRUE);
    C4_Model_insert(model, phase_model, match_transition->input,
                                        match_transition->output);
    C4_Model_destroy(phase_model);
    /**/
    C4_Model_close(model);
    return model;
    }

exonerate-2.4.0/src/model/phase.c0000644000175000017500000005420011162714317013567 00000000000000
/****************************************************************\
*                                                                *
*  Model for phased introns.                                     *
*                                                                *
*  Guy St.C. Slater..   mailto:guy@ebi.ac.uk                     *
*  Copyright (C) 2000-2009.  All Rights Reserved.
*
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3.  See the file COPYING  *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#include <string.h> /* For strlen() */

#include "protein2genome.h"

/**/

/* 16:
 * Phase_{dna,protein}_phase{12}_{query,target}_calc_{func,macro}
 */

/*...*/

#if 0
static gchar Phase_calc_get_aa(Sequence *seq, gint pos, gint phase,
                               gboolean has_intron,
                               Translate *translate,
                               Intron_ChainData *intron_chain_data){
    register gchar aa = '\0';
    register gint codon_1, codon_2, codon_3;
    g_assert(has_intron?(seq->alphabet->type == Alphabet_Type_DNA)
                       :TRUE);
    g_assert(seq->alphabet->type != Alphabet_Type_UNKNOWN);
    if(seq->alphabet->type == Alphabet_Type_PROTEIN){
        g_assert(!has_intron);
        aa = Sequence_get_symbol(seq, pos);
    } else { /* seq->type == Alphabet_Type_DNA */
        if(phase == 1){
            if(has_intron){
                g_assert(intron_chain_data->curr_intron_start >= 0);
                codon_1 = intron_chain_data->curr_intron_start - 1;
            } else {
                codon_1 = pos-1;
                }
            codon_2 = pos;
            codon_3 = pos+1;
        } else { /* phase == 2 */
            if(has_intron){
                g_assert(intron_chain_data->curr_intron_start >= 0);
                codon_1 = intron_chain_data->curr_intron_start - 2;
                codon_2 = intron_chain_data->curr_intron_start - 1;
            } else {
                codon_1 = pos-2;
                codon_2 = pos-1;
                }
            codon_3 = pos;
            }
        /**/
        g_assert(codon_1 >= 0);
        g_assert(codon_1 < codon_2);
        g_assert(codon_2 < codon_3);
        g_assert(codon_3 < seq->len);
        aa = Translate_base(translate,
                            Sequence_get_symbol(seq, codon_1),
                            Sequence_get_symbol(seq, codon_2),
                            Sequence_get_symbol(seq, codon_3));
        }
    g_assert(aa);
    return aa;
    }

static gboolean Phase_calc_is_valid(Sequence *seq, gint pos, gint phase,
                                    gboolean has_intron,
                                    Intron_ChainData *intron_chain_data){
    if(seq->alphabet->type == Alphabet_Type_PROTEIN)
        return TRUE;
    if(has_intron){
        if(intron_chain_data->curr_intron_start < phase)
            return FALSE;
    } else {
        if(pos <
phase) return FALSE; } return TRUE; } static C4_Score Phase_calc_general(gint query_pos, gint target_pos, gpointer user_data, gint phase, gboolean intron_on_query){ register Ungapped_Data *ud = user_data; register Sequence *query = Ungapped_Data_get_query(ud), *target = Ungapped_Data_get_target(ud); register Translate *translate = Ungapped_Data_get_translate(ud); register Submat *submat = Ungapped_Data_get_protein_submat(ud); register gchar qy_aa, tg_aa; if((!Phase_calc_is_valid(query, query_pos, phase, intron_on_query, ud->intron_data->query_data)) || (!Phase_calc_is_valid(target, target_pos, phase, !intron_on_query, ud->intron_data->target_data))) return C4_IMPOSSIBLY_LOW_SCORE; qy_aa = Phase_calc_get_aa(query, query_pos, phase, intron_on_query, translate, ud->intron_data->query_data), tg_aa = Phase_calc_get_aa(target, target_pos, phase, !intron_on_query, translate, ud->intron_data->target_data); return Submat_lookup(submat, qy_aa, tg_aa); } /* Phase_{dna,protein}_phase{12}_{query,target}_calc_{func,macro} */ #define Phase_CalcFunc(type, phase, chain, intron_on_query) \ static C4_Score Phase_##type##_phase##phase##_##chain##_calc_func( \ gint query_pos, gint target_pos, gpointer user_data){ \ return Phase_calc_general(query_pos, target_pos, user_data, \ phase, intron_on_query); \ } /**/ Phase_CalcFunc(protein, 1, query, TRUE, PROTEIN2DNA) Phase_CalcFunc(dna, 1, query, TRUE, CODON2CODON) Phase_CalcFunc(protein, 2, query, TRUE, PROTEIN2DNA) Phase_CalcFunc(dna, 2, query, TRUE, CODON2CODON) /**/ Phase_CalcFunc(protein, 1, target, FALSE, PROTEIN2DNA) Phase_CalcFunc(dna, 1, target, FALSE, CODON2CODON) Phase_CalcFunc(protein, 2, target, FALSE, PROTEIN2DNA) Phase_CalcFunc(dna, 2, target, FALSE, CODON2CODON) /**/ #endif /* 0 */ static void Phase_calc_get_positions(Sequence *seq, gint pos, gint phase, gboolean has_intron, Intron_ChainData *intron_chain_data, gint *p1, gint *p2, gint *p3){ g_assert(has_intron?(seq->alphabet->type == Alphabet_Type_DNA):TRUE); 
g_assert(seq->alphabet->type != Alphabet_Type_UNKNOWN); if(seq->alphabet->type == Alphabet_Type_PROTEIN){ g_assert(!has_intron); (*p1) = pos; } else { /* seq->type == Alphabet_Type_DNA */ g_assert(seq->alphabet->type == Alphabet_Type_DNA); if(phase == 1){ if(has_intron){ g_assert(intron_chain_data->curr_intron_start >= 0); (*p1) = intron_chain_data->curr_intron_start - 1; } else { (*p1) = pos-1; } (*p2) = pos; (*p3) = pos+1; } else { /* phase == 2 */ if(has_intron){ g_assert(intron_chain_data->curr_intron_start >= 0); (*p1) = intron_chain_data->curr_intron_start - 2; (*p2) = intron_chain_data->curr_intron_start - 1; } else { (*p1) = pos-2; (*p2) = pos-1; } (*p3) = pos; } /**/ g_assert((*p1) >= 0); g_assert((*p1) < (*p2)); g_assert((*p2) < (*p3)); g_assert((*p3) < seq->len); } return; } static gboolean Phase_calc_is_valid(Sequence *seq, gint pos, gint phase, gboolean has_intron, Intron_ChainData *intron_chain_data){ if(seq->alphabet->type == Alphabet_Type_PROTEIN) return TRUE; if(has_intron){ if(intron_chain_data->curr_intron_start < phase) return FALSE; } else { if(pos < phase) return FALSE; } return TRUE; } #define Phase_CalcFunc(type, phase, qy_intron, tg_intron) \ static C4_Score \ Phase_##phase##_##type##_##qy_intron##_##tg_intron##_calc_func( \ gint query_pos, gint target_pos, gpointer user_data){ \ register Ungapped_Data *ud = user_data; \ register Match *match = ud->match_list[Match_Type_##type]; \ gint qp1, qp2, qp3, tp1, tp2, tp3; \ if((!Phase_calc_is_valid(ud->query, query_pos, phase, \ qy_intron, ud->intron_data->query_data)) \ || (!Phase_calc_is_valid(ud->target, target_pos, phase, \ tg_intron, ud->intron_data->target_data))) \ return C4_IMPOSSIBLY_LOW_SCORE; \ Phase_calc_get_positions(ud->query, query_pos, phase, qy_intron, \ ud->intron_data->query_data, \ &qp1, &qp2, &qp3); \ Phase_calc_get_positions(ud->target, target_pos, phase, tg_intron, \ ud->intron_data->target_data, \ &tp1, &tp2, &tp3); \ return match->split_score_func(match, ud->query, 
ud->target, \ qp1, qp2, qp3, tp1, tp2, tp3); \ } Phase_CalcFunc(PROTEIN2DNA, 1, FALSE, TRUE) Phase_CalcFunc(PROTEIN2DNA, 2, FALSE, TRUE) Phase_CalcFunc(DNA2PROTEIN, 1, TRUE, FALSE) Phase_CalcFunc(DNA2PROTEIN, 2, TRUE, FALSE) Phase_CalcFunc(CODON2CODON, 1, TRUE, TRUE) Phase_CalcFunc(CODON2CODON, 2, TRUE, TRUE) Phase_CalcFunc(CODON2CODON, 1, TRUE, FALSE) Phase_CalcFunc(CODON2CODON, 2, TRUE, FALSE) Phase_CalcFunc(CODON2CODON, 1, FALSE, TRUE) Phase_CalcFunc(CODON2CODON, 2, FALSE, TRUE) static gchar *Phase_get_calc_aa_macro(Alphabet_Type type, gint phase, gboolean is_query, gboolean has_intron){ register gchar *macro, *codon_1, *codon_2, *codon_3; register gchar *chain_name = is_query?"query":"target"; register gchar chain_symbol = is_query?'Q':'T'; g_assert(has_intron?(type == Alphabet_Type_DNA):TRUE); g_assert(type != Alphabet_Type_UNKNOWN); if(type == Alphabet_Type_PROTEIN){ g_assert(!has_intron); /* macro = g_strdup_printf("Sequence_get_symbol(ud->%s, %%%cP)", chain_name, chain_symbol); */ macro = g_strdup_printf("%%%cP, 0, 0", chain_symbol); } else { /* query is DNA */ if(phase == 1){ if(has_intron){ codon_1 = g_strdup_printf( "(id->%s_data->curr_intron_start - 1)", chain_name); } else { codon_1 = g_strdup_printf("%%%cP-1", chain_symbol); } codon_2 = g_strdup_printf("%%%cP", chain_symbol); codon_3 = g_strdup_printf("%%%cP+1", chain_symbol); } else { /* phase == 2 */ if(has_intron){ codon_1 = g_strdup_printf( "(id->%s_data->curr_intron_start - 2)", chain_name); codon_2 = g_strdup_printf( "(id->%s_data->curr_intron_start - 1)", chain_name); } else { codon_1 = g_strdup_printf("%%%cP-2", chain_symbol); codon_2 = g_strdup_printf("%%%cP-1", chain_symbol); } codon_3 = g_strdup_printf("%%%cP", chain_symbol); } g_assert(codon_1); g_assert(codon_2); g_assert(codon_3); macro = g_strdup_printf("%s, %s, %s", codon_1, codon_2, codon_3); /* macro = g_strdup_printf("Translate_base(ud->mas->translate,\n" " Sequence_get_symbol(ud->%s, %s]),\n" " Sequence_get_symbol(ud->%s, %s]),\n" 
" Sequence_get_symbol(ud->%s, %s]))\n", chain_name, codon_1, chain_name, codon_2, chain_name, codon_3); */ g_free(codon_1); g_free(codon_2); g_free(codon_3); } return macro; } static gchar *Phase_get_calc_is_valid_macro_chain(Alphabet_Type type, gboolean is_query, gint phase, gboolean has_intron){ register gchar *macro = NULL; if(type == Alphabet_Type_PROTEIN) return NULL; if(has_intron){ macro = g_strdup_printf( "(id->%s_data->curr_intron_start >= %d)", is_query?"query":"target", phase); } else { macro = g_strdup_printf("(%%%cP >= %d)", is_query?'Q':'T', phase); } return macro; } static gchar *Phase_get_calc_is_valid_macro(Match *match, gint phase, gboolean on_query, gboolean on_target){ register gchar *query_macro, *target_macro, *macro = NULL; query_macro = Phase_get_calc_is_valid_macro_chain( match->query->alphabet->type, TRUE, phase, on_query); target_macro = Phase_get_calc_is_valid_macro_chain( match->target->alphabet->type, FALSE, phase, on_target); if(query_macro){ if(target_macro){ /* QT */ macro = g_strdup_printf("(%s && %s)", query_macro, target_macro); g_free(query_macro); g_free(target_macro); } else { /* Q- */ macro = query_macro; } } else { if(target_macro){ /* -T */ macro = target_macro; } else { /* -- */ g_assert_not_reached(); } } return macro; } static gchar *Phase_get_calc_macro(Match *match, gint phase, gboolean on_query, gboolean on_target){ register gchar *qy_pos_macro, *tg_pos_macro, *macro; register gchar *is_valid_macro = Phase_get_calc_is_valid_macro(match, phase, on_query, on_target); qy_pos_macro = Phase_get_calc_aa_macro(match->query->alphabet->type, phase, TRUE, on_query); tg_pos_macro = Phase_get_calc_aa_macro(match->target->alphabet->type, phase, FALSE, on_target); macro = g_strdup_printf( "(%s)?ud->match_list[%d]->split_score_func(\n" "ud->match_list[%d], ud->query, ud->target,\n" "%s, %s):%d", is_valid_macro, match->type, match->type, qy_pos_macro, tg_pos_macro, C4_IMPOSSIBLY_LOW_SCORE); g_free(is_valid_macro); 
g_free(qy_pos_macro); g_free(tg_pos_macro); return macro; } /**/ /* p2d d2d phase1 0,1...1,2 1,1...2,2 phase2 0,2...1,1 2,2...1,1 */ C4_Model *Phase_create(gchar *suffix, Match *match, gboolean on_query, gboolean on_target){ register gchar *name, *intron_name, *calc_name; register C4_Model *model, *intron_model_00, *intron_model_12, *intron_model_21; register C4_Calc *phase1post_to_dst_calc, *phase2post_to_dst_calc; register C4_State *phase1pre_state, *phase1post_state, *phase2pre_state, *phase2post_state; register gint pre1_qa = 0, pre1_ta = 0, post2_qa = 0, post2_ta = 0, pre2_qa = 0, pre2_ta = 0, post1_qa = 0, post1_ta = 0; register C4_CalcFunc phase1_calc_func = NULL, phase2_calc_func = NULL; register C4_Transition *phase1post_transition, *phase2post_transition; register C4_Shadow *shadow; register gchar *phase1_calc_macro = NULL, *phase2_calc_macro = NULL; register gchar *full_suffix = NULL; register gboolean against_peptide = ((match->query->alphabet->type == Alphabet_Type_PROTEIN) || (match->target->alphabet->type == Alphabet_Type_PROTEIN)); g_assert(on_query || on_target); /* Cannot have joint introns vs peptide */ g_assert((!(on_query && on_target)) || (!against_peptide)); full_suffix = g_strconcat("phase", suffix?" ":"", suffix?suffix:"", suffix?" 
":"", on_query?"Q":"-", on_target?"T":"-", NULL); model = C4_Model_create(full_suffix); /* Prepare intron models */ intron_name = g_strconcat("0:0 ", full_suffix, NULL); intron_model_00 = Intron_create(intron_name, on_query, on_target, TRUE); intron_name[0] = '1'; intron_name[2] = '2'; intron_model_12 = Intron_create(intron_name, on_query, on_target, TRUE); intron_name[0] = '2'; intron_name[2] = '1'; intron_model_21 = Intron_create(intron_name, on_query, on_target, TRUE); g_free(intron_name); if(against_peptide){ if(on_query){ pre1_qa = 1; pre1_ta = 0; post1_qa = 2; post1_ta = 1; pre2_qa = 2; pre2_ta = 0; post2_qa = 1; post2_ta = 1; } else { /* on_target */ pre1_qa = 0; pre1_ta = 1; post1_qa = 1; post1_ta = 2; pre2_qa = 0; pre2_ta = 2; post2_qa = 1; post2_ta = 1; } } else { /* Against DNA */ pre1_qa = 1; pre1_ta = 1; post1_qa = 2; post1_ta = 2; pre2_qa = 2; pre2_ta = 2; post2_qa = 1; post2_ta = 1; } /* Select C4 calc functions */ switch(match->type){ case Match_Type_PROTEIN2DNA: g_assert(!on_query); g_assert(on_target); phase1_calc_func = Phase_1_PROTEIN2DNA_FALSE_TRUE_calc_func; phase2_calc_func = Phase_2_PROTEIN2DNA_FALSE_TRUE_calc_func; break; case Match_Type_DNA2PROTEIN: g_assert(on_query); /* FAILS */ g_assert(!on_target); phase1_calc_func = Phase_1_DNA2PROTEIN_TRUE_FALSE_calc_func; phase2_calc_func = Phase_2_DNA2PROTEIN_TRUE_FALSE_calc_func; break; case Match_Type_CODON2CODON: g_assert(on_query || on_target); if(on_query){ if(on_target){ /* Joint */ phase1_calc_func = Phase_1_CODON2CODON_TRUE_TRUE_calc_func; phase2_calc_func = Phase_2_CODON2CODON_TRUE_TRUE_calc_func; } else { /* Query */ phase1_calc_func = Phase_1_CODON2CODON_TRUE_FALSE_calc_func; phase2_calc_func = Phase_2_CODON2CODON_TRUE_FALSE_calc_func; } } else { /* Target */ g_assert(on_target); phase1_calc_func = Phase_1_CODON2CODON_FALSE_TRUE_calc_func; phase2_calc_func = Phase_2_CODON2CODON_FALSE_TRUE_calc_func; } break; default: g_error("Match_Type [%s] unsuitable for phase model", 
Match_Type_get_name(match->type)); break; } phase1_calc_macro = Phase_get_calc_macro(match, 1, on_query, on_target); phase2_calc_macro = Phase_get_calc_macro(match, 2, on_query, on_target); g_assert(phase1_calc_macro); g_assert(phase2_calc_macro); /* Add calcs */ calc_name = g_strconcat("phase1post to dst ", full_suffix, NULL); phase1post_to_dst_calc = C4_Model_add_calc(model, calc_name, Match_max_score(match), phase1_calc_func, phase1_calc_macro, NULL, NULL, NULL, NULL, C4_Protect_NONE); g_free(calc_name); calc_name = g_strconcat("phase2post to dst ", full_suffix, NULL); phase2post_to_dst_calc = C4_Model_add_calc(model, calc_name, Match_max_score(match), phase2_calc_func, phase2_calc_macro, NULL, NULL, NULL, NULL, C4_Protect_NONE); g_free(calc_name); g_free(phase1_calc_macro); g_free(phase2_calc_macro); /* Add split codon states */ name = g_strconcat("phase1pre ", full_suffix, NULL); phase1pre_state = C4_Model_add_state(model, name); g_free(name); name = g_strconcat("phase1post ", full_suffix, NULL); phase1post_state = C4_Model_add_state(model, name); g_free(name); name = g_strconcat("phase2pre ", full_suffix, NULL); phase2pre_state = C4_Model_add_state(model, name); g_free(name); name = g_strconcat("phase2post ", full_suffix, NULL); phase2post_state = C4_Model_add_state(model, name); g_free(name); /* Transitions from [START]->phase_pre transitions */ name = g_strconcat("(START) to ", phase1pre_state->name, NULL); C4_Model_add_transition(model, name, NULL, phase1pre_state, pre1_qa, pre1_ta, NULL, C4_Label_SPLIT_CODON, NULL); g_free(name); name = g_strconcat("(START) to ", phase2pre_state->name, NULL); C4_Model_add_transition(model, name, NULL, phase2pre_state, pre2_qa, pre2_ta, NULL, C4_Label_SPLIT_CODON, NULL); g_free(name); /* Transitions from post->[END] transitions. 
*/ name = g_strconcat(phase1post_state->name, " to (END)", NULL); phase1post_transition = C4_Model_add_transition(model, name, phase1post_state, NULL, post1_qa, post1_ta, phase1post_to_dst_calc, C4_Label_SPLIT_CODON, NULL); g_free(name); name = g_strconcat(phase2post_state->name, " to (END)", NULL); phase2post_transition = C4_Model_add_transition(model, name, phase2post_state, NULL, post2_qa, post2_ta, phase2post_to_dst_calc, C4_Label_SPLIT_CODON, NULL); g_free(name); /* Insert intron models */ C4_Model_insert(model, intron_model_00, NULL, NULL); C4_Model_insert(model, intron_model_12, phase1pre_state, phase1post_state); C4_Model_insert(model, intron_model_21, phase2pre_state, phase2post_state); /* Add shadows */ if(on_query && on_target){ g_assert(model->shadow_list->len == 6); shadow = model->shadow_list->pdata[2]; C4_Shadow_add_dst_transition(shadow, phase1post_transition); shadow = model->shadow_list->pdata[3]; C4_Shadow_add_dst_transition(shadow, phase1post_transition); shadow = model->shadow_list->pdata[4]; C4_Shadow_add_dst_transition(shadow, phase2post_transition); shadow = model->shadow_list->pdata[5]; C4_Shadow_add_dst_transition(shadow, phase2post_transition); } else { g_assert(model->shadow_list->len == 3); shadow = model->shadow_list->pdata[1]; C4_Shadow_add_dst_transition(shadow, phase1post_transition); shadow = model->shadow_list->pdata[2]; C4_Shadow_add_dst_transition(shadow, phase2post_transition); } /**/ C4_Model_destroy(intron_model_00); C4_Model_destroy(intron_model_12); C4_Model_destroy(intron_model_21); /**/ C4_Model_close(model); g_free(full_suffix); return model; } /**/ /* FIXME: tidy */ exonerate-2.4.0/src/model/coding2coding.test.c0000644000175000017500000000647711162714316016172 00000000000000/****************************************************************\ * * * Coding DNA comparison model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. 
* * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "coding2coding.h" #include "alignment.h" #include "optimal.h" #include "submat.h" static void test_coding2coding(Sequence *query, Sequence *target){ register C4_Score score; register C4_Model *model = Coding2Coding_create(); register Coding2Coding_Data *c2cd = Coding2Coding_Data_create( query, target); register Alignment *alignment; register Optimal *optimal = Optimal_create(model, NULL, Optimal_Type_SCORE |Optimal_Type_PATH, FALSE); Region region; Region_init_static(&region, 0, 0, query->len, target->len); score = Optimal_find_score(optimal, &region, c2cd, NULL); g_message("Score is [%d]", score); g_assert(score == 169); alignment = Optimal_find_path(optimal, &region, c2cd, C4_IMPOSSIBLY_LOW_SCORE, NULL); g_message("Alignment score is [%d]", alignment->score); Alignment_display(alignment, query, target, Coding2Coding_Data_get_submat(c2cd), Coding2Coding_Data_get_submat(c2cd), Coding2Coding_Data_get_translate(c2cd), stdout); g_assert(score == alignment->score); Coding2Coding_Data_destroy(c2cd); C4_Model_destroy(model); return; } int Argument_main(Argument *arg){ register Alphabet *alphabet = Alphabet_create(Alphabet_Type_DNA, FALSE); register Sequence *qy = Sequence_create("qy", NULL, "AGCCCAGCCAAGCACTGTCAGGAATCCTGTGAAGCAGCTCCAGCTATGTGTGAAGAAG" "AGGACAGCACTGCCTTGGTGTGTGACAATGGCTCTGGGCTCTGTAAGGCCGGCTTTGCT", 0, Sequence_Strand_UNKNOWN, alphabet), *tg = Sequence_create("tg", NULL, "AGCCCAGCCAAACACTGTCAGGAATCCTGT" "NNN" "GAAGCAGCTCCAGCTATGTGTGAAGAAG" "AGGACAGCACTGCCTTGGTGTGTGACAATGGC" "NN" "TCTGGGCTCTGTAAGGCCGGCTTTGCT", 0, Sequence_Strand_UNKNOWN, alphabet); Match_ArgumentSet_create(arg); Affine_ArgumentSet_create(arg); Frameshift_ArgumentSet_create(arg); 
Viterbi_ArgumentSet_create(arg); Argument_process(arg, "coding2coding.test", NULL, NULL); g_message("Testing coding2coding"); test_coding2coding(qy, tg); Sequence_destroy(qy); Sequence_destroy(tg); Alphabet_destroy(alphabet); return 0; } exonerate-2.4.0/src/model/coding2coding.c0000644000175000017500000000530311162714302015172 00000000000000/****************************************************************\ * * * Coding DNA comparison model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "coding2coding.h" #include "frameshift.h" void Coding2Coding_Data_init(Coding2Coding_Data *c2cd, Sequence *query, Sequence *target){ g_assert(query->alphabet->type == Alphabet_Type_DNA); g_assert(target->alphabet->type == Alphabet_Type_DNA); Affine_Data_init(&c2cd->ad, query, target, TRUE); if(!Coding2Coding_Data_get_Frameshift_Data(c2cd)) Coding2Coding_Data_get_Frameshift_Data(c2cd) = Frameshift_Data_create(); return; } Coding2Coding_Data *Coding2Coding_Data_create( Sequence *query, Sequence *target){ register Coding2Coding_Data *c2cd = g_new0(Coding2Coding_Data, 1); Coding2Coding_Data_init(c2cd, query, target); return c2cd; } void Coding2Coding_Data_clear(Coding2Coding_Data *c2cd){ Affine_Data_clear(&c2cd->ad); Frameshift_Data_destroy( Coding2Coding_Data_get_Frameshift_Data(c2cd)); return; } void Coding2Coding_Data_destroy(Coding2Coding_Data *c2cd){ Coding2Coding_Data_clear(c2cd); g_free(c2cd); return; } C4_Model *Coding2Coding_create(void){ register gchar *name = "coding2coding"; register C4_Model *model = NULL; register C4_Transition *match_transition; model = Affine_create(Affine_Model_Type_LOCAL, 
Alphabet_Type_DNA, Alphabet_Type_DNA, TRUE); g_assert(model); C4_Model_rename(model, name); C4_Model_open(model); match_transition = C4_Model_select_single_transition(model, C4_Label_MATCH); g_assert(match_transition); Frameshift_add(model, match_transition->input, "query", TRUE); Frameshift_add(model, match_transition->input, "target", FALSE); C4_Model_close(model); return model; } exonerate-2.4.0/src/model/frameshift.test.c0000644000175000017500000000203711162714316015575 00000000000000/****************************************************************\ * * * Module for frameshift modelling * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "frameshift.h" int Argument_main(Argument *arg){ g_warning("test does nothing"); return 0; } exonerate-2.4.0/src/model/coding2genome.test.c0000644000175000017500000000711611162714316016170 00000000000000/****************************************************************\ * * * Coding <-> Genome comparison model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "coding2genome.h" #include "alignment.h" #include "optimal.h" #include "submat.h" static void test_coding2genome(Sequence *query, Sequence *target){ register C4_Score score; register C4_Model *model = Coding2Genome_create(); register Coding2Genome_Data *c2gd = Coding2Genome_Data_create( query, target); register Alignment *alignment; register Optimal *optimal = Optimal_create(model, NULL, Optimal_Type_SCORE |Optimal_Type_PATH, FALSE); register Region *region = Region_create(0, 0, query->len, target->len); score = Optimal_find_score(optimal, region, c2gd, NULL); g_message("Score is [%d]", score); /* g_assert(score == 151); */ alignment = Optimal_find_path(optimal, region, c2gd, C4_IMPOSSIBLY_LOW_SCORE, NULL); g_message("Alignment score is [%d]", alignment->score); Alignment_display(alignment, query, target, Coding2Genome_Data_get_submat(c2gd), Coding2Genome_Data_get_submat(c2gd), Coding2Genome_Data_get_translate(c2gd), stdout); g_assert(score == alignment->score); Coding2Genome_Data_destroy(c2gd); C4_Model_destroy(model); Region_destroy(region); Optimal_destroy(optimal); return; } int Argument_main(Argument *arg){ register Alphabet *alphabet = Alphabet_create(Alphabet_Type_DNA, FALSE); register Sequence *qy = Sequence_create("qyr", NULL, "AGCCCAGCCAAGCACTGTCAGGAATCCTGT" "GAAGCAGCTCCAGCTATGTGTGAAGAA" "GAGGACAGCACTGCCTTGGTGTGTGACAATG" /* "GTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAG" */ "GCTCTGGGCTCTGTAAGGCCGGCTTTGCT", 0, Sequence_Strand_UNKNOWN, alphabet), *tg = Sequence_create("tgr", NULL, "AGCCCAGCCAAGCACTGTCAGGAATCCTG" "GTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAG" "T" "GAAGCAGCTCCAGCTATGTGTGAAGAA" "GAGGACAGCACTGCCTTGGTGTGTGACAATGGC" "TCTGGGCTCTGTAAGGCCGGCTTTGCT", 0, Sequence_Strand_UNKNOWN, alphabet); Match_ArgumentSet_create(arg); Affine_ArgumentSet_create(arg); Viterbi_ArgumentSet_create(arg); Frameshift_ArgumentSet_create(arg); Intron_ArgumentSet_create(arg); Argument_process(arg, 
"coding2genome.test", NULL, NULL); g_message("Testing coding2genome"); g_message("seq lengths [%d,%d]", qy->len, tg->len); test_coding2genome(qy, tg); Sequence_destroy(qy); Sequence_destroy(tg); Alphabet_destroy(alphabet); return 0; } exonerate-2.4.0/src/model/coding2genome.c0000644000175000017500000000556111162714316015214 00000000000000/****************************************************************\ * * * Coding <-> Genome comparison model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "coding2genome.h" void Coding2Genome_Data_init(Coding2Genome_Data *c2gd, Sequence *query, Sequence *target){ g_assert(query->alphabet->type == Alphabet_Type_DNA); g_assert(target->alphabet->type == Alphabet_Type_DNA); Coding2Coding_Data_init(&c2gd->c2cd, query, target); if(!Coding2Genome_Data_get_Intron_Data(c2gd)) Coding2Genome_Data_get_Intron_Data(c2gd) = Intron_Data_create(); return; } Coding2Genome_Data *Coding2Genome_Data_create( Sequence *query, Sequence *target){ register Coding2Genome_Data *c2gd = g_new0(Coding2Genome_Data, 1); Coding2Genome_Data_init(c2gd, query, target); return c2gd; } void Coding2Genome_Data_clear(Coding2Genome_Data *c2gd){ Coding2Coding_Data_clear(&c2gd->c2cd); return; } void Coding2Genome_Data_destroy(Coding2Genome_Data *c2gd){ Coding2Genome_Data_clear(c2gd); if(Coding2Genome_Data_get_Intron_Data(c2gd)){ Intron_Data_destroy(Coding2Genome_Data_get_Intron_Data(c2gd)); Coding2Genome_Data_get_Intron_Data(c2gd) = NULL; } g_free(c2gd); return; } /**/ C4_Model *Coding2Genome_create(void){ register C4_Model *model = Coding2Coding_create(); register C4_Model *phase_model; register 
C4_Transition *match_transition; register Match *match; g_assert(model); C4_Model_rename(model, "coding2genome"); C4_Model_open(model); /**/ match_transition = C4_Model_select_single_transition(model, C4_Label_MATCH); g_assert(match_transition); /* Target Introns */ match = match_transition->label_data; phase_model = Phase_create("target intron", match, FALSE, TRUE); C4_Model_insert(model, phase_model, match_transition->input, match_transition->input); C4_Model_destroy(phase_model); /**/ C4_Model_close(model); return model; } exonerate-2.4.0/src/model/cdna2genome.test.c0000644000175000017500000001063211162714302015622 00000000000000/****************************************************************\ * * * cDNA <-> genomic alignment model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "cdna2genome.h" #include "alignment.h" #include "optimal.h" #include "submat.h" static void test_cdna2genome(Sequence *query, Sequence *target){ register C4_Score score; register C4_Model *model = CDNA2Genome_create(); register CDNA2Genome_Data *cd2gd = CDNA2Genome_Data_create(query, target); register Alignment *alignment; register Optimal *optimal = Optimal_create(model, NULL, Optimal_Type_SCORE |Optimal_Type_PATH, FALSE); register Region *region = Region_create(0, 0, query->len, target->len); score = Optimal_find_score(optimal, region, cd2gd, NULL); g_message("Score is [%d]", score); g_assert(score == 1281); alignment = Optimal_find_path(optimal, region, cd2gd, C4_IMPOSSIBLY_LOW_SCORE, NULL); g_message("Alignment score is [%d]", alignment->score); Alignment_display(alignment, query, target, CDNA2Genome_Data_get_dna_submat(cd2gd), CDNA2Genome_Data_get_protein_submat(cd2gd), CDNA2Genome_Data_get_translate(cd2gd), stdout); g_assert(score == alignment->score); CDNA2Genome_Data_destroy(cd2gd); C4_Model_destroy(model); Region_destroy(region); Optimal_destroy(optimal); return; } int Argument_main(Argument *arg){ register Alphabet *alphabet = Alphabet_create(Alphabet_Type_DNA, FALSE); register Sequence *qy = Sequence_create("qyr", NULL, /* utr */ "CGAGCTGAGTGGTTGTGTGGTCGCGTC" /* utr */ "TCGGAAACCGGTAGCGCTTGCAGCATG" /* cds */ "GCTGACCAACTGACTGAAGAGCAGATTGCAGAATTCAAAGAAGCTTTTTCATTA" /* cds */ "GATGGTGATGGTCAAGTAAACTATGAAGAGTTTGTACAAATGATGACAGCAAAG" /* trp */ "TGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGG" /* utr */ "GATGGTGATGGCACTATAACAACAAAG" /* utr */ "GAACTTGGGACTGTAATGAGATCTCTT", 0, Sequence_Strand_UNKNOWN, alphabet), *tg = Sequence_create("tgr", NULL, /* gen */ "GCCCAGGAGTTTGAGACCAGCCTGGGCAACAGACCGAGGCCCCGTCTCTACAAA" /* utr */ "CGAGCTGAGTGGTTGTGTGGTCGCGTC" /* int */ "GTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAG" /* utr */ "TCGGAAACCGGTAGCGCTTGCAGCATG" /* cds */ 
"GCTGACCAACTGACTGAAGAGCAGATTGCAGAATTCAAAGAAGCTTTTTCATTA" /* int */ "GTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAG" /* cds */ "GATGGTGATGGTCAAGTAAACTATGAAGAGTTTGTACAAATGATGACAGCAAAG" /* trp */ "TGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGGTGG" /* utr */ "GATGGTGATGGCACTATAACAACAAAG" /* int */ "GTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAG" /* utr */ "GAACTTGGGACTGTAATGAGATCTCTT" /* gen */ "TCTTTTCCGCCAGGCTGCCCACAGGGTGGATATCGAAGTTTTCGGGCAGCTGGA", 0, Sequence_Strand_UNKNOWN, alphabet); Match_ArgumentSet_create(arg); Affine_ArgumentSet_create(arg); Viterbi_ArgumentSet_create(arg); Frameshift_ArgumentSet_create(arg); Intron_ArgumentSet_create(arg); Argument_process(arg, "cdna2genome.test", NULL, NULL); g_message("Testing cdna2genome"); g_message("seq lengths [%d,%d]", qy->len, tg->len); test_cdna2genome(qy, tg); Sequence_destroy(qy); Sequence_destroy(tg); Alphabet_destroy(alphabet); return 0; } /* Query: (cDNA) UTR:CDS:UTR * Target: (gen) gen:UTR5:intron:CDS:intron:CDS:UTR3:intron:UTR3:gen */ exonerate-2.4.0/src/model/cdna2genome.c0000644000175000017500000000725711162714302014655 00000000000000/****************************************************************\ * * * cDNA <-> genomic alignment model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "cdna2genome.h" /**/ void CDNA2Genome_Data_init(CDNA2Genome_Data *cd2gd, Sequence *query, Sequence *target){ g_assert(query->alphabet->type == Alphabet_Type_DNA); g_assert(target->alphabet->type == Alphabet_Type_DNA); Coding2Genome_Data_init(&cd2gd->c2gd, query, target); EST2Genome_Data_init(&cd2gd->e2gd, query, target); return; } CDNA2Genome_Data *CDNA2Genome_Data_create(Sequence *query, Sequence *target){ register CDNA2Genome_Data *cd2gd = g_new0(CDNA2Genome_Data, 1); CDNA2Genome_Data_init(cd2gd, query, target); return cd2gd; } void CDNA2Genome_Data_clear(CDNA2Genome_Data *cd2gd){ Coding2Genome_Data_clear(&cd2gd->c2gd); EST2Genome_Data_clear(&cd2gd->e2gd); return; } void CDNA2Genome_Data_destroy(CDNA2Genome_Data *cd2gd){ CDNA2Genome_Data_clear(cd2gd); g_free(cd2gd); return; } /**/ /* Create a special single-orientation e2g model */ /* FIXME: move this to utr module ? */ static C4_Model *CDNA2Genome_UTR_create(void){ register C4_Model *model = Affine_create( Affine_Model_Type_LOCAL, Alphabet_Type_DNA, Alphabet_Type_DNA, FALSE); register C4_Model *intron_model = Intron_create("forward", FALSE, TRUE, TRUE); register C4_Transition *match_transition = C4_Model_select_single_transition(model, C4_Label_MATCH); g_assert(match_transition); C4_Model_open(model); C4_Model_insert(model, intron_model, match_transition->input, match_transition->output); C4_Model_close(model); C4_Model_destroy(intron_model); return model; } C4_Model *CDNA2Genome_create(void){ register C4_Model *model = C4_Model_create("cdna2genome"); register C4_Model *c2g_model = Coding2Genome_create(); register C4_Model *utr_model = CDNA2Genome_UTR_create(); register C4_Transition *transition; g_assert(model); /**/ C4_Model_insert(model, c2g_model, NULL, NULL); transition = C4_Model_select_single_transition(model, C4_Label_MATCH); g_assert(transition); g_assert(transition->input == transition->output); 
g_assert(transition->label_data); g_assert(((Match*)transition->label_data)->type == Match_Type_CODON2CODON); C4_Model_insert(model, utr_model, NULL, transition->input); C4_Model_insert(model, utr_model, transition->input, NULL); /* FIXME: add transition start->post_utr || post_utr->end */ C4_Model_destroy(c2g_model); C4_Model_destroy(utr_model); C4_Model_close(model); /* C4_Model_dump_graphviz(model); */ return model; } /**/ exonerate-2.4.0/src/model/genome2genome.test.c0000644000175000017500000000721411162714317016177 00000000000000/****************************************************************\ * * * Genome <-> Genome comparison model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "genome2genome.h" #include "alignment.h" #include "optimal.h" #include "submat.h" static void test_genome2genome(Sequence *query, Sequence *target){ register C4_Score score; register C4_Model *model = Genome2Genome_create(); register Genome2Genome_Data *g2gd = Genome2Genome_Data_create( query, target); register Alignment *alignment; register Optimal *optimal = Optimal_create(model, NULL, Optimal_Type_SCORE |Optimal_Type_PATH, FALSE); register Region *region = Region_create(0, 0, query->len, target->len); score = Optimal_find_score(optimal, region, g2gd, NULL); g_message("Score is [%d]", score); /* g_assert(score == 151); */ alignment = Optimal_find_path(optimal, region, g2gd, C4_IMPOSSIBLY_LOW_SCORE, NULL); g_message("Alignment score is [%d]", alignment->score); Alignment_display(alignment, query, target, Genome2Genome_Data_get_dna_submat(g2gd), Genome2Genome_Data_get_protein_submat(g2gd), 
Genome2Genome_Data_get_translate(g2gd), stdout); g_assert(score == alignment->score); Genome2Genome_Data_destroy(g2gd); C4_Model_destroy(model); Region_destroy(region); Optimal_destroy(optimal); return; } int Argument_main(Argument *arg){ register Alphabet *alphabet = Alphabet_create(Alphabet_Type_DNA, FALSE); register Sequence *qy = Sequence_create("qyr", NULL, "AGCCCAGCCAAGCACTGTCAGGAATCCTG" "GTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAG" "TGAAGCAGCTCCAGCTATGTGTGAAGAA" "GAGGACAGCACTGCCTTGGTGTGTGACAATG" "GTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAG" "GCTCTGGGCTCTGTAAGGCCGGCTTTGCT", 0, Sequence_Strand_UNKNOWN, alphabet), *tg = Sequence_create("tgr", NULL, "AGCCCAGCCAAGCACTGTCAGGAATCCTG" "GTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAG" "TGAAGCAGCTCCAGCTATGTGTGAAGAA" "GAGGACAGCACTGCCTTGGTGTGTGACAATGGC" "TCTGGGCTCTGTAAGGCCGGCTTTGCT", 0, Sequence_Strand_UNKNOWN, alphabet); Match_ArgumentSet_create(arg); Affine_ArgumentSet_create(arg); Viterbi_ArgumentSet_create(arg); Frameshift_ArgumentSet_create(arg); Intron_ArgumentSet_create(arg); Argument_process(arg, "genome2genome.test", NULL, NULL); g_message("Testing genome2genome"); g_message("seq lengths [%d,%d]", qy->len, tg->len); test_genome2genome(qy, tg); Sequence_destroy(qy); Sequence_destroy(tg); Alphabet_destroy(alphabet); return 0; } exonerate-2.4.0/src/model/genome2genome.c0000644000175000017500000000707311162714316015223 00000000000000/****************************************************************\ * * * Genome <-> Genome comparison model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "genome2genome.h" Genome2Genome_Data *Genome2Genome_Data_create( Sequence *query, Sequence *target){ register Genome2Genome_Data *g2gd = g_new0(Genome2Genome_Data, 1); g_assert(query->alphabet->type == Alphabet_Type_DNA); g_assert(target->alphabet->type == Alphabet_Type_DNA); CDNA2Genome_Data_init(&g2gd->cd2gd, query, target); return g2gd; } void Genome2Genome_Data_destroy(Genome2Genome_Data *g2gd){ CDNA2Genome_Data_clear(&g2gd->cd2gd); g_free(g2gd); return; } /**/ C4_Model *Genome2Genome_create(void){ register C4_Model *model = C4_Model_create("genome2genome"); register C4_Model *cdna2genome = CDNA2Genome_create(); register GPtrArray *transition_list; register gint i; register C4_Transition *transition; register C4_Model *query_intron_model, *joint_intron_model, *query_phase_model, *joint_phase_model; register Match *codon_match; C4_Model_insert(model, cdna2genome, NULL, NULL); C4_Model_destroy(cdna2genome); /**/ query_intron_model = Intron_create("query", TRUE, FALSE, TRUE); joint_intron_model = Intron_create("joint", TRUE, TRUE, TRUE); /**/ codon_match = Match_find(Match_Type_CODON2CODON); query_phase_model = Phase_create("query", codon_match, TRUE, FALSE); joint_phase_model = Phase_create("joint", codon_match, TRUE, TRUE); /**/ transition_list = C4_Model_select_transitions(model, C4_Label_MATCH); g_assert(transition_list); g_assert(transition_list->len == 3); for(i = 0; i < transition_list->len; i++){ transition = transition_list->pdata[i]; if((transition->advance_query == 1) && (transition->advance_target == 1)){ C4_Model_insert(model, query_intron_model, transition->input, transition->output); C4_Model_insert(model, joint_intron_model, transition->input, transition->output); } else { g_assert(transition->advance_query == 3); g_assert(transition->advance_target == 3); C4_Model_insert(model, query_phase_model, transition->input, transition->output); C4_Model_insert(model, 
joint_phase_model, transition->input, transition->output); } } C4_Model_destroy(query_intron_model); C4_Model_destroy(joint_intron_model); C4_Model_destroy(query_phase_model); C4_Model_destroy(joint_phase_model); g_ptr_array_free(transition_list, TRUE); C4_Model_close(model); return model; } exonerate-2.4.0/src/model/phase.test.c0000644000175000017500000000247211162714317014551 00000000000000/****************************************************************\ * * * Model for phased introns. * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "argument.h" #include "phase.h" gint Argument_main(Argument *arg){ register Match *match; register C4_Model *phase; Match_ArgumentSet_create(arg); Argument_process(arg, "phase.test", NULL, NULL); match = Match_find(Match_Type_DNA2PROTEIN); phase = Phase_create("test", match, TRUE, FALSE); C4_Model_destroy(phase); Match_destroy_all(); return 0; } exonerate-2.4.0/src/model/modeltype.test.c0000644000175000017500000000205511162714317015450 00000000000000/****************************************************************\ * * * Interface for different types of alignment model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "modeltype.h" int Argument_main(Argument *arg){ g_warning("test [%s] does nothing", __FILE__); return 0; } exonerate-2.4.0/src/model/modeltype.c0000644000175000017500000003515311162714317014477 00000000000000/****************************************************************\ * * * Interface for different types of alignment model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "modeltype.h" #include "ungapped.h" #include "affine.h" #include "est2genome.h" #include "ner.h" #include "protein2dna.h" #include "protein2genome.h" #include "coding2coding.h" #include "coding2genome.h" #include "cdna2genome.h" #include "genome2genome.h" gchar *Model_Type_to_string(Model_Type type){ register gchar *name = NULL; switch(type){ case Model_Type_UNGAPPED: name = "ungapped"; break; case Model_Type_UNGAPPED_TRANS: name = "ungapped:trans"; break; case Model_Type_AFFINE_GLOBAL: name = "affine:global"; break; case Model_Type_AFFINE_BESTFIT: name = "affine:bestfit"; break; case Model_Type_AFFINE_LOCAL: name = "affine:local"; break; case Model_Type_AFFINE_OVERLAP: name = "affine:overlap"; break; case Model_Type_EST2GENOME: name = "est2genome"; break; case Model_Type_NER: name = "ner"; break; case Model_Type_PROTEIN2DNA: name = "protein2dna"; break; case Model_Type_PROTEIN2DNA_BESTFIT: name = "protein2dna:bestfit"; break; case Model_Type_PROTEIN2GENOME: name = "protein2genome"; break; case Model_Type_PROTEIN2GENOME_BESTFIT: name = "protein2genome:bestfit"; break; case Model_Type_CODING2CODING: name = "coding2coding"; break; case 
Model_Type_CODING2GENOME: name = "coding2genome"; break; case Model_Type_CDNA2GENOME: name = "cdna2genome"; break; case Model_Type_GENOME2GENOME: name = "genome2genome"; break; default: g_error("Unknown Model Type [%d]", type); break; } return name; } Model_Type Model_Type_from_string(gchar *str){ gchar *name[Model_Type_TOTAL] = { "ungapped", "ungapped:trans", "affine:global", "affine:bestfit", "affine:local", "affine:overlap", "est2genome", "ner", "protein2dna", "protein2dna:bestfit", "protein2genome", "protein2genome:bestfit", "coding2coding", "coding2genome", "cdna2genome", "genome2genome"}; gchar *short_name[Model_Type_TOTAL] = { "u", "u:t", "a:g", "a:b", "a:l", "a:o", "e2g", "ner", "p2d", "p2d:b", "p2g", "p2g:b", "c2c", "c2g", "cd2g", "g2g"}; Model_Type type[Model_Type_TOTAL] = { Model_Type_UNGAPPED, Model_Type_UNGAPPED_TRANS, Model_Type_AFFINE_GLOBAL, Model_Type_AFFINE_BESTFIT, Model_Type_AFFINE_LOCAL, Model_Type_AFFINE_OVERLAP, Model_Type_EST2GENOME, Model_Type_NER, Model_Type_PROTEIN2DNA, Model_Type_PROTEIN2DNA_BESTFIT, Model_Type_PROTEIN2GENOME, Model_Type_PROTEIN2GENOME_BESTFIT, Model_Type_CODING2CODING, Model_Type_CODING2GENOME, Model_Type_CDNA2GENOME, Model_Type_GENOME2GENOME}; register gint i; for(i = 0; i < Model_Type_TOTAL; i++) if(!g_strcasecmp(name[i], str)) return type[i]; for(i = 0; i < Model_Type_TOTAL; i++) if(!g_strcasecmp(short_name[i], str)) return type[i]; g_error("Unknown model type [%s]", str); return Model_Type_UNGAPPED; /* Not reached */ } gboolean Model_Type_is_gapped(Model_Type type){ if((type == Model_Type_UNGAPPED) || (type == Model_Type_UNGAPPED_TRANS)) return FALSE; return TRUE; } gboolean Model_Type_translate_both(Model_Type type){ if((type == Model_Type_UNGAPPED_TRANS) || (type == Model_Type_CODING2CODING) || (type == Model_Type_CODING2GENOME) || (type == Model_Type_CDNA2GENOME) || (type == Model_Type_GENOME2GENOME)) return TRUE; return FALSE; } gboolean Model_Type_has_dual_match(Model_Type type){ if((type == 
Model_Type_CDNA2GENOME) || (type == Model_Type_GENOME2GENOME)) return TRUE; return FALSE; } gboolean Model_Type_has_genomic_target(Model_Type type){ if((type == Model_Type_EST2GENOME) || (type == Model_Type_PROTEIN2GENOME) || (type == Model_Type_PROTEIN2GENOME_BESTFIT) || (type == Model_Type_CODING2GENOME) || (type == Model_Type_CDNA2GENOME) || (type == Model_Type_GENOME2GENOME)) return TRUE; return FALSE; } static void Model_Type_check_input(Model_Type type, Alphabet_Type query_type, Alphabet_Type target_type){ switch(type){ case Model_Type_UNGAPPED: break; case Model_Type_UNGAPPED_TRANS: case Model_Type_EST2GENOME: case Model_Type_CODING2CODING: case Model_Type_CODING2GENOME: case Model_Type_CDNA2GENOME: case Model_Type_GENOME2GENOME: if(query_type != Alphabet_Type_DNA) g_error("Expected DNA query (not %s) for model [%s]", Alphabet_Type_get_name(query_type), Model_Type_to_string(type)); if(target_type != Alphabet_Type_DNA) g_error("Expected DNA target (not %s) for model [%s]", Alphabet_Type_get_name(target_type), Model_Type_to_string(type)); break; case Model_Type_AFFINE_GLOBAL: case Model_Type_AFFINE_BESTFIT: case Model_Type_AFFINE_LOCAL: case Model_Type_AFFINE_OVERLAP: case Model_Type_NER: if(query_type != target_type) g_error("Expected similar sequence types for model" " [%s] (not %s:%s)", Model_Type_to_string(type), Alphabet_Type_get_name(query_type), Alphabet_Type_get_name(target_type)); if(query_type == Alphabet_Type_UNKNOWN) g_error("Model [%s] cannot use unknown sequence type", Model_Type_to_string(type)); break; case Model_Type_PROTEIN2DNA: case Model_Type_PROTEIN2DNA_BESTFIT: case Model_Type_PROTEIN2GENOME: case Model_Type_PROTEIN2GENOME_BESTFIT: /* qy == AA, tg = NT */ if(query_type != Alphabet_Type_PROTEIN) g_error( "Expected protein query (not %s) for model [%s]", Alphabet_Type_get_name(query_type), Model_Type_to_string(type)); if(target_type != Alphabet_Type_DNA) g_error("Expected DNA target (not %s) for model [%s]", 
Alphabet_Type_get_name(target_type), Model_Type_to_string(type)); break; default: g_error("Unknown model type [%s]", Model_Type_to_string(type)); break; } return; } C4_Model *Model_Type_get_model(Model_Type type, Alphabet_Type query_type, Alphabet_Type target_type){ register C4_Model *model = NULL; register Match_Type match_type; Model_Type_check_input(type, query_type, target_type); switch(type){ case Model_Type_UNGAPPED: match_type = Match_Type_find(query_type, target_type, FALSE); model = Ungapped_create(match_type); break; case Model_Type_UNGAPPED_TRANS: match_type = Match_Type_find(query_type, target_type, TRUE); model = Ungapped_create(match_type); break; case Model_Type_AFFINE_GLOBAL: model = Affine_create(Affine_Model_Type_GLOBAL, query_type, target_type, FALSE); break; case Model_Type_AFFINE_BESTFIT: model = Affine_create(Affine_Model_Type_BESTFIT, query_type, target_type, FALSE); break; case Model_Type_AFFINE_LOCAL: model = Affine_create(Affine_Model_Type_LOCAL, query_type, target_type, FALSE); break; case Model_Type_AFFINE_OVERLAP: model = Affine_create(Affine_Model_Type_OVERLAP, query_type, target_type, FALSE); break; case Model_Type_EST2GENOME: model = EST2Genome_create(); break; case Model_Type_NER: model = NER_create(query_type, target_type); break; case Model_Type_PROTEIN2DNA: model = Protein2DNA_create(Affine_Model_Type_LOCAL); break; case Model_Type_PROTEIN2DNA_BESTFIT: model = Protein2DNA_create(Affine_Model_Type_BESTFIT); break; case Model_Type_PROTEIN2GENOME: model = Protein2Genome_create(Affine_Model_Type_LOCAL); break; case Model_Type_PROTEIN2GENOME_BESTFIT: model = Protein2Genome_create(Affine_Model_Type_BESTFIT); break; case Model_Type_CODING2CODING: model = Coding2Coding_create(); break; case Model_Type_CODING2GENOME: model = Coding2Genome_create(); break; case Model_Type_CDNA2GENOME: model = CDNA2Genome_create(); break; case Model_Type_GENOME2GENOME: model = Genome2Genome_create(); break; default: g_error("Unknown Model Type [%d]", type); 
break; } return model; } gpointer Model_Type_create_data(Model_Type type, Sequence *query, Sequence *target){ register gpointer model_data = NULL; register Match_Type match_type; switch(type){ case Model_Type_UNGAPPED: match_type = Match_Type_find(query->alphabet->type, target->alphabet->type, FALSE); model_data = Ungapped_Data_create(query, target, match_type); break; case Model_Type_UNGAPPED_TRANS: match_type = Match_Type_find(query->alphabet->type, target->alphabet->type, TRUE); model_data = Ungapped_Data_create(query, target, match_type); break; case Model_Type_AFFINE_GLOBAL: /*fallthrough*/ case Model_Type_AFFINE_BESTFIT: /*fallthrough*/ case Model_Type_AFFINE_LOCAL: /*fallthrough*/ case Model_Type_AFFINE_OVERLAP: model_data = Affine_Data_create(query, target, FALSE); break; case Model_Type_EST2GENOME: model_data = EST2Genome_Data_create(query, target); break; case Model_Type_NER: model_data = NER_Data_create(query, target); break; case Model_Type_PROTEIN2DNA: /*fallthrough*/ case Model_Type_PROTEIN2DNA_BESTFIT: model_data = Protein2DNA_Data_create(query, target); break; case Model_Type_PROTEIN2GENOME: /*fallthrough*/ case Model_Type_PROTEIN2GENOME_BESTFIT: model_data = Protein2Genome_Data_create(query, target); break; case Model_Type_CODING2CODING: model_data = Coding2Coding_Data_create(query, target); break; case Model_Type_CODING2GENOME: model_data = Coding2Genome_Data_create(query, target); break; case Model_Type_CDNA2GENOME: model_data = CDNA2Genome_Data_create(query, target); break; case Model_Type_GENOME2GENOME: model_data = Genome2Genome_Data_create(query, target); break; default: g_error("Unknown Model Type [%d]", type); } return model_data; } void Model_Type_destroy_data(Model_Type type, gpointer model_data){ switch(type){ case Model_Type_UNGAPPED: /*fallthrough*/ case Model_Type_UNGAPPED_TRANS: Ungapped_Data_destroy(model_data); break; case Model_Type_AFFINE_GLOBAL: /*fallthrough*/ case Model_Type_AFFINE_BESTFIT: /*fallthrough*/ case 
Model_Type_AFFINE_LOCAL: /*fallthrough*/ case Model_Type_AFFINE_OVERLAP: Affine_Data_destroy(model_data); break; case Model_Type_EST2GENOME: EST2Genome_Data_destroy(model_data); break; case Model_Type_NER: NER_Data_destroy(model_data); break; case Model_Type_PROTEIN2DNA: /*fallthrough*/ case Model_Type_PROTEIN2DNA_BESTFIT: Protein2DNA_Data_destroy(model_data); break; case Model_Type_PROTEIN2GENOME: /*fallthrough*/ case Model_Type_PROTEIN2GENOME_BESTFIT: Protein2Genome_Data_destroy(model_data); break; case Model_Type_CODING2CODING: Coding2Coding_Data_destroy(model_data); break; case Model_Type_CODING2GENOME: Coding2Genome_Data_destroy(model_data); break; case Model_Type_CDNA2GENOME: CDNA2Genome_Data_destroy(model_data); break; case Model_Type_GENOME2GENOME: Genome2Genome_Data_destroy(model_data); break; default: g_error("Unknown Model Type [%d]", type); } return; } exonerate-2.4.0/src/model/bootstrapper.c0000755000175000017500000004115111222643771015221 00000000000000/****************************************************************\ * * * bootstrapper: required prior to static model linking * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/

#include <stdio.h>  /* For remove() */
#include <stdlib.h> /* For system() */
#include <string.h> /* For strstr() */

#include "ungapped.h"
#include "affine.h"
#include "est2genome.h"
#include "protein2dna.h"
#include "protein2genome.h"
#include "ner.h"
#include "coding2coding.h"
#include "coding2genome.h"
#include "cdna2genome.h"
#include "genome2genome.h"
#include "modeltype.h"
#include "submat.h"
#include "splice.h"
#include "optimal.h"
#include "heuristic.h"
#include "codegen.h"
#include "argument.h"
#include "sdp.h"

typedef struct {
    gchar *archive_path;
    gchar *header_path;
    gchar *lookup_path;
    gchar *object_path;
    gint archive_member_count;
    FILE *lookup_fp;
    FILE *header_fp;
    GTree *model_dictionary;
    GStringChunk *model_name_chunk;
} Bootstrapper;

static void Bootstrapper_destroy(Bootstrapper *bs){
    g_tree_destroy(bs->model_dictionary);
    g_string_chunk_free(bs->model_name_chunk);
    g_free(bs->header_path);
    g_free(bs->lookup_path);
    g_free(bs->object_path);
    g_free(bs->archive_path);
    g_free(bs);
    return;
}

static gint Bootstrapper_model_name_compare(gconstpointer a,
                                            gconstpointer b){
    register const gchar *id_a = a, *id_b = b;
    return strcmp(id_a, id_b);
}

static Bootstrapper *Bootstrapper_create(void){
    register Bootstrapper *bs = g_new(Bootstrapper, 1);
    g_message("Starting building bootstrapping components");
    bs->header_path = g_strdup("c4_model_archive.h");
    bs->lookup_path = g_strdup("c4_model_lookup.c");
    bs->object_path = g_strdup("c4_model_lookup.o");
    bs->archive_path = g_strdup("c4_model_archive.a");
    bs->archive_member_count = 0;
    bs->header_fp = fopen(bs->header_path, "w");
    if(!bs->header_fp)
        g_error("Could not open [%s] for writing", bs->header_path);
    bs->lookup_fp = fopen(bs->lookup_path, "w");
    if(!bs->lookup_fp)
        g_error("Could not open [%s] for writing", bs->lookup_path);
    /* The generated lookup file calls strcmp() */
    fprintf(bs->lookup_fp,
            "#include <string.h>\n"
            "#include \"%s\"\n"
            "\n"
            "gpointer Bootstrapper_lookup(gchar *name){\n",
            bs->header_path);
    /* The generated header uses the glib types gpointer and gchar */
    fprintf(bs->header_fp,
            "#include <glib.h>\n"
            "\n"
            "#include \"c4.h\"\n"
            "#include \"viterbi.h\"\n"
            "#include \"alignment.h\"\n"
            "\n"
            "gpointer Bootstrapper_lookup(gchar *name);\n"
            "\n");
    bs->model_dictionary = g_tree_new(Bootstrapper_model_name_compare);
    bs->model_name_chunk = g_string_chunk_new(64);
    return bs;
}

static void Bootstrapper_add_to_archive(Bootstrapper *bs, gchar *path){
    register gchar *option, *command;
    register gint ret_val;
    register gchar *key, *tmp;
    register gchar *ar = "ar",
                   *ar_init = "cSq",  /* czr for OSF */
                   *ar_append = "Sq"; /* zr for OSF */
    tmp = (gchar*)g_getenv("C4_AR");
    if(tmp)
        ar = tmp;
    tmp = (gchar*)g_getenv("C4_AR_INIT");
    if(tmp)
        ar_init = tmp;
    tmp = (gchar*)g_getenv("C4_AR_APPEND");
    if(tmp)
        ar_append = tmp;
    if(g_tree_lookup(bs->model_dictionary, path)){
        g_error("Duplicate model in [%s]", path);
    } else {
        g_message("Adding: [%s]", path);
        key = g_string_chunk_insert(bs->model_name_chunk, path);
        g_tree_insert(bs->model_dictionary, key, key);
    }
    if(bs->archive_member_count){
        option = ar_append;
    } else { /* Is first */
        option = ar_init;
        if(Codegen_file_exists(bs->archive_path))
            remove(bs->archive_path);
    }
    command = g_strdup_printf("%s -%s %s %s",
                              ar, option, bs->archive_path, path);
    g_message("Adding object file [%s] to archive", path);
    g_print("%s\n", command);
    ret_val = system(command);
    if(ret_val)
        g_error("Problem adding to archive [%d][%s]", ret_val, command);
    bs->archive_member_count++;
    g_free(command);
    return;
}

static void Bootstrapper_index_archive(Bootstrapper *bs){
    register gchar *command;
    register gint ret_val;
    command = g_strdup_printf("ranlib %s", bs->archive_path);
    g_message("Indexing archive [%s]", bs->archive_path);
    g_print("%s\n", command);
    ret_val = system(command);
    g_message("Done");
    if(ret_val)
        g_error("Problem indexing archive [%d][%s]", ret_val, command);
    bs->archive_member_count++;
    g_free(command);
    return;
}

static void Bootstrapper_close(Bootstrapper *bs){
    register gchar *command;
    register gchar *cc_command = "gcc";
    register gchar *cc_flags = "-O2";
register gchar *tmp; fprintf(bs->lookup_fp, " return NULL;\n"); fprintf(bs->lookup_fp, " }\n"); fclose(bs->header_fp); fclose(bs->lookup_fp); tmp = (gchar*)g_getenv("CC"); if(tmp) cc_command = tmp; tmp = (gchar*)g_getenv("CFLAGS"); if(tmp) cc_flags = tmp; command = g_strconcat(cc_command, " ", cc_flags, " -I" SOURCE_ROOT_DIR "/src/c4" " -I" SOURCE_ROOT_DIR "/src/sequence" " -I" SOURCE_ROOT_DIR "/src/general" " -I" SOURCE_ROOT_DIR "/src/struct" " " GLIB_CFLAGS, " -o ", bs->object_path, " -c ", bs->lookup_path, NULL); g_message("Compiling with: [%s]", command); if(system(command)) g_error("Problem compiling with: [%s]", command); g_free(command); /**/ Bootstrapper_add_to_archive(bs, bs->object_path); Bootstrapper_index_archive(bs); remove(bs->lookup_path); remove(bs->object_path); g_message("Finished building bootstrapping components"); return; } static void Bootstrapper_add_codegen(Bootstrapper *bs, Codegen *codegen){ g_assert(codegen); g_assert(codegen->name); Bootstrapper_add_to_archive(bs, codegen->object_path); g_message("Adding function [%s]", codegen->name); fprintf(bs->lookup_fp, " if(!strcmp(name, \"%s\"))\n", codegen->name); fprintf(bs->lookup_fp, " return %s;\n", codegen->name); fprintf(bs->header_fp, "C4_Score %s%s;\n", codegen->name, Viterbi_DP_Func_ARGS_STR); return; } static void Bootstrapper_process_Optimal(Bootstrapper *bs, Optimal *optimal){ register GPtrArray *codegen_list; register Codegen *codegen; register gint i; if(!optimal) return; g_message("Process optimal [%s]", optimal->name); codegen_list = Optimal_make_Codegen_list(optimal); for(i = 0; i < codegen_list->len; i++){ codegen = codegen_list->pdata[i]; Bootstrapper_add_codegen(bs, codegen); Codegen_destroy(codegen); } g_ptr_array_free(codegen_list, TRUE); return; } static void Bootstrapper_process_C4_Model(Bootstrapper *bs, C4_Model *model, gboolean generate_bsdp_code, gboolean generate_sdp_code){ register gint i; register Optimal *optimal = Optimal_create(model, NULL, Optimal_Type_SCORE 
|Optimal_Type_PATH |Optimal_Type_REDUCED_SPACE, TRUE); register Heuristic *heuristic; register SDP *sdp; register GPtrArray *codegen_list, *optimal_list; register Codegen *codegen; g_message("Bootstrapper processing model [%s]", model->name); Bootstrapper_process_Optimal(bs, optimal); if(generate_bsdp_code){ heuristic = Heuristic_create(model); optimal_list = Heuristic_get_Optimal_list(heuristic); for(i = 0; i < optimal_list->len; i++) Bootstrapper_process_Optimal(bs, optimal_list->pdata[i]); Heuristic_destroy(heuristic); g_ptr_array_free(optimal_list, TRUE); } if(generate_sdp_code){ sdp = SDP_create(model); codegen_list = SDP_get_codegen_list(sdp); for(i = 0; i < codegen_list->len; i++){ codegen = codegen_list->pdata[i]; Bootstrapper_add_codegen(bs, codegen); } g_ptr_array_free(codegen_list, TRUE); SDP_destroy(sdp); } Optimal_destroy(optimal); return; } static void Bootstrapper_process(Bootstrapper *bs, gchar *models){ register gchar **model_list = g_strsplit(models, " ", 1024); register gint i; register Model_Type type; register C4_Model *model; for(i = 0; model_list[i]; i++){ g_message("Processing model [%s]", model_list[i]); type = Model_Type_from_string(model_list[i]); switch(type){ case Model_Type_UNGAPPED: /* ungapped dna2dna */ model = Ungapped_create(Match_Type_DNA2DNA); Bootstrapper_process_C4_Model(bs, model, FALSE, FALSE); C4_Model_destroy(model); /* ungapped protein2protein */ model = Ungapped_create(Match_Type_PROTEIN2PROTEIN); Bootstrapper_process_C4_Model(bs, model, FALSE, FALSE); C4_Model_destroy(model); /* ungapped dna2protein */ model = Ungapped_create(Match_Type_DNA2PROTEIN); Bootstrapper_process_C4_Model(bs, model, FALSE, FALSE); C4_Model_destroy(model); /* ungapped protein2dna */ model = Ungapped_create(Match_Type_PROTEIN2DNA); Bootstrapper_process_C4_Model(bs, model, FALSE, FALSE); C4_Model_destroy(model); break; case Model_Type_UNGAPPED_TRANS: model = Ungapped_create(Match_Type_CODON2CODON); Bootstrapper_process_C4_Model(bs, model, FALSE, 
FALSE); C4_Model_destroy(model); break; case Model_Type_AFFINE_GLOBAL: model = Affine_create(Affine_Model_Type_GLOBAL, Alphabet_Type_DNA, Alphabet_Type_DNA, FALSE); Bootstrapper_process_C4_Model(bs, model, FALSE, FALSE); C4_Model_destroy(model); /**/ model = Affine_create(Affine_Model_Type_GLOBAL, Alphabet_Type_PROTEIN, Alphabet_Type_PROTEIN, FALSE); Bootstrapper_process_C4_Model(bs, model, FALSE, FALSE); C4_Model_destroy(model); break; case Model_Type_AFFINE_BESTFIT: model = Affine_create(Affine_Model_Type_BESTFIT, Alphabet_Type_DNA, Alphabet_Type_DNA, FALSE); Bootstrapper_process_C4_Model(bs, model, FALSE, FALSE); C4_Model_destroy(model); /**/ model = Affine_create(Affine_Model_Type_BESTFIT, Alphabet_Type_PROTEIN, Alphabet_Type_PROTEIN, FALSE); Bootstrapper_process_C4_Model(bs, model, FALSE, FALSE); C4_Model_destroy(model); break; case Model_Type_AFFINE_LOCAL: model = Affine_create(Affine_Model_Type_LOCAL, Alphabet_Type_DNA, Alphabet_Type_DNA, FALSE); Bootstrapper_process_C4_Model(bs, model, TRUE, TRUE); C4_Model_destroy(model); /**/ model = Affine_create(Affine_Model_Type_LOCAL, Alphabet_Type_PROTEIN, Alphabet_Type_PROTEIN, FALSE); Bootstrapper_process_C4_Model(bs, model, TRUE, TRUE); C4_Model_destroy(model); break; case Model_Type_AFFINE_OVERLAP: model = Affine_create(Affine_Model_Type_OVERLAP, Alphabet_Type_DNA, Alphabet_Type_DNA, FALSE); Bootstrapper_process_C4_Model(bs, model, FALSE, FALSE); C4_Model_destroy(model); break; case Model_Type_EST2GENOME: model = EST2Genome_create(); Bootstrapper_process_C4_Model(bs, model, TRUE, TRUE); C4_Model_destroy(model); break; case Model_Type_NER: model = NER_create(Alphabet_Type_DNA, Alphabet_Type_DNA); Bootstrapper_process_C4_Model(bs, model, TRUE, TRUE); C4_Model_destroy(model); /**/ model = NER_create(Alphabet_Type_PROTEIN, Alphabet_Type_PROTEIN); Bootstrapper_process_C4_Model(bs, model, TRUE, TRUE); C4_Model_destroy(model); break; case Model_Type_PROTEIN2DNA: model = Protein2DNA_create(Affine_Model_Type_LOCAL); 
Bootstrapper_process_C4_Model(bs, model, TRUE, TRUE); C4_Model_destroy(model); break; case Model_Type_PROTEIN2DNA_BESTFIT: model = Protein2DNA_create(Affine_Model_Type_BESTFIT); Bootstrapper_process_C4_Model(bs, model, FALSE, FALSE); C4_Model_destroy(model); break; case Model_Type_PROTEIN2GENOME: model = Protein2Genome_create(Affine_Model_Type_LOCAL); Bootstrapper_process_C4_Model(bs, model, TRUE, TRUE); C4_Model_destroy(model); break; case Model_Type_PROTEIN2GENOME_BESTFIT: model = Protein2Genome_create(Affine_Model_Type_BESTFIT); Bootstrapper_process_C4_Model(bs, model, FALSE, FALSE); C4_Model_destroy(model); break; case Model_Type_CODING2CODING: model = Coding2Coding_create(); Bootstrapper_process_C4_Model(bs, model, TRUE, TRUE); C4_Model_destroy(model); break; case Model_Type_CODING2GENOME: model = Coding2Genome_create(); Bootstrapper_process_C4_Model(bs, model, FALSE, TRUE); C4_Model_destroy(model); break; case Model_Type_CDNA2GENOME: model = CDNA2Genome_create(); Bootstrapper_process_C4_Model(bs, model, FALSE, TRUE); C4_Model_destroy(model); break; case Model_Type_GENOME2GENOME: model = Genome2Genome_create(); Bootstrapper_process_C4_Model(bs, model, FALSE, TRUE); C4_Model_destroy(model); break; default: g_error("Unknown Model type [%s]", model_list[i]); break; } } g_strfreev(model_list); return; } int Argument_main(Argument *arg){ register Bootstrapper *bs = Bootstrapper_create(); register gchar *models = (gchar*)g_getenv("C4_COMPILED_MODELS"); Heuristic_ArgumentSet_create(arg); Match_ArgumentSet_create(arg); Affine_ArgumentSet_create(arg); NER_ArgumentSet_create(arg); Intron_ArgumentSet_create(arg); Codegen_ArgumentSet_create(arg); Argument_process(arg, "exonerate:bootstrapper", NULL, NULL); if((!models) || (!g_strcasecmp(models, "all"))) models = "u u:t a:g a:b a:l a:o e2g ner p2d p2d:b p2g p2g:b c2c c2g cd2g"; Bootstrapper_process(bs, models); Bootstrapper_close(bs); Bootstrapper_destroy(bs); return 0; } 
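The bootstrapper above writes c4_model_lookup.c as a chain of strcmp() tests, one per generated function, closed by "return NULL;". The following is a minimal sketch of what such a generated lookup looks like, not the actual generated file. The names Demo_DP_Func, Demo_lookup, Demo_lookup_check, demo_affine_local_score and demo_est2genome_score are hypothetical, and a plain function-pointer type stands in for the gpointer return value and the C4_Score signature used by the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature: the real generated functions return C4_Score
 * and take the arguments given by Viterbi_DP_Func_ARGS_STR. */
typedef int (*Demo_DP_Func)(void);

/* Hypothetical stand-ins for two generated DP functions. */
int demo_affine_local_score(void){ return 1; }
int demo_est2genome_score(void){ return 2; }

/* Same shape as the generated lookup: one strcmp() test per
 * registered function, then NULL for any unknown name. */
Demo_DP_Func Demo_lookup(const char *name){
    if(!strcmp(name, "demo_affine_local_score"))
        return demo_affine_local_score;
    if(!strcmp(name, "demo_est2genome_score"))
        return demo_est2genome_score;
    return NULL;
}

/* Self-check: known names resolve, unknown names give NULL. */
void Demo_lookup_check(void){
    assert(Demo_lookup("demo_affine_local_score")
           == demo_affine_local_score);
    assert(Demo_lookup("demo_est2genome_score")() == 2);
    assert(Demo_lookup("not_compiled_in") == NULL);
    return;
}
```

A caller would treat a NULL result as "this model was not compiled in". The linear strcmp() chain is simple to emit with fprintf() and should be adequate for the modest number of functions registered per build.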
exonerate-2.4.0/src/model/edit_distance.h0000644000175000017500000000273511162714316015300 00000000000000/****************************************************************\ * * * edit distance model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_EDIT_DISTANCE_H #define INCLUDED_EDIT_DISTANCE_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include "c4.h" typedef struct { gchar *query; gchar *target; gint query_len; gint target_len; } EditDistance_Data; EditDistance_Data *EditDistance_Data_create(gchar *query, gchar *target); void EditDistance_Data_destroy(EditDistance_Data *edd); C4_Model *EditDistance_create(void); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_EDIT_DISTANCE_H */ exonerate-2.4.0/src/model/affine.h0000644000175000017500000000556711162714302013732 00000000000000/****************************************************************\ * * * Module for various affine gapped models * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #ifndef INCLUDED_AFFINE_H #define INCLUDED_AFFINE_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include "c4.h" #include "ungapped.h" #include "argument.h" typedef struct { C4_Score gap_open; C4_Score gap_extend; C4_Score codon_gap_open; C4_Score codon_gap_extend; } Affine_ArgumentSet; Affine_ArgumentSet *Affine_ArgumentSet_create(Argument *arg); typedef enum { Affine_Model_Type_GLOBAL, Affine_Model_Type_BESTFIT, Affine_Model_Type_LOCAL, Affine_Model_Type_OVERLAP, Affine_Model_Type_UNKNOWN } Affine_Model_Type; gchar *Affine_Model_Type_get_name(Affine_Model_Type type); typedef struct { Ungapped_Data ud; /* inherit */ Affine_ArgumentSet *aas; } Affine_Data; #define Affine_Data_get_query(ad) Ungapped_Data_get_query(&((ad)->ud)) #define Affine_Data_get_target(ad) Ungapped_Data_get_target(&((ad)->ud)) #define Affine_Data_get_dna_submat(ad) \ Ungapped_Data_get_dna_submat(&((ad)->ud)) #define Affine_Data_get_protein_submat(ad) \ Ungapped_Data_get_protein_submat(&((ad)->ud)) #define Affine_Data_get_translate(ad) \ Ungapped_Data_get_translate(&((ad)->ud)) #define Affine_Data_get_Intron_Data(ad) \ Ungapped_Data_get_Intron_Data(&((ad)->ud)) #define Affine_Data_get_Frameshift_Data(ad) \ Ungapped_Data_get_Frameshift_Data(&((ad)->ud)) void Affine_Data_init(Affine_Data *ad, Sequence *query, Sequence *target, gboolean translate_both); Affine_Data *Affine_Data_create(Sequence *query, Sequence *target, gboolean translate_both); void Affine_Data_clear(Affine_Data *ad); void Affine_Data_destroy(Affine_Data *ad); C4_Model *Affine_create(Affine_Model_Type type, Alphabet_Type query_type, Alphabet_Type target_type, gboolean translate_both); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_AFFINE_H */ exonerate-2.4.0/src/model/protein2dna.h0000644000175000017500000000436311162714317014726 00000000000000/****************************************************************\ * * * Protein <-> DNA 
comparison model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_PROTEIN2DNA_H #define INCLUDED_PROTEIN2DNA_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include "c4.h" #include "sequence.h" #include "affine.h" #include "frameshift.h" typedef struct { Affine_Data ad; /* inherit */ } Protein2DNA_Data; /**/ #define Protein2DNA_Data_get_submat(p2dd) \ Affine_Data_get_protein_submat(&((p2dd)->ad)) #define Protein2DNA_Data_get_translate(p2dd) \ Affine_Data_get_translate(&((p2dd)->ad)) #define Protein2DNA_Data_get_Intron_Data(p2dd) \ Affine_Data_get_Intron_Data(&((p2dd)->ad)) #define Protein2DNA_Data_get_Frameshift_Data(p2dd) \ Affine_Data_get_Frameshift_Data(&((p2dd)->ad)) #define Protein2DNA_Data_get_query(p2dd) \ Affine_Data_get_query(&((p2dd)->ad)) #define Protein2DNA_Data_get_target(p2dd) \ Affine_Data_get_target(&((p2dd)->ad)) void Protein2DNA_Data_init(Protein2DNA_Data *p2dd, Sequence *query, Sequence *target); Protein2DNA_Data *Protein2DNA_Data_create(Sequence *query, Sequence *target); void Protein2DNA_Data_clear(Protein2DNA_Data *p2dd); void Protein2DNA_Data_destroy(Protein2DNA_Data *p2dd); /**/ C4_Model *Protein2DNA_create(Affine_Model_Type type); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_PROTEIN2DNA_H */ exonerate-2.4.0/src/model/ner.h0000644000175000017500000000375111162714317013265 00000000000000/****************************************************************\ * * * Model for alignments with non-equivalenced regions * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. 
* * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/

#ifndef INCLUDED_NER_H
#define INCLUDED_NER_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include "c4.h"
#include "affine.h"
#include "argument.h"

typedef struct {
    gint min_ner;
    gint max_ner;
    C4_Score ner_open_penalty;
} NER_ArgumentSet;

NER_ArgumentSet *NER_ArgumentSet_create(Argument *arg);

typedef struct {
    Affine_Data ad; /* inherit */
    NER_ArgumentSet *nas;
    gint curr_ner_id;
    gint curr_ner_query_start;
    gint curr_ner_target_start;
} NER_Data;

#define NER_Data_get_query(nd) Affine_Data_get_query(&((nd)->ad))
#define NER_Data_get_target(nd) Affine_Data_get_target(&((nd)->ad))
#define NER_Data_get_dna_submat(nd) Affine_Data_get_dna_submat(&((nd)->ad))
#define NER_Data_get_protein_submat(nd) \
        Affine_Data_get_protein_submat(&((nd)->ad))

NER_Data *NER_Data_create(Sequence *query, Sequence *target);
void NER_Data_destroy(NER_Data *gnd);

C4_Model *NER_create(Alphabet_Type query_type, Alphabet_Type target_type);

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_NER_H */

exonerate-2.4.0/src/model/est2genome.h0000644000175000017500000000360511162714316014546 00000000000000
/****************************************************************\ * * * Module for EST <-> genome alignments * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact.
* * * \****************************************************************/

#ifndef INCLUDED_EST2GENOME_H
#define INCLUDED_EST2GENOME_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <ctype.h> /* For tolower() */

#include "c4.h"
#include "sequence.h"
#include "heuristic.h"
#include "splice.h"
#include "argument.h"
#include "affine.h"

typedef struct {
    Affine_Data ad;
} EST2Genome_Data;

#define EST2Genome_Data_get_submat(e2gd) \
        Affine_Data_get_dna_submat(&((e2gd)->ad))
#define EST2Genome_Data_get_Intron_Data(e2gd) \
        Affine_Data_get_Intron_Data(&((e2gd)->ad))

void EST2Genome_Data_init(EST2Genome_Data *e2gd,
                          Sequence *query, Sequence *target);
EST2Genome_Data *EST2Genome_Data_create(Sequence *query,
                                        Sequence *target);
void EST2Genome_Data_clear(EST2Genome_Data *e2gd);
void EST2Genome_Data_destroy(EST2Genome_Data *e2gd);

C4_Model *EST2Genome_create(void);

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_EST2GENOME_H */

exonerate-2.4.0/src/model/ungapped.h0000644000175000017500000000514111215724205014273 00000000000000
/****************************************************************\ * * * Module for various ungapped alignment models * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact.
* * * \****************************************************************/ #ifndef INCLUDED_UNGAPPED_H #define INCLUDED_UNGAPPED_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include "c4.h" #include "heuristic.h" #include "submat.h" #include "translate.h" #include "sequence.h" #include "alignment.h" #include "argument.h" #include "intron.h" #include "frameshift.h" #include "match.h" #include "hspset.h" typedef struct { Heuristic_Data heuristic_data; /* Inherit */ Sequence *query; Sequence *target; Match_ArgumentSet *mas; Match *match_list[Match_Type_TOTAL]; Intron_Data *intron_data; Frameshift_Data *frameshift_data; } Ungapped_Data; #define Ungapped_Data_get_query(ud) (ud)->query #define Ungapped_Data_get_target(ud) (ud)->target #define Ungapped_Data_get_dna_submat(ud) ((ud)->mas->dna_submat) #define Ungapped_Data_get_protein_submat(ud) ((ud)->mas->protein_submat) #define Ungapped_Data_get_translate(ud) (ud)->mas->translate #define Ungapped_Data_get_Intron_Data(ud) (ud)->intron_data #define Ungapped_Data_get_Frameshift_Data(ud) (ud)->frameshift_data void Ungapped_Data_init(Ungapped_Data *ud, Sequence *query, Sequence *target, Match_Type match_type); Ungapped_Data *Ungapped_Data_create(Sequence *query, Sequence *target, Match_Type match_type); void Ungapped_Data_clear(Ungapped_Data *ud); void Ungapped_Data_destroy(Ungapped_Data *ud); C4_Model *Ungapped_create(Match_Type match_type); Alignment *Ungapped_Alignment_create(C4_Model *model, Ungapped_Data *ud, HSP *hsp); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_UNGAPPED_H */ exonerate-2.4.0/src/model/intron.h0000644000175000017500000000422311162714317014005 00000000000000/****************************************************************\ * * * Module of splice site and intron modelling * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. 
See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_INTRON_H #define INCLUDED_INTRON_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include "c4.h" #include "splice.h" #include "argument.h" #include "phase.h" typedef struct { gint min_intron; gint max_intron; SplicePredictorSet *sps; C4_Score intron_open_penalty; } Intron_ArgumentSet; Intron_ArgumentSet *Intron_ArgumentSet_create(Argument *arg); typedef struct { gint curr_intron_start; SplicePrediction_Set *sps; } Intron_ChainData; /* FIXME: can remove chain data and just use curr_{qy,tg}_intron_start */ typedef struct { Intron_ArgumentSet *ias; Intron_ChainData *query_data; Intron_ChainData *target_data; Region *curr_region; } Intron_Data; Intron_Data *Intron_Data_create(void); void Intron_Data_destroy(Intron_Data *id); C4_Model *Intron_create(gchar *suffix, gboolean on_query, gboolean on_target, gboolean is_forward); void Intron_ChainData_init_splice_prediction(Intron_ChainData *icd, Sequence *s, SplicePredictorSet *sps, SpliceType type, gboolean use_single); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_INTRON_H */ exonerate-2.4.0/src/model/protein2genome.h0000644000175000017500000000404411162714320015424 00000000000000/****************************************************************\ * * * Protein <-> Genome comparison model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #ifndef INCLUDED_PROTEIN2GENOME_H #define INCLUDED_PROTEIN2GENOME_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include "c4.h" #include "sequence.h" #include "affine.h" #include "protein2dna.h" typedef struct { Protein2DNA_Data p2dd; } Protein2Genome_Data; /**/ #define Protein2Genome_Data_get_submat(p2gd) \ Protein2DNA_Data_get_submat(&((p2gd)->p2dd)) #define Protein2Genome_Data_get_translate(p2gd) \ Protein2DNA_Data_get_translate(&((p2gd)->p2dd)) #define Protein2Genome_Data_get_Intron_Data(p2gd) \ Protein2DNA_Data_get_Intron_Data(&((p2gd)->p2dd)) #define Protein2Genome_Data_get_query(p2gd) \ Protein2DNA_Data_get_query(&((p2gd)->p2dd)) #define Protein2Genome_Data_get_target(p2gd) \ Protein2DNA_Data_get_target(&((p2gd)->p2dd)) Protein2Genome_Data *Protein2Genome_Data_create(Sequence *query, Sequence *target); void Protein2Genome_Data_destroy(Protein2Genome_Data *p2gd); /**/ C4_Model *Protein2Genome_create(Affine_Model_Type type); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_PROTEIN2GENOME_H */ exonerate-2.4.0/src/model/coding2coding.h0000644000175000017500000000407711162714316015213 00000000000000/****************************************************************\ * * * Coding DNA comparison model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #ifndef INCLUDED_CODING2CODING_H #define INCLUDED_CODING2CODING_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include "c4.h" #include "sequence.h" #include "affine.h" typedef struct { Affine_Data ad; /* inherit */ } Coding2Coding_Data; /**/ #define Coding2Coding_Data_get_submat(c2dd) \ Affine_Data_get_protein_submat(&((c2dd)->ad)) #define Coding2Coding_Data_get_translate(c2dd) \ Affine_Data_get_translate(&((c2dd)->ad)) #define Coding2Coding_Data_get_Frameshift_Data(c2dd) \ Affine_Data_get_Frameshift_Data(&((c2dd)->ad)) #define Coding2Coding_Data_get_Intron_Data(c2dd) \ Affine_Data_get_Intron_Data(&((c2dd)->ad)) void Coding2Coding_Data_init(Coding2Coding_Data *c2cd, Sequence *query, Sequence *target); Coding2Coding_Data *Coding2Coding_Data_create(Sequence *query, Sequence *target); void Coding2Coding_Data_clear(Coding2Coding_Data *c2cd); void Coding2Coding_Data_destroy(Coding2Coding_Data *c2cd); /**/ C4_Model *Coding2Coding_create(void); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_CODING2CODING_H */ exonerate-2.4.0/src/model/frameshift.h0000644000175000017500000000313711162714316014626 00000000000000/****************************************************************\ * * * Module for frameshift modelling * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #ifndef INCLUDED_FRAMESHIFT_H #define INCLUDED_FRAMESHIFT_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include "c4.h" #include "argument.h" typedef struct { C4_Score frameshift_penalty; } Frameshift_ArgumentSet; typedef struct { Frameshift_ArgumentSet *fas; } Frameshift_Data; Frameshift_Data *Frameshift_Data_create(void); void Frameshift_Data_destroy(Frameshift_Data *fd); Frameshift_ArgumentSet *Frameshift_ArgumentSet_create(Argument *arg); void Frameshift_add(C4_Model *model, C4_State *match_state, gchar *suffix, gboolean apply_to_query); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_FRAMESHIFT_H */ exonerate-2.4.0/src/model/coding2genome.h0000644000175000017500000000376111162714316015221 00000000000000/****************************************************************\ * * * Coding <-> Genome comparison model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #ifndef INCLUDED_CODING2GENOME_H #define INCLUDED_CODING2GENOME_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include "c4.h" #include "sequence.h" #include "affine.h" #include "coding2coding.h" typedef struct { Coding2Coding_Data c2cd; } Coding2Genome_Data; #define Coding2Genome_Data_get_submat(g2gd) \ Coding2Coding_Data_get_submat(&((g2gd)->c2cd)) #define Coding2Genome_Data_get_translate(g2gd) \ Coding2Coding_Data_get_translate(&((g2gd)->c2cd)) #define Coding2Genome_Data_get_Intron_Data(g2gd) \ Coding2Coding_Data_get_Intron_Data(&((g2gd)->c2cd)) /**/ void Coding2Genome_Data_init(Coding2Genome_Data *c2gd, Sequence *query, Sequence *target); Coding2Genome_Data *Coding2Genome_Data_create(Sequence *query, Sequence *target); void Coding2Genome_Data_clear(Coding2Genome_Data *c2gd); void Coding2Genome_Data_destroy(Coding2Genome_Data *c2gd); /**/ C4_Model *Coding2Genome_create(void); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_CODING2GENOME_H */ exonerate-2.4.0/src/model/genome2genome.h0000644000175000017500000000366211162714317015231 00000000000000/****************************************************************\ * * * Genome <-> Genome comparison model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #ifndef INCLUDED_GENOME2GENOME_H #define INCLUDED_GENOME2GENOME_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include "c4.h" #include "sequence.h" #include "affine.h" #include "cdna2genome.h" typedef struct { CDNA2Genome_Data cd2gd; } Genome2Genome_Data; #define Genome2Genome_Data_get_dna_submat(g2gd) \ CDNA2Genome_Data_get_dna_submat(&((g2gd)->cd2gd)) #define Genome2Genome_Data_get_protein_submat(g2gd) \ CDNA2Genome_Data_get_protein_submat(&((g2gd)->cd2gd)) #define Genome2Genome_Data_get_translate(g2gd) \ CDNA2Genome_Data_get_translate(&((g2gd)->cd2gd)) #define Genome2Genome_Data_get_Intron_Data(g2gd) \ CDNA2Genome_Data_get_Intron_Data(&((g2gd)->cd2gd)) /**/ Genome2Genome_Data *Genome2Genome_Data_create(Sequence *query, Sequence *target); void Genome2Genome_Data_destroy(Genome2Genome_Data *g2gd); /**/ C4_Model *Genome2Genome_create(void); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_GENOME2GENOME_H */ exonerate-2.4.0/src/model/cdna2genome.h0000644000175000017500000000432211162714302014650 00000000000000/****************************************************************\ * * * cDNA <-> genomic alignment model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_CDNA2GENOME_H #define INCLUDED_CDNA2GENOME_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include "c4.h" #include "sequence.h" #include "affine.h" #include "coding2genome.h" #include "est2genome.h" #include "argument.h" typedef union { /* NB.
Must be a union for multiple-inheritance hack */ Coding2Genome_Data c2gd; EST2Genome_Data e2gd; } CDNA2Genome_Data; #define CDNA2Genome_Data_get_dna_submat(cd2gd) \ EST2Genome_Data_get_submat(&((cd2gd)->e2gd)) #define CDNA2Genome_Data_get_protein_submat(cd2gd) \ Coding2Genome_Data_get_submat(&((cd2gd)->c2gd)) #define CDNA2Genome_Data_get_translate(cd2gd) \ Coding2Genome_Data_get_translate(&((cd2gd)->c2gd)) #define CDNA2Genome_Data_get_Intron_Data(cd2gd) \ Coding2Genome_Data_get_Intron_Data(&((cd2gd)->c2gd)) /**/ void CDNA2Genome_Data_init(CDNA2Genome_Data *cd2gd, Sequence *query, Sequence *target); CDNA2Genome_Data *CDNA2Genome_Data_create(Sequence *query, Sequence *target); void CDNA2Genome_Data_clear(CDNA2Genome_Data *cd2gd); void CDNA2Genome_Data_destroy(CDNA2Genome_Data *cd2gd); /**/ C4_Model *CDNA2Genome_create(void); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_CDNA2GENOME_H */ exonerate-2.4.0/src/model/phase.h0000644000175000017500000000242411162714317013575 00000000000000/****************************************************************\ * * * Model for phased introns. * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #ifndef INCLUDED_PHASE_H #define INCLUDED_PHASE_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include "c4.h" #include "sequence.h" #include "match.h" C4_Model *Phase_create(gchar *suffix, Match *match, gboolean on_query, gboolean on_target); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_PHASE_H */ exonerate-2.4.0/src/model/modeltype.h0000644000175000017500000000456611162714317014510 00000000000000/****************************************************************\ * * * Interface for different types of alignment model * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #ifndef INCLUDED_MODEL_TYPE_H #define INCLUDED_MODEL_TYPE_H #ifdef __cplusplus extern "C" { #endif /* __cplusplus */ #include "c4.h" #include "alphabet.h" #include "sequence.h" typedef enum { Model_Type_UNGAPPED, Model_Type_UNGAPPED_TRANS, Model_Type_AFFINE_GLOBAL, Model_Type_AFFINE_BESTFIT, Model_Type_AFFINE_LOCAL, Model_Type_AFFINE_OVERLAP, Model_Type_EST2GENOME, Model_Type_NER, Model_Type_PROTEIN2DNA, Model_Type_PROTEIN2DNA_BESTFIT, Model_Type_PROTEIN2GENOME, Model_Type_PROTEIN2GENOME_BESTFIT, Model_Type_CODING2CODING, Model_Type_CODING2GENOME, Model_Type_CDNA2GENOME, Model_Type_GENOME2GENOME, Model_Type_TOTAL /* just to store the total */ } Model_Type; gchar *Model_Type_to_string(Model_Type type); Model_Type Model_Type_from_string(gchar *str); gboolean Model_Type_is_gapped(Model_Type type); gboolean Model_Type_translate_both(Model_Type type); gboolean Model_Type_has_dual_match(Model_Type type); gboolean 
Model_Type_has_genomic_target(Model_Type type); C4_Model *Model_Type_get_model(Model_Type type, Alphabet_Type query_type, Alphabet_Type target_type); gpointer Model_Type_create_data(Model_Type type, Sequence *query, Sequence *target); void Model_Type_destroy_data(Model_Type type, gpointer model_data); /**/ #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_MODEL_TYPE_H */ exonerate-2.4.0/src/hub/0000777000175000017500000000000011222703703012056 500000000000000exonerate-2.4.0/src/hub/Makefile.am0000644000175000017500000001351011222221201014012 00000000000000 TESTS = gam.test analysis.test bsam.test noinst_PROGRAMS = $(TESTS) INCLUDES = -I$(top_srcdir)/src/sequence \ -I$(top_srcdir)/src/database \ -I$(top_srcdir)/src/comparison \ -I$(top_srcdir)/src/struct \ -I$(top_srcdir)/src/c4 \ -I$(top_srcdir)/src/bsdp \ -I$(top_srcdir)/src/sdp \ -I$(top_srcdir)/src/model \ -I$(top_srcdir)/src/general \ -DHOSTTYPE="\"@host@\"" \ -DSOURCE_ROOT_DIR="\"@source_root_dir@\"" noinst_HEADERS = gam.h analysis.h bsam.h C4_OBJECTS = $(top_srcdir)/src/c4/c4.o \ $(top_srcdir)/src/c4/codegen.o \ $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/c4/opair.o \ $(top_srcdir)/src/c4/alignment.o \ $(top_srcdir)/src/c4/optimal.o \ $(top_srcdir)/src/c4/viterbi.o \ $(top_srcdir)/src/c4/layout.o \ $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/subopt.o \ $(top_srcdir)/src/bsdp/bsdp.o \ $(top_srcdir)/src/bsdp/sar.o \ $(top_srcdir)/src/bsdp/heuristic.o \ $(top_srcdir)/src/bsdp/hpair.o \ $(top_srcdir)/src/sdp/boundary.o \ $(top_srcdir)/src/sdp/sdp.o \ $(top_srcdir)/src/sdp/scheduler.o \ $(top_srcdir)/src/sdp/straceback.o \ $(top_srcdir)/src/sdp/lookahead.o \ $(top_srcdir)/src/struct/matrix.o \ $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/pqueue.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/translate.o \ 
$(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/model/affine.o \ $(top_srcdir)/src/model/est2genome.o \ $(top_srcdir)/src/model/ner.o \ $(top_srcdir)/src/model/protein2dna.o \ $(top_srcdir)/src/model/protein2genome.o \ $(top_srcdir)/src/model/coding2coding.o \ $(top_srcdir)/src/model/coding2genome.o \ $(top_srcdir)/src/model/cdna2genome.o \ $(top_srcdir)/src/model/genome2genome.o \ $(top_srcdir)/src/model/ungapped.o \ $(top_srcdir)/src/model/intron.o \ $(top_srcdir)/src/model/frameshift.o \ $(top_srcdir)/src/model/phase.o \ $(top_srcdir)/src/model/modeltype.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/jobqueue.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ -lm gam_test_SOURCES = gam.test.c gam.c gam_test_LDADD = $(top_srcdir)/src/comparison/hspset.o \ $(top_srcdir)/src/comparison/comparison.o \ $(top_srcdir)/src/comparison/wordhood.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/threadref.o \ $(C4_OBJECTS) bsam_test_SOURCES = bsam.test.c bsam.c bsam_test_LDADD = $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o \ $(top_srcdir)/src/comparison/hspset.o \ $(top_srcdir)/src/comparison/comparison.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/comparison/wordhood.o \ $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/struct/dejavu.o \ $(top_srcdir)/src/struct/pqueue.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/matrix.o \ -lm analysis_test_SOURCES = analysis.test.c analysis.c gam.c bsam.c analysis_test_LDADD = 
$(top_srcdir)/src/comparison/hspset.o \ $(top_srcdir)/src/comparison/wordhood.o \ $(top_srcdir)/src/comparison/seeder.o \ $(top_srcdir)/src/comparison/comparison.o \ $(top_srcdir)/src/database/fastapipe.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/struct/dejavu.o \ $(top_srcdir)/src/struct/fsm.o \ $(top_srcdir)/src/struct/vfsm.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/socket.o \ $(top_srcdir)/src/general/threadref.o \ $(C4_OBJECTS) # Files to clear away MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/src/hub/Makefile.in0000644000175000017500000004507211222703651014051 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../.. 
ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ TESTS = gam.test analysis.test bsam.test noinst_PROGRAMS = $(TESTS) INCLUDES = -I$(top_srcdir)/src/sequence -I$(top_srcdir)/src/database -I$(top_srcdir)/src/comparison -I$(top_srcdir)/src/struct -I$(top_srcdir)/src/c4 -I$(top_srcdir)/src/bsdp -I$(top_srcdir)/src/sdp -I$(top_srcdir)/src/model -I$(top_srcdir)/src/general -DHOSTTYPE="\"@host@\"" -DSOURCE_ROOT_DIR="\"@source_root_dir@\"" noinst_HEADERS = gam.h analysis.h bsam.h C4_OBJECTS = $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o $(top_srcdir)/src/c4/opair.o $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/optimal.o $(top_srcdir)/src/c4/viterbi.o $(top_srcdir)/src/c4/layout.o $(top_srcdir)/src/c4/region.o $(top_srcdir)/src/c4/subopt.o $(top_srcdir)/src/bsdp/bsdp.o $(top_srcdir)/src/bsdp/sar.o $(top_srcdir)/src/bsdp/heuristic.o $(top_srcdir)/src/bsdp/hpair.o $(top_srcdir)/src/sdp/boundary.o $(top_srcdir)/src/sdp/sdp.o $(top_srcdir)/src/sdp/scheduler.o $(top_srcdir)/src/sdp/straceback.o $(top_srcdir)/src/sdp/lookahead.o 
$(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/struct/slist.o $(top_srcdir)/src/struct/pqueue.o $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/rangetree.o $(top_srcdir)/src/sequence/sequence.o $(top_srcdir)/src/sequence/alphabet.o $(top_srcdir)/src/sequence/translate.o $(top_srcdir)/src/sequence/splice.o $(top_srcdir)/src/model/affine.o $(top_srcdir)/src/model/est2genome.o $(top_srcdir)/src/model/ner.o $(top_srcdir)/src/model/protein2dna.o $(top_srcdir)/src/model/protein2genome.o $(top_srcdir)/src/model/coding2coding.o $(top_srcdir)/src/model/coding2genome.o $(top_srcdir)/src/model/cdna2genome.o $(top_srcdir)/src/model/genome2genome.o $(top_srcdir)/src/model/ungapped.o $(top_srcdir)/src/model/intron.o $(top_srcdir)/src/model/frameshift.o $(top_srcdir)/src/model/phase.o $(top_srcdir)/src/model/modeltype.o $(top_srcdir)/src/general/argument.o $(top_srcdir)/src/general/lineparse.o $(top_srcdir)/src/general/jobqueue.o $(top_srcdir)/src/comparison/match.o $(top_srcdir)/src/sequence/submat.o $(top_srcdir)/src/sequence/codonsubmat.o -lm gam_test_SOURCES = gam.test.c gam.c gam_test_LDADD = $(top_srcdir)/src/comparison/hspset.o $(top_srcdir)/src/comparison/comparison.o $(top_srcdir)/src/comparison/wordhood.o $(top_srcdir)/src/struct/sparsecache.o $(top_srcdir)/src/general/compoundfile.o $(top_srcdir)/src/general/threadref.o $(C4_OBJECTS) bsam_test_SOURCES = bsam.test.c bsam.c bsam_test_LDADD = $(top_srcdir)/src/general/argument.o $(top_srcdir)/src/general/lineparse.o $(top_srcdir)/src/general/threadref.o $(top_srcdir)/src/comparison/hspset.o $(top_srcdir)/src/comparison/comparison.o $(top_srcdir)/src/comparison/match.o $(top_srcdir)/src/comparison/wordhood.o $(top_srcdir)/src/sequence/sequence.o $(top_srcdir)/src/sequence/alphabet.o $(top_srcdir)/src/sequence/translate.o $(top_srcdir)/src/sequence/submat.o $(top_srcdir)/src/sequence/codonsubmat.o $(top_srcdir)/src/sequence/splice.o $(top_srcdir)/src/struct/dejavu.o $(top_srcdir)/src/struct/pqueue.o 
$(top_srcdir)/src/struct/sparsecache.o $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/matrix.o -lm analysis_test_SOURCES = analysis.test.c analysis.c gam.c bsam.c analysis_test_LDADD = $(top_srcdir)/src/comparison/hspset.o $(top_srcdir)/src/comparison/wordhood.o $(top_srcdir)/src/comparison/seeder.o $(top_srcdir)/src/comparison/comparison.o $(top_srcdir)/src/database/fastapipe.o $(top_srcdir)/src/database/fastadb.o $(top_srcdir)/src/struct/dejavu.o $(top_srcdir)/src/struct/fsm.o $(top_srcdir)/src/struct/vfsm.o $(top_srcdir)/src/struct/sparsecache.o $(top_srcdir)/src/general/compoundfile.o $(top_srcdir)/src/general/socket.o $(top_srcdir)/src/general/threadref.o $(C4_OBJECTS) # Files to clear away MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = PROGRAMS = $(noinst_PROGRAMS) DEFS = @DEFS@ -I. -I$(srcdir) CPPFLAGS = @CPPFLAGS@ LDFLAGS = @LDFLAGS@ LIBS = @LIBS@ gam_test_OBJECTS = gam.test.o gam.o gam_test_DEPENDENCIES = $(top_srcdir)/src/comparison/hspset.o \ $(top_srcdir)/src/comparison/comparison.o \ $(top_srcdir)/src/comparison/wordhood.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/threadref.o $(top_srcdir)/src/c4/c4.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/c4/opair.o $(top_srcdir)/src/c4/alignment.o \ $(top_srcdir)/src/c4/optimal.o $(top_srcdir)/src/c4/viterbi.o \ $(top_srcdir)/src/c4/layout.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/subopt.o $(top_srcdir)/src/bsdp/bsdp.o \ $(top_srcdir)/src/bsdp/sar.o $(top_srcdir)/src/bsdp/heuristic.o \ $(top_srcdir)/src/bsdp/hpair.o $(top_srcdir)/src/sdp/boundary.o \ $(top_srcdir)/src/sdp/sdp.o $(top_srcdir)/src/sdp/scheduler.o \ $(top_srcdir)/src/sdp/straceback.o $(top_srcdir)/src/sdp/lookahead.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/pqueue.o 
$(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/sequence/splice.o $(top_srcdir)/src/model/affine.o \ $(top_srcdir)/src/model/est2genome.o $(top_srcdir)/src/model/ner.o \ $(top_srcdir)/src/model/protein2dna.o \ $(top_srcdir)/src/model/protein2genome.o \ $(top_srcdir)/src/model/coding2coding.o \ $(top_srcdir)/src/model/coding2genome.o \ $(top_srcdir)/src/model/cdna2genome.o \ $(top_srcdir)/src/model/genome2genome.o \ $(top_srcdir)/src/model/ungapped.o $(top_srcdir)/src/model/intron.o \ $(top_srcdir)/src/model/frameshift.o $(top_srcdir)/src/model/phase.o \ $(top_srcdir)/src/model/modeltype.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/jobqueue.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o gam_test_LDFLAGS = analysis_test_OBJECTS = analysis.test.o analysis.o gam.o bsam.o analysis_test_DEPENDENCIES = $(top_srcdir)/src/comparison/hspset.o \ $(top_srcdir)/src/comparison/wordhood.o \ $(top_srcdir)/src/comparison/seeder.o \ $(top_srcdir)/src/comparison/comparison.o \ $(top_srcdir)/src/database/fastapipe.o \ $(top_srcdir)/src/database/fastadb.o $(top_srcdir)/src/struct/dejavu.o \ $(top_srcdir)/src/struct/fsm.o $(top_srcdir)/src/struct/vfsm.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/socket.o \ $(top_srcdir)/src/general/threadref.o $(top_srcdir)/src/c4/c4.o \ $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o \ $(top_srcdir)/src/c4/opair.o $(top_srcdir)/src/c4/alignment.o \ $(top_srcdir)/src/c4/optimal.o $(top_srcdir)/src/c4/viterbi.o \ $(top_srcdir)/src/c4/layout.o $(top_srcdir)/src/c4/region.o \ $(top_srcdir)/src/c4/subopt.o $(top_srcdir)/src/bsdp/bsdp.o \ $(top_srcdir)/src/bsdp/sar.o 
$(top_srcdir)/src/bsdp/heuristic.o \ $(top_srcdir)/src/bsdp/hpair.o $(top_srcdir)/src/sdp/boundary.o \ $(top_srcdir)/src/sdp/sdp.o $(top_srcdir)/src/sdp/scheduler.o \ $(top_srcdir)/src/sdp/straceback.o $(top_srcdir)/src/sdp/lookahead.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/pqueue.o $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/sequence/splice.o $(top_srcdir)/src/model/affine.o \ $(top_srcdir)/src/model/est2genome.o $(top_srcdir)/src/model/ner.o \ $(top_srcdir)/src/model/protein2dna.o \ $(top_srcdir)/src/model/protein2genome.o \ $(top_srcdir)/src/model/coding2coding.o \ $(top_srcdir)/src/model/coding2genome.o \ $(top_srcdir)/src/model/cdna2genome.o \ $(top_srcdir)/src/model/genome2genome.o \ $(top_srcdir)/src/model/ungapped.o $(top_srcdir)/src/model/intron.o \ $(top_srcdir)/src/model/frameshift.o $(top_srcdir)/src/model/phase.o \ $(top_srcdir)/src/model/modeltype.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/jobqueue.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o analysis_test_LDFLAGS = bsam_test_OBJECTS = bsam.test.o bsam.o bsam_test_DEPENDENCIES = $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o \ $(top_srcdir)/src/comparison/hspset.o \ $(top_srcdir)/src/comparison/comparison.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/comparison/wordhood.o \ $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/sequence/splice.o $(top_srcdir)/src/struct/dejavu.o \ 
$(top_srcdir)/src/struct/pqueue.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/matrix.o bsam_test_LDFLAGS = CFLAGS = @CFLAGS@ COMPILE = $(CC) $(DEFS) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) CCLD = $(CC) LINK = $(CCLD) $(AM_CFLAGS) $(CFLAGS) $(LDFLAGS) -o $@ HEADERS = $(noinst_HEADERS) DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best SOURCES = $(gam_test_SOURCES) $(analysis_test_SOURCES) $(bsam_test_SOURCES) OBJECTS = $(gam_test_OBJECTS) $(analysis_test_OBJECTS) $(bsam_test_OBJECTS) all: all-redirect .SUFFIXES: .SUFFIXES: .S .c .o .s $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps src/hub/Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status mostlyclean-noinstPROGRAMS: clean-noinstPROGRAMS: -test -z "$(noinst_PROGRAMS)" || rm -f $(noinst_PROGRAMS) distclean-noinstPROGRAMS: maintainer-clean-noinstPROGRAMS: .c.o: $(COMPILE) -c $< .s.o: $(COMPILE) -c $< .S.o: $(COMPILE) -c $< mostlyclean-compile: -rm -f *.o core *.core clean-compile: distclean-compile: -rm -f *.tab.c maintainer-clean-compile: gam.test: $(gam_test_OBJECTS) $(gam_test_DEPENDENCIES) @rm -f gam.test $(LINK) $(gam_test_LDFLAGS) $(gam_test_OBJECTS) $(gam_test_LDADD) $(LIBS) analysis.test: $(analysis_test_OBJECTS) $(analysis_test_DEPENDENCIES) @rm -f analysis.test $(LINK) $(analysis_test_LDFLAGS) $(analysis_test_OBJECTS) $(analysis_test_LDADD) $(LIBS) bsam.test: $(bsam_test_OBJECTS) $(bsam_test_DEPENDENCIES) @rm -f bsam.test $(LINK) $(bsam_test_LDFLAGS) $(bsam_test_OBJECTS) $(bsam_test_LDADD) $(LIBS) tags: TAGS ID: $(HEADERS) $(SOURCES) $(LISP) list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ 
END { for (i in files) print i; }'`; \ here=`pwd` && cd $(srcdir) \ && mkid -f$$here/ID $$unique $(LISP) TAGS: $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) $(LISP) tags=; \ here=`pwd`; \ list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(ETAGS_ARGS)$$unique$(LISP)$$tags" \ || (cd $(srcdir) && etags -o $$here/TAGS $(ETAGS_ARGS) $$tags $$unique $(LISP)) mostlyclean-tags: clean-tags: distclean-tags: -rm -f TAGS ID maintainer-clean-tags: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = src/hub distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done check-TESTS: $(TESTS) @failed=0; all=0; \ srcdir=$(srcdir); export srcdir; \ for tst in $(TESTS); do \ if test -f $$tst; then dir=.; \ else dir="$(srcdir)"; fi; \ if $(TESTS_ENVIRONMENT) $$dir/$$tst; then \ all=`expr $$all + 1`; \ echo "PASS: $$tst"; \ elif test $$? 
-ne 77; then \ all=`expr $$all + 1`; \ failed=`expr $$failed + 1`; \ echo "FAIL: $$tst"; \ fi; \ done; \ if test "$$failed" -eq 0; then \ banner="All $$all tests passed"; \ else \ banner="$$failed of $$all tests failed"; \ fi; \ dashes=`echo "$$banner" | sed s/./=/g`; \ echo "$$dashes"; \ echo "$$banner"; \ echo "$$dashes"; \ test "$$failed" -eq 0 info-am: info: info-am dvi-am: dvi: dvi-am check-am: all-am $(MAKE) $(AM_MAKEFLAGS) check-TESTS check: check-am installcheck-am: installcheck: installcheck-am install-exec-am: install-exec: install-exec-am install-data-am: install-data: install-data-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-am uninstall-am: uninstall: uninstall-am all-am: Makefile $(PROGRAMS) $(HEADERS) all-redirect: all-am install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-noinstPROGRAMS mostlyclean-compile \ mostlyclean-tags mostlyclean-generic mostlyclean: mostlyclean-am clean-am: clean-noinstPROGRAMS clean-compile clean-tags clean-generic \ mostlyclean-am clean: clean-am distclean-am: distclean-noinstPROGRAMS distclean-compile distclean-tags \ distclean-generic clean-am distclean: distclean-am maintainer-clean-am: maintainer-clean-noinstPROGRAMS \ maintainer-clean-compile maintainer-clean-tags \ maintainer-clean-generic distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." 
maintainer-clean: maintainer-clean-am .PHONY: mostlyclean-noinstPROGRAMS distclean-noinstPROGRAMS \ clean-noinstPROGRAMS maintainer-clean-noinstPROGRAMS \ mostlyclean-compile distclean-compile clean-compile \ maintainer-clean-compile tags mostlyclean-tags distclean-tags \ clean-tags maintainer-clean-tags distdir check-TESTS info-am info \ dvi-am dvi check check-am installcheck-am installcheck install-exec-am \ install-exec install-data-am install-data install-am install \ uninstall-am uninstall all-redirect all-am all installdirs \ mostlyclean-generic distclean-generic clean-generic \ maintainer-clean-generic clean mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: exonerate-2.4.0/src/hub/gam.test.c0000644000175000017500000000205211162714301013656 00000000000000/****************************************************************\ * * * GAM: Gapped Alignment Manager * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "gam.h" int Argument_main(Argument *arg){ g_warning("test [%s] not implemented", __FILE__); return 0; } exonerate-2.4.0/src/hub/gam.c0000644000175000017500000014327611222241473012720 00000000000000/****************************************************************\ * * * GAM: Gapped Alignment Manager * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. 
See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include <stdio.h> /* For fopen() */ #include <stdlib.h> /* For qsort() */ #include <string.h> /* For strcmp() */ #include <unistd.h> /* For unlink(), getpid(), getppid() etc */ #include "gam.h" #include "ungapped.h" #include "opair.h" #include "rangetree.h" static gchar *GAM_Argument_parse_Model_Type(gchar *arg_string, gpointer data){ gchar *type_str; register gchar *ret_val = Argument_parse_string(arg_string, &type_str); register Model_Type *model_type = (Model_Type*)data; if(ret_val) return ret_val; (*model_type) = Model_Type_from_string(type_str); return NULL; } /**/ static gchar *GAM_Refinement_to_string(GAM_Refinement refinement){ register gchar *name = NULL; switch(refinement){ case GAM_Refinement_NONE: name = "none"; break; case GAM_Refinement_FULL: name = "full"; break; case GAM_Refinement_REGION: name = "region"; break; default: g_error("Unknown GAM Refinement [%d]", refinement); break; } return name; } static GAM_Refinement GAM_Refinement_from_string(gchar *str){ gchar *name[GAM_Refinement_TOTAL] = {"none", "full", "region"}; GAM_Refinement refinement[GAM_Refinement_TOTAL] = {GAM_Refinement_NONE, GAM_Refinement_FULL, GAM_Refinement_REGION}; register gint i; for(i = 0; i < GAM_Refinement_TOTAL; i++) if(!g_strcasecmp(name[i], str)) return refinement[i]; g_error("Unknown refinement type [%s]", str); return GAM_Refinement_NONE; /* Not reached */ } static gchar *GAM_Argument_parse_refinement(gchar *arg_string, gpointer data){ register GAM_Refinement *refinement = (GAM_Refinement*)data; gchar *refine_str; register gchar *ret_val = Argument_parse_string(arg_string, &refine_str); if(ret_val) return ret_val; (*refinement) = GAM_Refinement_from_string(refine_str); return NULL; } /**/ GAM_ArgumentSet *GAM_ArgumentSet_create(Argument *arg){ register ArgumentSet *as; static GAM_ArgumentSet gas;
if(arg){ as = ArgumentSet_create("Gapped Alignment Options"); ArgumentSet_add_option(as, 'm', "model", "alignment model", "Specify alignment model type\n" "Supported types:\n" " ungapped ungapped:trans\n" " affine:global affine:bestfit affine:local affine:overlap\n" " est2genome ner protein2dna protein2genome\n" " protein2dna:bestfit protein2genome:bestfit\n" " coding2coding coding2genome cdna2genome genome2genome\n", "ungapped", GAM_Argument_parse_Model_Type, &gas.type); ArgumentSet_add_option(as, 's', "score", "threshold", "Score threshold for gapped alignment", "100", Argument_parse_int, &gas.threshold); ArgumentSet_add_option(as, '\0', "percent", "threshold", "Percent self-score threshold", "0.0", Argument_parse_float, &gas.percent_threshold); ArgumentSet_add_option(as, 0, "showalignment", NULL, "Include (human readable) alignment in results", "TRUE", Argument_parse_boolean, &gas.show_alignment); ArgumentSet_add_option(as, 0, "showsugar", NULL, "Include 'sugar' format output in results", "FALSE", Argument_parse_boolean, &gas.show_sugar); ArgumentSet_add_option(as, 0, "showcigar", NULL, "Include 'cigar' format output in results", "FALSE", Argument_parse_boolean, &gas.show_cigar); ArgumentSet_add_option(as, 0, "showvulgar", NULL, "Include 'vulgar' format output in results", "TRUE", Argument_parse_boolean, &gas.show_vulgar); ArgumentSet_add_option(as, 0, "showquerygff", NULL, "Include GFF output on query in results", "FALSE", Argument_parse_boolean, &gas.show_query_gff); ArgumentSet_add_option(as, 0, "showtargetgff", NULL, "Include GFF output on target in results", "FALSE", Argument_parse_boolean, &gas.show_target_gff); ArgumentSet_add_option(as, 0, "ryo", "format", "Roll-your-own printf-esque output format", "NULL", Argument_parse_string, &gas.ryo); ArgumentSet_add_option(as, 'n', "bestn", "number", "Report best N results per query", "0", Argument_parse_int, &gas.best_n); ArgumentSet_add_option(as, 'S', "subopt", NULL, "Search for suboptimal alignments", "TRUE", 
Argument_parse_boolean, &gas.use_subopt); ArgumentSet_add_option(as, 'g', "gappedextension", NULL, "Use gapped extension (default is SDP)", "TRUE", Argument_parse_boolean, &gas.use_gapped_extension); /**/ ArgumentSet_add_option(as, '\0', "refine", NULL, "Alignment refinement strategy [none|full|region]", "none", GAM_Argument_parse_refinement, &gas.refinement); ArgumentSet_add_option(as, '\0', "refineboundary", NULL, "Refinement region boundary", "32", Argument_parse_int, &gas.refinement_boundary); /**/ Argument_absorb_ArgumentSet(arg, as); } return &gas; } /**/ static gboolean GAM_StoredResult_compare(gpointer low, gpointer high, gpointer user_data){ register GAM_StoredResult *gsr_low = (GAM_StoredResult*)low, *gsr_high = (GAM_StoredResult*)high; return gsr_low->score < gsr_high->score; } static void GAM_display_alignment(GAM *gam, Alignment *alignment, Sequence *query, Sequence *target, gint result_id, gint rank, gpointer user_data, gpointer self_data, FILE *fp); static GAM_StoredResult *GAM_StoredResult_create(GAM *gam, Sequence *query, Sequence *target, Alignment *alignment, gpointer user_data, gpointer self_data){ register GAM_StoredResult *gsr = g_new(GAM_StoredResult, 1); gsr->score = alignment->score; gsr->pos = ftell(gam->bestn_tmp_file); GAM_display_alignment(gam, alignment, query, target, 0, -1, user_data, self_data, gam->bestn_tmp_file); gsr->len = ftell(gam->bestn_tmp_file) - gsr->pos; return gsr; } static void GAM_StoredResult_destroy(GAM_StoredResult *gsr){ g_free(gsr); return; } /* FIXME: optimisation: * could store unused positions in the tmp file for reuse */ static void GAM_StoredResult_display(GAM_StoredResult *gsr, GAM *gam, gint rank){ register gint i, ch, tag_pos = 0; register gchar *rank_tag = "%_EXONERATE_BESTN_RANK_%"; if(fseek(gam->bestn_tmp_file, gsr->pos, SEEK_SET)) g_error("Could not seek in tmp file"); for(i = 0; i < gsr->len; i++){ ch = getc(gam->bestn_tmp_file); g_assert(ch != EOF); if(ch == rank_tag[tag_pos]){ tag_pos++; 
if(!rank_tag[tag_pos]){ printf("%d", rank); tag_pos = 0; } } else { if(tag_pos){ printf("%.*s", tag_pos, rank_tag); tag_pos = 0; } putchar(ch); } } fflush(stdout); return; } /**/ static gint GAM_compare_id(gconstpointer a, gconstpointer b){ register const gchar *id_a = a, *id_b = b; return strcmp(id_a, id_b); } static GAM_QueryResult *GAM_QueryResult_create(GAM *gam, gchar *query_id){ register GAM_QueryResult *gqr = g_new(GAM_QueryResult, 1); gqr->pq = PQueue_create(gam->pqueue_set, GAM_StoredResult_compare, NULL); gqr->query_id = g_strdup(query_id); gqr->tie_count = 0; gqr->tie_score = C4_IMPOSSIBLY_LOW_SCORE; return gqr; } static void GAM_QueryResult_pqueue_destroy_GAM_Result(gpointer data, gpointer user_data){ register GAM_StoredResult *gsr = data; GAM_StoredResult_destroy(gsr); return; } static void GAM_QueryResult_destroy(GAM_QueryResult *gqr){ PQueue_destroy(gqr->pq, GAM_QueryResult_pqueue_destroy_GAM_Result, NULL); g_free(gqr->query_id); g_free(gqr); return; } static void GAM_QueryResult_push(GAM_QueryResult *gqr, GAM_Result *gam_result, Alignment *alignment){ register GAM_StoredResult *gsr = GAM_StoredResult_create(gam_result->gam, gam_result->query, gam_result->target, alignment, gam_result->user_data, gam_result->self_data); PQueue_push(gqr->pq, gsr); return; } static void GAM_QueryResult_submit(GAM_QueryResult *gqr, GAM_Result *gam_result){ register GAM_StoredResult *gsr; register Alignment *alignment; register gint i; register GPtrArray *tie_list = g_ptr_array_new(); g_assert(!strcmp(gqr->query_id, gam_result->query->id)); for(i = 0; i < gam_result->alignment_list->len; i++){ alignment = gam_result->alignment_list->pdata[i]; if(alignment->score == gqr->tie_score){ GAM_QueryResult_push(gqr, gam_result, alignment); gqr->tie_count++; } else if(alignment->score < gqr->tie_score){ if(PQueue_total(gqr->pq) < gam_result->gam->gas->best_n){ GAM_QueryResult_push(gqr, gam_result, alignment); gqr->tie_count = 1; gqr->tie_score = alignment->score; } else { break; 
/* Other alignments are worse */ } } else { /* (alignment->score > gqr->tie_score) */ GAM_QueryResult_push(gqr, gam_result, alignment); if((PQueue_total(gqr->pq)-gqr->tie_count) >= gam_result->gam->gas->best_n){ /* Remove old ties */ while(gqr->tie_count){ gsr = PQueue_pop(gqr->pq); GAM_StoredResult_destroy(gsr); gqr->tie_count--; } /* Count new ties */ gsr = PQueue_top(gqr->pq); gqr->tie_score = gsr->score; do { gsr = PQueue_top(gqr->pq); if(gsr && (gsr->score == gqr->tie_score)){ gsr = PQueue_pop(gqr->pq); g_ptr_array_add(tie_list, gsr); } else { break; } } while(TRUE); gqr->tie_count = tie_list->len; /* Replace new ties */ while(tie_list->len){ gsr = tie_list->pdata[tie_list->len-1]; PQueue_push(gqr->pq, gsr); g_ptr_array_set_size(tie_list, tie_list->len-1); } } else { if(PQueue_total(gqr->pq) == 1){ /* First alignment */ gqr->tie_count = 1; gqr->tie_score = alignment->score; } } } } g_ptr_array_free(tie_list, TRUE); return; } /**/ static gboolean GAM_QueryResult_report_traverse_func(gpointer data, gpointer user_data){ register GAM_StoredResult *gsr = data; register GPtrArray *result_list = user_data; g_ptr_array_add(result_list, gsr); return FALSE; } static int GAM_QueryResult_report_sort_func(const void *a, const void *b){ register GAM_StoredResult **gsr_a = (GAM_StoredResult**)a, **gsr_b = (GAM_StoredResult**)b; return (*gsr_a)->score - (*gsr_b)->score; } static void GAM_QueryResult_report(GAM_QueryResult *gqr, GAM *gam){ register GPtrArray *result_list = g_ptr_array_new(); register gint i; register GAM_StoredResult *gsr; if(gam->verbosity > 2) g_message("Reporting [%d/%d] results for query [%s]", PQueue_total(gqr->pq), gam->gas->best_n, gqr->query_id); PQueue_traverse(gqr->pq, GAM_QueryResult_report_traverse_func, result_list); qsort(result_list->pdata, result_list->len, sizeof(gpointer), GAM_QueryResult_report_sort_func); for(i = result_list->len-1; i >= 0; i--){ gsr = result_list->pdata[i]; GAM_StoredResult_display(gsr, gam, result_list->len-i); } 
g_ptr_array_free(result_list, TRUE); return; } /**/ #define Combined_Alphabet_Type(qt,tt) ((qt) << 8 | (tt)) static FILE *GAM_get_tmp_file_handler(void){ register gchar *path; register FILE *fp; gchar hostname[1024]; gethostname(hostname, 1024); path = g_strdup_printf("%s%cxnr8_%s_%d_%d.txt", g_get_tmp_dir(), G_DIR_SEPARATOR, hostname, (gint)getppid(), (gint)getpid()); fp = fopen(path, "w+"); if(!fp) g_error("Could not write to tmp bestn file [%s]", path); unlink(path); /* tmp file deleted now, as will not be re-opened */ g_free(path); return fp; } static GPtrArray *GAM_build_match_list(C4_Model *model){ register GPtrArray *match_list = g_ptr_array_new(); Match *match_array[Match_Type_TOTAL] = {0}; register GPtrArray *match_transition_list = C4_Model_select_transitions(model, C4_Label_MATCH); register gint i; register C4_Transition *transition; register Match *match; for(i = 0; i < match_transition_list->len; i++){ transition = match_transition_list->pdata[i]; g_assert(transition->label == C4_Label_MATCH); g_assert(transition->label_data); match = transition->label_data; if(!match_array[match->type]){ match_array[match->type] = match; g_ptr_array_add(match_list, match); } } g_assert(match_list->len); g_ptr_array_free(match_transition_list, TRUE); return match_list; } GAM *GAM_create(Alphabet_Type query_type, Alphabet_Type target_type, Submat *dna_submat, Submat *protein_submat, Translate *translate, gboolean use_exhaustive, gint verbosity){ register GAM *gam = g_new0(GAM, 1); register gint i; register C4_Span *span; gam->thread_ref = ThreadRef_create(); gam->dna_submat = Submat_share(dna_submat); gam->protein_submat = Submat_share(protein_submat); gam->translate = Translate_share(translate); gam->gas = GAM_ArgumentSet_create(NULL); if(use_exhaustive && gam->gas->use_subopt) g_warning("Exhaustively generating suboptimal alignments" " will be VERY SLOW: use -S no"); gam->query_type = query_type; gam->target_type = target_type; if(gam->gas->best_n){ gam->bestn_tree = 
g_tree_new(GAM_compare_id); gam->bestn_tmp_file = GAM_get_tmp_file_handler(); gam->pqueue_set = PQueueSet_create(); } if(gam->gas->percent_threshold) gam->percent_threshold_tree = g_tree_new(GAM_compare_id); gam->translate_both = Model_Type_translate_both(gam->gas->type); gam->dual_match = Model_Type_has_dual_match(gam->gas->type); gam->model = Model_Type_get_model(gam->gas->type, query_type, target_type); if((!use_exhaustive) && (!C4_Model_is_local(gam->model))) g_error("Cannot perform heuristic alignments using non-local models: use -E"); gam->match_list = GAM_build_match_list(gam->model); if(use_exhaustive){ gam->optimal = Optimal_create(gam->model, NULL, Optimal_Type_SCORE |Optimal_Type_PATH |Optimal_Type_REDUCED_SPACE, TRUE); if(gam->gas->refinement != GAM_Refinement_NONE) g_error("Exhaustive alignments cannot be refined"); } else { if(gam->gas->refinement != GAM_Refinement_NONE){ gam->optimal = Optimal_create(gam->model, NULL, Optimal_Type_SCORE |Optimal_Type_PATH |Optimal_Type_REDUCED_SPACE, TRUE); } if(Model_Type_is_gapped(gam->gas->type)){ if(gam->gas->use_gapped_extension){ gam->sdp = SDP_create(gam->model); } else { gam->heuristic = Heuristic_create(gam->model); } } } gam->verbosity = verbosity; /* Find max_{query,target}_span */ gam->max_query_span = gam->max_target_span = 0; for(i = 0; i < gam->model->span_list->len; i++){ span = gam->model->span_list->pdata[i]; if(gam->max_query_span < span->max_query) gam->max_query_span = span->max_query; if(gam->max_target_span < span->max_target) gam->max_target_span = span->max_target; } #ifdef USE_PTHREADS pthread_mutex_init(&gam->gam_lock, NULL); #endif /* USE_PTHREADS */ return gam; } GAM *GAM_share(GAM *gam){ g_assert(gam); ThreadRef_share(gam->thread_ref); return gam; } /**/ static GAM_QueryInfo *GAM_QueryInfo_create(Sequence *query, GAM *gam){ register GAM_QueryInfo *gqi = g_new(GAM_QueryInfo, 1); register gint i, j; register Match *match; register Match_Score th; gqi->query_id = g_strdup(query->id); 
gqi->threshold = 0; /* Calculate best threshold for each query match */ for(i = 0; i < gam->match_list->len; i++){ match = gam->match_list->pdata[i]; th = 0; for(j = 0; j < query->len; j += match->query->advance) th += match->query->self_func(match->query, query, j); if(gqi->threshold < th) gqi->threshold = th; } gqi->threshold *= gam->gas->percent_threshold; gqi->threshold /= 100; if(gqi->threshold < gam->gas->threshold) gqi->threshold = gam->gas->threshold; return gqi; } static void GAM_QueryInfo_destroy(GAM_QueryInfo *gqi){ g_free(gqi->query_id); g_free(gqi); return; } /**/ static gint GAM_tree_collect_values(gpointer key, gpointer value, gpointer data){ register GPtrArray *list = data; g_ptr_array_add(list, value); return FALSE; } static void GAM_bestn_tree_destroy(GTree *bestn_tree){ register GPtrArray *query_data_list = g_ptr_array_new(); register gint i; g_tree_traverse(bestn_tree, GAM_tree_collect_values, G_IN_ORDER, query_data_list); g_tree_destroy(bestn_tree); for(i = 0; i < query_data_list->len; i++) GAM_QueryResult_destroy(query_data_list->pdata[i]); g_ptr_array_free(query_data_list, TRUE); return; } static void GAM_percent_threshold_tree_destroy( GTree *percent_threshold_tree){ register GPtrArray *query_info_list = g_ptr_array_new(); register gint i; g_tree_traverse(percent_threshold_tree, GAM_tree_collect_values, G_IN_ORDER, query_info_list); g_tree_destroy(percent_threshold_tree); for(i = 0; i < query_info_list->len; i++) GAM_QueryInfo_destroy(query_info_list->pdata[i]); g_ptr_array_free(query_info_list, TRUE); return; } void GAM_destroy(GAM *gam){ g_assert(gam); if(ThreadRef_destroy(gam->thread_ref)) return; #ifdef USE_PTHREADS pthread_mutex_destroy(&gam->gam_lock); #endif /* USE_PTHREADS */ g_assert(gam->model); g_ptr_array_free(gam->match_list, TRUE); C4_Model_destroy(gam->model); if(gam->optimal) Optimal_destroy(gam->optimal); if(gam->heuristic) Heuristic_destroy(gam->heuristic); if(gam->sdp) SDP_destroy(gam->sdp); if(gam->translate) 
        Translate_destroy(gam->translate);
    if(gam->bestn_tree)
        GAM_bestn_tree_destroy(gam->bestn_tree);
    if(gam->percent_threshold_tree)
        GAM_percent_threshold_tree_destroy(gam->percent_threshold_tree);
    if(gam->bestn_tmp_file)
        fclose(gam->bestn_tmp_file);
    if(gam->pqueue_set)
        PQueueSet_destroy(gam->pqueue_set);
    g_free(gam);
    return;
    }

static gint GAM_bestn_tree_report_traverse(gpointer key,
                                           gpointer value,
                                           gpointer data){
    register GAM_QueryResult *gqr = value;
    register GAM *gam = data;
    GAM_QueryResult_report(gqr, gam);
    return FALSE;
    }

void GAM_report(GAM *gam){
    if(gam->bestn_tree)
        g_tree_traverse(gam->bestn_tree,
                        GAM_bestn_tree_report_traverse,
                        G_IN_ORDER, gam);
    return;
    }

/**/

static C4_Portal *GAM_Pair_find_portal(C4_Model *model,
                                       HSPset *hspset){
    register gint i;
    register C4_Portal *portal;
    register C4_Transition *transition;
    /* Check the list before it is dereferenced in the loop */
    g_assert(model->portal_list);
    for(i = 0; i < model->portal_list->len; i++){
        portal = model->portal_list->pdata[i];
        g_assert(portal->transition_list->len);
        transition = portal->transition_list->pdata[0];
        g_assert(transition);
        if((transition->advance_query
            == hspset->param->match->query->advance)
        && (transition->advance_target
            == hspset->param->match->target->advance)){
            return portal;
            }
        }
    /* No portal matched: report the error rather than
     * returning the last (incompatible) portal examined
     */
    g_error("No compatible portal found for hspset");
    return NULL; /* Not reached */
    }

static GAM_Result *GAM_Result_create(GAM *gam,
                                     Sequence *query,
                                     Sequence *target){
    register GAM_Result *gam_result = g_new(GAM_Result, 1);
    g_assert(gam);
    g_assert(query);
    g_assert(target);
    g_assert(query->alphabet->type == gam->query_type);
    g_assert(target->alphabet->type == gam->target_type);
    gam_result->ref_count = 1;
    gam_result->gam = GAM_share(gam);
    gam_result->alignment_list = NULL;
    gam_result->query = Sequence_share(query);
    gam_result->target = Sequence_share(target);
    gam_result->user_data = Model_Type_create_data(gam->gas->type,
                                                   query, target);
    gam_result->self_data = Model_Type_create_data(gam->gas->type,
                                                   query, query);
    gam_result->subopt = SubOpt_create(query->len,
target->len); return gam_result; } static Alignment *GAM_Result_refine_alignment(GAM_Result *gam_result, Alignment *alignment){ register Alignment *refined_alignment = NULL; register Region *region; register gint query_region_start, target_region_start; g_assert(gam_result->gam->optimal); if(gam_result->gam->verbosity > 1) g_message("Refining alignment ... (%d)", alignment->score); switch(gam_result->gam->gas->refinement){ case GAM_Refinement_FULL: region = Region_create(0, 0, gam_result->query->len, gam_result->target->len); refined_alignment = Optimal_find_path( gam_result->gam->optimal, region, gam_result->user_data, alignment->score, gam_result->subopt); g_assert(refined_alignment); Region_destroy(region); break; case GAM_Refinement_REGION: query_region_start = MAX(0, alignment->region->query_start - gam_result->gam->gas->refinement_boundary); target_region_start = MAX(0, alignment->region->target_start - gam_result->gam->gas->refinement_boundary); region = Region_create(query_region_start, target_region_start, MIN(gam_result->query->len, Region_query_end(alignment->region) + gam_result->gam->gas->refinement_boundary) - query_region_start, MIN(gam_result->target->len, Region_target_end(alignment->region) + gam_result->gam->gas->refinement_boundary) - target_region_start); refined_alignment = Optimal_find_path( gam_result->gam->optimal, region, gam_result->user_data, alignment->score, gam_result->subopt); g_assert(refined_alignment); Region_destroy(region); break; default: g_error("Bad type for refinement [%s]", GAM_Refinement_to_string(gam_result->gam->gas->refinement)); break; } g_assert(refined_alignment); g_assert(refined_alignment->score >= alignment->score); if(gam_result->gam->verbosity > 1) g_message("Refined alignment score [%d]", refined_alignment->score); return refined_alignment; } static void GAM_Result_add_alignment(GAM_Result *gam_result, Alignment *alignment, C4_Score threshold){ register Alignment *refined_alignment; 
if(!gam_result->alignment_list) gam_result->alignment_list = g_ptr_array_new(); if(gam_result->gam->gas->refinement != GAM_Refinement_NONE){ refined_alignment = GAM_Result_refine_alignment(gam_result, alignment); Alignment_destroy(alignment); alignment = refined_alignment; /* Refinement may put alignment below threshold */ if(alignment->score < threshold){ Alignment_destroy(alignment); return; } } g_ptr_array_add(gam_result->alignment_list, alignment); SubOpt_add_alignment(gam_result->subopt, alignment); return; } static C4_Score GAM_get_query_threshold(GAM *gam, Sequence *query){ register GAM_QueryResult *gqr; register GAM_StoredResult *gsr; register GAM_QueryInfo *gqi; if(gam->gas->best_n){ gqr = g_tree_lookup(gam->bestn_tree, query->id); if(gqr && (PQueue_total(gqr->pq) >= gam->gas->best_n)){ gsr = PQueue_top(gqr->pq); if(gam->verbosity > 2) g_message("Using threshold [%d] for query [%s]", gsr->score, query->id); return gsr->score; } } if(gam->gas->percent_threshold){ gqi = g_tree_lookup(gam->percent_threshold_tree, query->id); if(!gqi){ gqi = GAM_QueryInfo_create(query, gam); g_tree_insert(gam->percent_threshold_tree, gqi->query_id, gqi); } return gqi->threshold; } return gam->gas->threshold; } static int GAM_Result_ungapped_create_sort_func(const void *a, const void *b){ register Alignment **alignment_a = (Alignment**)a, **alignment_b = (Alignment**)b; return (*alignment_b)->score - (*alignment_a)->score; } static void GAM_Result_ungapped_add_HSPset(GAM_Result *gam_result, C4_Model *model, HSPset *hspset){ register gint i; register C4_Score threshold; register Alignment *alignment; register HSP *hsp; HSPset_filter_ungapped(hspset); for(i = 0; i < hspset->hsp_list->len; i++){ hsp = hspset->hsp_list->pdata[i]; threshold = GAM_get_query_threshold(gam_result->gam, hspset->query); if(hsp->score >= threshold){ alignment = Ungapped_Alignment_create(model, gam_result->user_data, hsp); g_assert(alignment); GAM_Result_add_alignment(gam_result, alignment, threshold); } } 
return; } GAM_Result *GAM_Result_ungapped_create(GAM *gam, Comparison *comparison){ register GAM_Result *gam_result; g_assert(comparison); if(!Comparison_has_hsps(comparison)) return NULL; gam_result = GAM_Result_create(gam, comparison->query, comparison->target); /**/ if(comparison->dna_hspset) GAM_Result_ungapped_add_HSPset(gam_result, gam->model, comparison->dna_hspset); if(comparison->protein_hspset) GAM_Result_ungapped_add_HSPset(gam_result, gam->model, comparison->protein_hspset); if(comparison->codon_hspset) GAM_Result_ungapped_add_HSPset(gam_result, gam->model, comparison->codon_hspset); /**/ if(!gam_result->alignment_list){ GAM_Result_destroy(gam_result); return NULL; } /* Sort results, best scores first */ qsort(gam_result->alignment_list->pdata, gam_result->alignment_list->len, sizeof(gpointer), GAM_Result_ungapped_create_sort_func); return gam_result; } /**/ static void GAM_Result_BSDP_add_HSPset(HSPset *hspset, GAM *gam, HPair *hpair){ register C4_Portal *portal; portal = GAM_Pair_find_portal(gam->model, hspset); HPair_add_hspset(hpair, portal, hspset); if(gam->verbosity > 2) g_message("Added [%d] hsps to HPair", hspset->hsp_list->len); return; } static gboolean GAM_Result_is_full(GAM_Result *gam_result){ register Alignment *penult, *last; if((gam_result->gam->gas->best_n) && (gam_result->alignment_list->len >= gam_result->gam->gas->best_n) && (gam_result->alignment_list->len > 1)){ penult = gam_result->alignment_list->pdata [gam_result->alignment_list->len-2]; last = gam_result->alignment_list->pdata [gam_result->alignment_list->len-1]; if(penult->score != last->score) return TRUE; } return FALSE; } /* GAM_Result is only full when best_n threshold has been met * and all tie-breakers have been admitted into the GAM_Result */ static GAM_Result *GAM_Result_BSDP_create(GAM *gam, Comparison *comparison){ register HPair *hpair; register Alignment *alignment; register C4_Score threshold = GAM_get_query_threshold(gam, comparison->query); register GAM_Result 
*gam_result = GAM_Result_create(gam, comparison->query, comparison->target); g_assert(gam->heuristic); if(gam->verbosity > 2) g_message("Preparing HPair for [%s][%s]", comparison->query->id, comparison->target->id); hpair = HPair_create(gam->heuristic, gam_result->subopt, comparison->query->len, comparison->target->len, gam->verbosity, gam_result->user_data); /**/ g_assert(comparison); g_assert(Comparison_has_hsps(comparison)); if(comparison->dna_hspset) GAM_Result_BSDP_add_HSPset(comparison->dna_hspset, gam, hpair); if(comparison->protein_hspset) GAM_Result_BSDP_add_HSPset(comparison->protein_hspset, gam, hpair); if(comparison->codon_hspset) GAM_Result_BSDP_add_HSPset(comparison->codon_hspset, gam, hpair); HPair_finalise(hpair, threshold); if(gam->verbosity > 2) g_message("Finalised HPair"); do { threshold = GAM_get_query_threshold(gam, comparison->query); alignment = HPair_next_path(hpair, threshold); if(!alignment) break; g_assert(alignment->score >= threshold); if(gam->verbosity > 2) g_message("Found BSDP alignment score [%d]", alignment->score); GAM_Result_add_alignment(gam_result, alignment, threshold); if(GAM_Result_is_full(gam_result)) break; } while(gam->gas->use_subopt); HPair_destroy(hpair); if(!gam_result->alignment_list){ /* No alignments */ GAM_Result_destroy(gam_result); return NULL; } return gam_result; } /* FIXME: optimisation: change to update threshold between * suboptimal alignments. 
*/ static GAM_Result *GAM_Result_SDP_create(GAM *gam, Comparison *comparison){ register SDP_Pair *sdp_pair; register Alignment *alignment; register C4_Score threshold; register GAM_Result *gam_result = GAM_Result_create(gam, comparison->query, comparison->target); if(gam->verbosity > 2) g_message("Preparing SDP_Pair for [%s][%s]", comparison->query->id, comparison->target->id); g_assert(gam); g_assert(gam->sdp); g_assert(comparison); g_assert(Comparison_has_hsps(comparison)); sdp_pair = SDP_Pair_create(gam->sdp, gam_result->subopt, comparison, gam_result->user_data); do { threshold = GAM_get_query_threshold(gam, comparison->query); alignment = SDP_Pair_next_path(sdp_pair, threshold); if(!alignment) break; g_assert(alignment->score >= threshold); if(gam->verbosity > 2) g_message("Found SDP alignment score [%d]", alignment->score); GAM_Result_add_alignment(gam_result, alignment, threshold); if(GAM_Result_is_full(gam_result)) break; } while(gam->gas->use_subopt); SDP_Pair_destroy(sdp_pair); if(!gam_result->alignment_list){ /* No alignments */ GAM_Result_destroy(gam_result); return NULL; } return gam_result; } /**/ typedef struct { gboolean *fwd_keep; gboolean *rev_keep; GPtrArray *hsp_list; RangeTree *rangetree; HSP *max_cobs_hsp; GAM *gam; } GAM_Geneseed_Data; static void GAM_Result_geneseed_add(HSPset *hspset, GAM_Geneseed_Data *gsd){ register gint i; register HSP *hsp; if(!hspset) return; for(i = 0; i < hspset->hsp_list->len; i++){ hsp = hspset->hsp_list->pdata[i]; if(!RangeTree_check_pos(gsd->rangetree, HSP_query_cobs(hsp), HSP_target_cobs(hsp))) RangeTree_add(gsd->rangetree, HSP_query_cobs(hsp), HSP_target_cobs(hsp), GINT_TO_POINTER(gsd->hsp_list->len)); g_ptr_array_add(gsd->hsp_list, hsp); if((!gsd->max_cobs_hsp) || (gsd->max_cobs_hsp->cobs < hsp->cobs)) gsd->max_cobs_hsp = hsp; } return; } static void GAM_Result_geneseed_visit_hsp_fwd(GAM_Geneseed_Data *gsd, gint hsp_id); static void GAM_Result_geneseed_visit_hsp_rev(GAM_Geneseed_Data *gsd, gint hsp_id); static 
gboolean GAM_Result_geneseed_report_fwd_func(gint x, gint y, gpointer info, gpointer user_data){ register GAM_Geneseed_Data *gsd = user_data; register gint hsp_id = GPOINTER_TO_INT(info); GAM_Result_geneseed_visit_hsp_fwd(gsd, hsp_id); return FALSE; } static gboolean GAM_Result_geneseed_report_rev_func(gint x, gint y, gpointer info, gpointer user_data){ register GAM_Geneseed_Data *gsd = user_data; register gint hsp_id = GPOINTER_TO_INT(info); GAM_Result_geneseed_visit_hsp_rev(gsd, hsp_id); return FALSE; } static void GAM_Result_geneseed_visit_hsp_fwd(GAM_Geneseed_Data *gsd, gint hsp_id){ register HSP *hsp; register gint query_range, target_range; g_assert(hsp_id < gsd->hsp_list->len); if(!gsd->fwd_keep[hsp_id]){ gsd->fwd_keep[hsp_id] = TRUE; hsp = gsd->hsp_list->pdata[hsp_id]; query_range = gsd->gam->max_query_span + (((HSP_query_end(hsp) - HSP_query_cobs(hsp)) + (HSP_query_cobs(gsd->max_cobs_hsp) - gsd->max_cobs_hsp->query_start)) * 2); target_range = gsd->gam->max_target_span + (((HSP_target_end(hsp) - HSP_target_cobs(hsp)) + (HSP_target_cobs(gsd->max_cobs_hsp) - gsd->max_cobs_hsp->target_start)) * 2); /* Search forwards */ /* g_message("find fwd (%d,%d,%d) (%d,%d,%d) [%d,%d]", (HSP_query_end(hsp) - HSP_query_cobs(hsp)), gsd->gam->max_query_span, (HSP_query_cobs(gsd->max_cobs_hsp) - gsd->max_cobs_hsp->query_start), (HSP_target_end(hsp) - HSP_target_cobs(hsp)), gsd->gam->max_target_span, (HSP_target_cobs(gsd->max_cobs_hsp) - gsd->max_cobs_hsp->target_start), query_range, target_range); g_message("RangeTree_find fwd"); */ RangeTree_find(gsd->rangetree, HSP_query_cobs(hsp), query_range, HSP_target_cobs(hsp), target_range, GAM_Result_geneseed_report_fwd_func, gsd); } return; } static void GAM_Result_geneseed_visit_hsp_rev(GAM_Geneseed_Data *gsd, gint hsp_id){ register HSP *hsp; register gint query_range, target_range; g_assert(hsp_id < gsd->hsp_list->len); if(!gsd->rev_keep[hsp_id]){ gsd->rev_keep[hsp_id] = TRUE; hsp = gsd->hsp_list->pdata[hsp_id]; query_range = 
            gsd->gam->max_query_span
            + (((HSP_query_end(hsp) - HSP_query_cobs(hsp))
              + (HSP_query_cobs(gsd->max_cobs_hsp)
                 - gsd->max_cobs_hsp->query_start)) * 2);
        target_range = gsd->gam->max_target_span
            + (((HSP_target_end(hsp) - HSP_target_cobs(hsp))
              + (HSP_target_cobs(gsd->max_cobs_hsp)
                 - gsd->max_cobs_hsp->target_start)) * 2);
        /*
        g_message("find rev (%d,%d,%d) (%d,%d,%d) [%d,%d]",
            (HSP_query_end(hsp) - HSP_query_cobs(hsp)),
            gsd->gam->max_query_span,
            (HSP_query_cobs(gsd->max_cobs_hsp)
             - gsd->max_cobs_hsp->query_start),
            (HSP_target_end(hsp) - HSP_target_cobs(hsp)),
            gsd->gam->max_target_span,
            (HSP_target_cobs(gsd->max_cobs_hsp)
             - gsd->max_cobs_hsp->target_start),
            query_range, target_range);
        g_message("RangeTree_find rev");
        */
        /* Search backwards */
        RangeTree_find(gsd->rangetree,
                       HSP_query_cobs(hsp)-query_range, query_range,
                       HSP_target_cobs(hsp)-target_range, target_range,
                       GAM_Result_geneseed_report_rev_func, gsd);
        }
    return;
    }

static gint GAM_Result_geneseed_select(HSPset *hspset,
                                       gboolean *fwd_keep,
                                       gboolean *rev_keep,
                                       gint hsp_id){
    register gint i;
    register GPtrArray *keep_list;
    register HSP *hsp;
    if(!hspset)
        return hsp_id;
    /* Allocate only once a valid hspset is known,
     * so that nothing is leaked on the early return
     */
    keep_list = g_ptr_array_new();
    for(i = 0; i < hspset->hsp_list->len; i++){
        hsp = hspset->hsp_list->pdata[i];
        if(fwd_keep[hsp_id] || rev_keep[hsp_id]){
            g_ptr_array_add(keep_list, hsp);
        } else {
            HSP_destroy(hsp);
            }
        hsp_id++;
        }
    /*
    g_message("geneseed selected [%d] from [%d]",
              keep_list->len, hspset->hsp_list->len);
    */
    g_ptr_array_free(hspset->hsp_list, TRUE);
    hspset->hsp_list = keep_list;
    return hsp_id;
    }

static void GAM_Result_geneseed_filter(GAM *gam,
                                       Comparison *comparison){
    register gint i, hsp_id = 0;
    register HSP *hsp;
    GAM_Geneseed_Data gsd;
    gsd.hsp_list = g_ptr_array_new();
    gsd.rangetree = RangeTree_create();
    gsd.max_cobs_hsp = NULL;
    gsd.gam = gam;
    /* Build the rangetree */
    GAM_Result_geneseed_add(comparison->dna_hspset, &gsd);
    GAM_Result_geneseed_add(comparison->protein_hspset, &gsd);
GAM_Result_geneseed_add(comparison->codon_hspset, &gsd); g_assert(gsd.hsp_list->len); g_assert(gsd.max_cobs_hsp); gsd.fwd_keep = g_new0(gboolean, gsd.hsp_list->len); gsd.rev_keep = g_new0(gboolean, gsd.hsp_list->len); /* Find HSPs reachable from geneseed HSPs */ for(i = 0; i < gsd.hsp_list->len; i++){ hsp = gsd.hsp_list->pdata[i]; if(hsp->score >= hsp->hsp_set->param->has->geneseed_threshold){ GAM_Result_geneseed_visit_hsp_fwd(&gsd, i); GAM_Result_geneseed_visit_hsp_rev(&gsd, i); } } /* Keep HSPs marked as keep */ hsp_id = GAM_Result_geneseed_select(comparison->dna_hspset, gsd.fwd_keep, gsd.rev_keep, hsp_id); hsp_id = GAM_Result_geneseed_select(comparison->protein_hspset, gsd.fwd_keep, gsd.rev_keep, hsp_id); hsp_id = GAM_Result_geneseed_select(comparison->codon_hspset, gsd.fwd_keep, gsd.rev_keep, hsp_id); /**/ if(comparison->dna_hspset && (!comparison->dna_hspset->hsp_list->len)){ HSPset_destroy(comparison->dna_hspset); comparison->dna_hspset = NULL; } if(comparison->protein_hspset && (!comparison->protein_hspset->hsp_list->len)){ HSPset_destroy(comparison->protein_hspset); comparison->protein_hspset = NULL; } if(comparison->codon_hspset && (!comparison->codon_hspset->hsp_list->len)){ HSPset_destroy(comparison->codon_hspset); comparison->codon_hspset = NULL; } /**/ /**/ /* g_message("start with [%d] FILTER down to [%d]", gsd.hsp_list->len, (comparison->dna_hspset?comparison->dna_hspset->hsp_list->len:0) +(comparison->protein_hspset?comparison->protein_hspset->hsp_list->len:0) +(comparison->codon_hspset?comparison->codon_hspset->hsp_list->len:0)); */ RangeTree_destroy(gsd.rangetree, NULL, NULL); g_free(gsd.fwd_keep); g_free(gsd.rev_keep); g_ptr_array_free(gsd.hsp_list, TRUE); return; } GAM_Result *GAM_Result_heuristic_create(GAM *gam, Comparison *comparison){ register GAM_Result *gam_result; register HSPset_ArgumentSet *has = Comparison_Param_get_HSPSet_Argument_Set(comparison->param); if(has->geneseed_threshold){ /* Raise score threshold to geneseed * to prevent 
low-scoring subopt alignments. */ GAM_lock(gam); if(gam->gas->threshold < has->geneseed_threshold) gam->gas->threshold = has->geneseed_threshold; GAM_unlock(gam); GAM_Result_geneseed_filter(gam, comparison); } if(!Comparison_has_hsps(comparison)) return NULL; /* g_message("heuristic create with [%d,%d,%d]", comparison->dna_hspset?comparison->dna_hspset->hsp_list->len:0, comparison->protein_hspset?comparison->protein_hspset->hsp_list->len:0, comparison->codon_hspset?comparison->codon_hspset->hsp_list->len:0); Comparison_print(comparison); */ if(gam->gas->use_gapped_extension) gam_result = GAM_Result_SDP_create(gam, comparison); else gam_result = GAM_Result_BSDP_create(gam, comparison); return gam_result; } /**/ GAM_Result *GAM_Result_exhaustive_create(GAM *gam, Sequence *query, Sequence *target){ register Alignment *alignment; register C4_Score threshold; register GAM_Result *gam_result; register OPair *opair; g_assert(gam->optimal); GAM_lock(gam); Sequence_lock(query); Sequence_lock(target); gam_result = GAM_Result_create(gam, query, target); Sequence_unlock(query); Sequence_unlock(target); opair = OPair_create(gam->optimal, gam_result->subopt, query->len, target->len, gam_result->user_data); GAM_unlock(gam); if(gam->verbosity > 1) g_message("Exhaustive alignment of [%s] [%s]", query->id, target->id); /**/ do { GAM_lock(gam); threshold = GAM_get_query_threshold(gam, query); GAM_unlock(gam); alignment = OPair_next_path(opair, threshold); if(!alignment) break; g_assert(alignment->score >= threshold); GAM_Result_add_alignment(gam_result, alignment, threshold); if(gam->verbosity > 2) g_message("Found alignment number [%d] score [%d]", gam_result->alignment_list->len, alignment->score); } while(gam->gas->use_subopt); OPair_destroy(opair); if(!gam_result->alignment_list){ /* No alignments */ GAM_Result_destroy(gam_result); return NULL; } return gam_result; } GAM_Result *GAM_Result_share(GAM_Result *gam_result){ gam_result->ref_count++; return gam_result; } void 
GAM_Result_destroy(GAM_Result *gam_result){ register gint i; if(--gam_result->ref_count) return; if(gam_result->alignment_list){ for(i = 0; i < gam_result->alignment_list->len; i++) Alignment_destroy(gam_result->alignment_list->pdata[i]); g_ptr_array_free(gam_result->alignment_list, TRUE); } Model_Type_destroy_data(gam_result->gam->gas->type, gam_result->user_data); Model_Type_destroy_data(gam_result->gam->gas->type, gam_result->self_data); Sequence_destroy(gam_result->query); Sequence_destroy(gam_result->target); GAM_lock(gam_result->gam); GAM_destroy(gam_result->gam); SubOpt_destroy(gam_result->subopt); GAM_unlock(gam_result->gam); g_free(gam_result); return; } static void GAM_display_alignment(GAM *gam, Alignment *alignment, Sequence *query, Sequence *target, gint result_id, gint rank, gpointer user_data, gpointer self_data, FILE *fp){ if(gam->gas->show_alignment) Alignment_display(alignment, query, target, gam->dna_submat, gam->protein_submat, gam->translate, fp); if(gam->gas->show_sugar) Alignment_display_sugar(alignment, query, target, fp); if(gam->gas->show_cigar) Alignment_display_cigar(alignment, query, target, fp); if(gam->gas->show_vulgar) Alignment_display_vulgar(alignment, query, target, fp); if(gam->gas->show_query_gff) Alignment_display_gff(alignment, query, target, gam->translate, TRUE, FALSE, result_id, user_data, fp); if(gam->gas->show_target_gff) Alignment_display_gff(alignment, query, target, gam->translate, FALSE, Model_Type_has_genomic_target(gam->gas->type), result_id, user_data, fp); if(gam->gas->ryo) Alignment_display_ryo(alignment, query, target, gam->gas->ryo, gam->translate, rank, user_data, self_data, fp); fflush(fp); return; } static void GAM_Result_display(GAM_Result *gam_result){ register gint i; register Alignment *alignment; g_assert(gam_result); for(i = 0; i < gam_result->alignment_list->len; i++){ alignment = gam_result->alignment_list->pdata[i]; GAM_display_alignment(gam_result->gam, alignment, gam_result->query, 
gam_result->target, i+1, 0, gam_result->user_data, gam_result->self_data, stdout); } return; } void GAM_Result_submit(GAM_Result *gam_result){ register GAM_QueryResult *gqr; g_assert(gam_result); GAM_lock(gam_result->gam); if(gam_result->gam->gas->best_n){ gqr = g_tree_lookup(gam_result->gam->bestn_tree, gam_result->query->id); if(!gqr){ gqr = GAM_QueryResult_create(gam_result->gam, gam_result->query->id); g_tree_insert(gam_result->gam->bestn_tree, gqr->query_id, gqr); } g_assert(gqr); g_assert(!strcmp(gqr->query_id, gam_result->query->id)); GAM_QueryResult_submit(gqr, gam_result); } else { GAM_Result_display(gam_result); } GAM_unlock(gam_result->gam); return; } void GAM_lock(GAM *gam){ #ifdef USE_PTHREADS pthread_mutex_lock(&gam->gam_lock); #endif /* USE_PTHREADS */ return; } void GAM_unlock(GAM *gam){ #ifdef USE_PTHREADS pthread_mutex_unlock(&gam->gam_lock); #endif /* USE_PTHREADS */ return; } /**/ exonerate-2.4.0/src/hub/analysis.test.c0000644000175000017500000000456211162714301014745 00000000000000/****************************************************************\ * * * Analysis module for exonerate * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "analysis.h" int Argument_main(Argument *arg){ register Analysis *analysis; register gchar *query_path, *target_path; register Alphabet_Type query_type, target_type; register GPtrArray *query_path_list = g_ptr_array_new(), *target_path_list = g_ptr_array_new(); if(arg->argc == 5){ query_path = arg->argv[1]; target_path = arg->argv[2]; query_type = (arg->argv[3][0] == 'd')?Alphabet_Type_DNA :Alphabet_Type_PROTEIN; target_type = (arg->argv[4][0] == 'd')?Alphabet_Type_DNA :Alphabet_Type_PROTEIN; g_ptr_array_add(query_path_list, query_path); g_ptr_array_add(target_path_list, target_path); g_message("Creating Analysis with [%s][%s]", query_path, target_path); analysis = Analysis_create(query_path_list, query_type, 0, 0, target_path_list, target_type, 0, 0, 3); Analysis_process(analysis); Analysis_destroy(analysis); g_ptr_array_free(query_path_list, TRUE); g_ptr_array_free(target_path_list, TRUE); } else { g_warning("Test [%s] does nothing without arguments", __FILE__); g_message("usage: \n" " (type==[d|p])\n"); } return 0; } exonerate-2.4.0/src/hub/analysis.c0000644000175000017500000015330511222647523013777 00000000000000/****************************************************************\ * * * Analysis module for exonerate * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/

#include "analysis.h"
#include "ungapped.h"
#include "compoundfile.h"
#include "hspset.h"
#include "lineparse.h"

#include <stdlib.h>    /* For atoi() */
#include <unistd.h>    /* For sleep() */
#include <string.h>    /* For strcmp() */
#include <ctype.h>     /* For isdigit() */
#include <sys/types.h> /* For stat() */
#include <sys/stat.h>  /* For stat() */
#include <unistd.h>    /* For stat() */

Analysis_ArgumentSet *Analysis_ArgumentSet_create(Argument *arg){
    register ArgumentSet *as;
    static Analysis_ArgumentSet aas;
    if(arg){
        as = ArgumentSet_create("Analysis Options");
        ArgumentSet_add_option(as, 'E', "exhaustive", NULL,
            "Perform exhaustive alignment (slow)", "FALSE",
            Argument_parse_boolean, &aas.use_exhaustive);
        ArgumentSet_add_option(as, 'B', "bigseq", NULL,
            "Allow rapid comparison between big sequences", "FALSE",
            Argument_parse_boolean, &aas.use_bigseq);
        ArgumentSet_add_option(as, 'r', "revcomp", NULL,
            "Also search reverse complement of query and target", "TRUE",
            Argument_parse_boolean, &aas.use_revcomp);
        ArgumentSet_add_option(as, '\0', "forcescan", "[q|t]",
            "Force FSM scan on query or target sequences", "none",
            Argument_parse_string, &aas.force_scan);
        /**/
        ArgumentSet_add_option(as, 0, "saturatethreshold", "int",
            "Word saturation threshold", "0",
            Argument_parse_int, &aas.saturate_threshold);
        /**/
        ArgumentSet_add_option(as, 0, "customserver", "command",
            "Custom command to send to a non-standard server", "NULL",
            Argument_parse_string, &aas.custom_server_command);
        ArgumentSet_add_option(as, 'c', "cores", "number",
            "Number of cores/CPUs/threads for alignment computation", "1",
            Argument_parse_int, &aas.thread_count);
        Argument_absorb_ArgumentSet(arg, as);
    }
    return &aas;
}

/**/

typedef struct {
    GAM *gam;
    Comparison *comparison;
} Analysis_HeuristicJob;

static Analysis_HeuristicJob *Analysis_HeuristicJob_create(GAM *gam,
                                          Comparison *comparison){
    register Analysis_HeuristicJob *ahj = g_new(Analysis_HeuristicJob, 1);
    ahj->gam = GAM_share(gam);
    ahj->comparison = Comparison_share(comparison);
    return
ahj; } static void Analysis_HeuristicJob_destroy(Analysis_HeuristicJob *ahj){ Comparison_destroy(ahj->comparison); GAM_destroy(ahj->gam); g_free(ahj); return; } static void Analysis_HeuristicJob_run(gpointer data){ register Analysis_HeuristicJob *ahj = data; register GAM_Result *gam_result = GAM_Result_heuristic_create(ahj->gam, ahj->comparison); if(gam_result){ GAM_Result_submit(gam_result); GAM_Result_destroy(gam_result); } Analysis_HeuristicJob_destroy(ahj); return; } static void Analysis_report_func(Comparison *comparison, gpointer user_data){ register Analysis *analysis = user_data; register GAM_Result *gam_result; register Analysis_HeuristicJob *ahj; g_assert(Comparison_has_hsps(comparison)); if(analysis->scan_query){ /* Swap back query and target after a query scan */ Comparison_swap(comparison); } else { /* Revert target scan alignments to +ve query strand when possible */ if((comparison->query->alphabet->type == Alphabet_Type_DNA) && (comparison->target->alphabet->type == Alphabet_Type_DNA) && (comparison->query->strand == Sequence_Strand_REVCOMP) && (comparison->target->strand == Sequence_Strand_FORWARD) && (!analysis->gam->translate_both)) Comparison_revcomp(comparison); } if(Model_Type_is_gapped(analysis->gam->gas->type)){ ahj = Analysis_HeuristicJob_create(analysis->gam, comparison); #ifdef USE_PTHREADS g_assert(analysis->job_queue); /* FIXME: need to use priority here */ JobQueue_submit(analysis->job_queue, Analysis_HeuristicJob_run, ahj, 2); #else /* USE_PTHREADS */ Analysis_HeuristicJob_run(ahj); #endif /* USE_PTHREADS */ } else { gam_result = GAM_Result_ungapped_create(analysis->gam, comparison); if(gam_result){ GAM_Result_submit(gam_result); GAM_Result_destroy(gam_result); } } return; } /**/ static void Analysis_FastaPipe_Pair_init_func(gpointer user_data){ register Analysis *analysis = user_data; g_assert(!analysis->curr_query); return; } /* Called before query pipeline loading */ static void Analysis_FastaPipe_Pair_prep_func(gpointer user_data){ 
    register Analysis *analysis = user_data;
    g_assert(analysis->curr_query);
    return;
} /* Called after query pipeline loading */

static void Analysis_FastaPipe_Pair_term_func(gpointer user_data){
    register Analysis *analysis = user_data;
    FastaDB_Seq_destroy(analysis->curr_query);
    analysis->curr_query = NULL;
    return;
} /* Called after query pipeline analysis */

static gboolean Analysis_FastaPipe_Pair_query_func(FastaDB_Seq *fdbs,
                                                   gpointer user_data){
    register Analysis *analysis = user_data;
    g_assert(!analysis->curr_query);
    /*
    if(analysis->aas->use_exhaustive)
        g_strup(fdbs->seq->seq);
    */
    if(analysis->verbosity > 1)
        g_message("Load query for pairwise comparison [%s] (%d)",
                  fdbs->seq->id, fdbs->seq->len);
    analysis->curr_query = FastaDB_Seq_share(fdbs);
    return TRUE; /* take queries one at a time */
} /* Called on query loading */

static void Analysis_BSAM_compare(Analysis *analysis,
                                  FastaDB_Seq *query,
                                  FastaDB_Seq *target){
    register Comparison *comparison = BSAM_compare(analysis->bsam,
                                                   query->seq,
                                                   target->seq);
    if(comparison){
        if(Comparison_has_hsps(comparison))
            Analysis_report_func(comparison, analysis);
        Comparison_destroy(comparison);
    }
    return;
}

/**/

typedef struct {
    GAM *gam;
    Sequence *query;
    Sequence *target;
} Analysis_ExhaustiveJob;

static Analysis_ExhaustiveJob *Analysis_ExhaustiveJob_create(GAM *gam,
                                   Sequence *query, Sequence *target){
    register Analysis_ExhaustiveJob *aej = g_new(Analysis_ExhaustiveJob, 1);
    aej->gam = GAM_share(gam);
    aej->query = Sequence_share(query);
    aej->target = Sequence_share(target);
    return aej;
}

static void Analysis_ExhaustiveJob_destroy(Analysis_ExhaustiveJob *aej){
    GAM_destroy(aej->gam);
    Sequence_destroy(aej->query);
    Sequence_destroy(aej->target);
    g_free(aej);
    return;
}

static void Analysis_ExhaustiveJob_run(gpointer data){
    register Analysis_ExhaustiveJob *aej = data;
    register GAM_Result *gam_result;
    gam_result = GAM_Result_exhaustive_create(aej->gam,
                                              aej->query, aej->target);
    if(gam_result){
        GAM_Result_submit(gam_result);
GAM_Result_destroy(gam_result); } Analysis_ExhaustiveJob_destroy(aej); return; } static void Analysis_Pair_compare(Analysis *analysis, FastaDB_Seq *fdbs){ register Analysis_ExhaustiveJob *aej; if(analysis->aas->use_exhaustive){ aej = Analysis_ExhaustiveJob_create(analysis->gam, analysis->curr_query->seq, fdbs->seq); #ifdef USE_PTHREADS g_assert(analysis->job_queue); JobQueue_submit(analysis->job_queue, Analysis_ExhaustiveJob_run, aej, 1); #else /* USE_PTHREADS */ Analysis_ExhaustiveJob_run(aej); #endif /* USE_PTHREADS */ } else { Analysis_BSAM_compare(analysis, analysis->curr_query, fdbs); } return; } /**/ static gboolean Analysis_FastaPipe_Pair_target_func(FastaDB_Seq *fdbs, gpointer user_data){ register Analysis *analysis = user_data; g_assert(analysis->gam); g_assert(analysis->curr_query); /* if(analysis->aas->use_exhaustive) g_strup(fdbs->seq->seq); */ if(analysis->verbosity > 1) g_message("Load target for pairwise comparison [%s] (%d)", fdbs->seq->id, fdbs->seq->len); /**/ Analysis_Pair_compare(analysis, fdbs); return FALSE; } /* Called on target loading */ /**/ static void Analysis_FastaPipe_Seeder_init_func(gpointer user_data){ register Analysis *analysis = user_data; g_assert(!analysis->curr_seeder); register Comparison_Param *comparison_param = Comparison_Param_share(analysis->comparison_param); if(analysis->scan_query) comparison_param = Comparison_Param_swap(comparison_param); analysis->curr_seeder = Seeder_create(analysis->verbosity, comparison_param, analysis->aas->saturate_threshold, Analysis_report_func, analysis); Comparison_Param_destroy(comparison_param); return; } /* Called before query pipeline loading */ static void Analysis_FastaPipe_Seeder_prep_func(gpointer user_data){ /* does nothing */ return; } /* Called after query pipeline loading */ static void Analysis_FastaPipe_Seeder_term_func(gpointer user_data){ register Analysis *analysis = user_data; Seeder_destroy(analysis->curr_seeder); if(analysis->verbosity > 2){ g_message("### Seeder 
destroyed ###"); RecycleBin_profile(); } analysis->curr_seeder = NULL; return; } /* Called after query pipeline analysis */ static gboolean Analysis_FastaPipe_Seeder_query_func(FastaDB_Seq *fdbs, gpointer user_data){ register Analysis *analysis = user_data; if(analysis->verbosity > 1) g_message("Load query for Seeder [%s] (%d)", fdbs->seq->id, fdbs->seq->len); return Seeder_add_query(analysis->curr_seeder, fdbs->seq); } /* Called on query loading */ static gboolean Analysis_FastaPipe_Seeder_target_func(FastaDB_Seq *fdbs, gpointer user_data){ register Analysis *analysis = user_data; if(analysis->verbosity > 1) g_message("Load target for Seeder [%s] (%d)", fdbs->seq->id, fdbs->seq->len); Seeder_add_target(analysis->curr_seeder, fdbs->seq); return FALSE; } /* Called on target loading */ /**/ static gboolean Analysis_decide_scan_query(FastaDB *query_fdb, FastaDB *target_fdb, gchar *force_scan){ register CompoundFile_Pos query_size, target_size; if(!g_strcasecmp(force_scan, "none")){ query_size = CompoundFile_get_length(query_fdb->cf); target_size = CompoundFile_get_length(target_fdb->cf); if((query_size >> 4) < target_size) return FALSE; else return TRUE; } else if((!g_strcasecmp(force_scan, "query")) || (!g_strcasecmp(force_scan, "q"))){ return TRUE; } else if((!g_strcasecmp(force_scan, "target")) || (!g_strcasecmp(force_scan, "t"))){ return FALSE; } else { g_error("Unknown force_scan command [%s]", force_scan); } return FALSE; /* not reached */ } static void Analysis_find_matches(Analysis *analysis, Match **dna_match, Match **protein_match, Match **codon_match){ register GPtrArray *transition_list = C4_Model_select_transitions(analysis->gam->model, C4_Label_MATCH); register gint i; register C4_Transition *transition; register Match *match; for(i = 0; i < transition_list->len; i++){ transition = transition_list->pdata[i]; g_assert(transition); g_assert(transition->label == C4_Label_MATCH); match = transition->label_data; if(match){ switch(match->type){ case 
Match_Type_DNA2DNA:
                    if((*dna_match) && ((*dna_match) != match))
                        g_error("Multiple DNA matches not implemented");
                    (*dna_match) = match;
                    break;
                case Match_Type_PROTEIN2PROTEIN:
                case Match_Type_DNA2PROTEIN:
                case Match_Type_PROTEIN2DNA:
                    if((*protein_match) && ((*protein_match) != match))
                        g_error("Multiple protein matches"
                                " not supported");
                    (*protein_match) = match;
                    break;
                case Match_Type_CODON2CODON:
                    if((*codon_match) && ((*codon_match) != match))
                        g_error("Multiple codon matches"
                                " not implemented");
                    (*codon_match) = match;
                    break;
                default:
                    break;
            }
        }
    }
    g_ptr_array_free(transition_list, TRUE);
    return;
}

/**/

static SocketClient *Analysis_Client_connect(gchar *path){
    register SocketClient *sc;
    register gint i, divider = 0, port;
    register gchar *server;
    register gint connection_attempts = 10;
    /* Split "server:port" at the first colon */
    for(i = 0; path[i]; i++)
        if(path[i] == ':'){
            divider = i;
            break;
        }
    if(divider){
        port = atoi(path+divider+1);
        server = g_strndup(path, divider);
        for(i = connection_attempts; i >= 0; i--){
            sc = SocketClient_create(server, port);
            if(sc)
                break;
            g_warning("Failed connection to server (retries left: %d)", i);
            sleep(1);
        }
        g_free(server);
        return sc;
    }
    return NULL;
}

static gchar *Analysis_Client_send(Analysis_Client *aclient,
                                   gchar *msg, gchar *expect,
                                   gboolean multi_line_reply){
    register gchar *reply = SocketClient_send(aclient->sc, msg);
    register gchar *line, *p = reply, *processed_reply;
    register gint line_count = 0;
    register GString *str = g_string_sized_new(64);
    if(aclient->verbosity >= 3){
        g_print("Message: client sent message [%s]\n", msg);
        if((aclient->verbosity == 3) && (strlen(reply) >= 80))
            g_print("Message: client received reply [%.*s] \n", 80, reply);
        else
            g_print("Message: client received reply [%s]\n", reply);
    }
    do {
        while(*p){
            if(!isspace(*p)) /* Strip blank lines */
                break;
            p++;
        }
        line = p;
        if(!strncmp(p, "error:", 6))
            g_error("Error from server: [%s]", p);
        if(!strncmp(p, "warning:", 8)){
            g_warning("Warning from server: [%s]", p);
        } else if(!strncmp(p, "linecount:", 10)){
while(*p){ /* Skip to end of line */ if(*p == '\n') break; p++; } } else if(!strncmp(p, expect, strlen(expect))){ while(*p){ /* Skip to end of line */ if(*p == '\n') break; p++; } line_count++; *p = '\0'; g_string_append(str, line); g_string_append_c(str, '\n'); *p = '\n'; } else { if(!*p) break; g_error("Unexpected line from server [%s]", p); } } while(TRUE); g_free(reply); if(!line_count) g_error("No reply received from server msg=[%s]", msg); if((!multi_line_reply) && (line_count > 1)) g_error("Unexpected multi-line reply from server"); processed_reply = str->str; g_string_free(str, FALSE); return processed_reply; } static void Analysis_Client_set_param(Analysis_Client *aclient, GAM *gam){ register HSPset_ArgumentSet *has = HSPset_ArgumentSet_create(NULL); register Analysis_ArgumentSet *aas = Analysis_ArgumentSet_create(NULL); register gchar *msg, *reply; /**/ if(aas->custom_server_command){ msg = g_strdup_printf("%s\n", aas->custom_server_command); reply = Analysis_Client_send(aclient, msg, "ok:", FALSE); g_free(msg); g_free(reply); } /**/ msg = g_strdup_printf("set param seedrepeat %d", has->seed_repeat); reply = Analysis_Client_send(aclient, msg, "ok:", FALSE); g_free(msg); g_free(reply); /**/ msg = g_strdup_printf("set param dnahspthreshold %d", has->dna_hsp_threshold); reply = Analysis_Client_send(aclient, msg, "ok:", FALSE); g_free(msg); g_free(reply); /**/ msg = g_strdup_printf("set param proteinhspthreshold %d", has->protein_hsp_threshold); reply = Analysis_Client_send(aclient, msg, "ok:", FALSE); g_free(msg); g_free(reply); /**/ msg = g_strdup_printf("set param codonhspthreshold %d", has->codon_hsp_threshold); reply = Analysis_Client_send(aclient, msg, "ok:", FALSE); g_free(msg); g_free(reply); /**/ msg = g_strdup_printf("set param dnawordlimit %d", has->dna_word_limit); reply = Analysis_Client_send(aclient, msg, "ok:", FALSE); g_free(msg); g_free(reply); /**/ msg = g_strdup_printf("set param proteinwordlimit %d", has->protein_word_limit); reply = 
Analysis_Client_send(aclient, msg, "ok:", FALSE); g_free(msg); g_free(reply); /**/ msg = g_strdup_printf("set param codonwordlimit %d", has->codon_word_limit); reply = Analysis_Client_send(aclient, msg, "ok:", FALSE); g_free(msg); g_free(reply); /**/ msg = g_strdup_printf("set param dnahspdropoff %d", has->dna_hsp_dropoff); reply = Analysis_Client_send(aclient, msg, "ok:", FALSE); g_free(msg); g_free(reply); /**/ msg = g_strdup_printf("set param proteinhspdropoff %d", has->protein_hsp_dropoff); reply = Analysis_Client_send(aclient, msg, "ok:", FALSE); g_free(msg); g_free(reply); /**/ msg = g_strdup_printf("set param codonhspdropoff %d", has->codon_hsp_dropoff); reply = Analysis_Client_send(aclient, msg, "ok:", FALSE); g_free(msg); g_free(reply); /**/ msg = g_strdup_printf("set param geneseedthreshold %d", has->geneseed_threshold); reply = Analysis_Client_send(aclient, msg, "ok:", FALSE); g_free(msg); g_free(reply); /**/ msg = g_strdup_printf("set param geneseedrepeat %d", has->geneseed_repeat); reply = Analysis_Client_send(aclient, msg, "ok:", FALSE); g_free(msg); g_free(reply); /**/ msg = g_strdup_printf("set param maxqueryspan %d", gam->max_query_span); reply = Analysis_Client_send(aclient, msg, "ok:", FALSE); g_free(msg); g_free(reply); /**/ msg = g_strdup_printf("set param maxtargetspan %d", gam->max_target_span); reply = Analysis_Client_send(aclient, msg, "ok:", FALSE); g_free(msg); g_free(reply); /**/ return; } static void Analysis_Client_info(Analysis_Client *aclient){ return; } static Analysis_Client *Analysis_Client_create(gchar *path, gint verbosity){ register Analysis_Client *aclient; register SocketClient *sc = Analysis_Client_connect(path); register Alphabet_Type alphabet_type; register gchar *dbinfo, **dbinfo_word; if(!sc) return NULL; aclient = g_new(Analysis_Client, 1); aclient->ref_count = 1; aclient->sc = sc; aclient->verbosity = verbosity; aclient->probe_fdb = NULL; dbinfo = Analysis_Client_send(aclient, "dbinfo", "dbinfo:", FALSE); dbinfo_word = 
g_strsplit(dbinfo, " ", 8);
    /**/
    g_assert(dbinfo_word[0]);
    g_assert(dbinfo_word[1]);
    g_assert(dbinfo_word[2]);
    g_assert(dbinfo_word[3]);
    g_assert(dbinfo_word[4]);
    g_assert(dbinfo_word[5]); /* total_seq_len is read below */
    /**/
    alphabet_type = Alphabet_Type_UNKNOWN;
    if(!strcmp(dbinfo_word[1], "dna"))
        alphabet_type = Alphabet_Type_DNA;
    if(!strcmp(dbinfo_word[1], "protein"))
        alphabet_type = Alphabet_Type_PROTEIN;
    /**/
    aclient->is_masked = FALSE;
    if(!strcmp(dbinfo_word[2], "softmasked"))
        aclient->is_masked = TRUE;
    else if(!strcmp(dbinfo_word[2], "unmasked"))
        aclient->is_masked = FALSE;
    else
        g_error("Unrecognised dbinfo masking state [%s]",
                dbinfo_word[2]);
    /**/
    aclient->server_alphabet = Alphabet_create(alphabet_type,
                                               aclient->is_masked);
    aclient->num_seqs = atoll(dbinfo_word[3]);
    aclient->max_seq_len = atoll(dbinfo_word[4]);
    aclient->total_seq_len = atoll(dbinfo_word[5]);
    /**/
    g_strfreev(dbinfo_word);
    g_free(dbinfo);
    aclient->curr_query = NULL;
    aclient->seq_cache = g_new0(Sequence*, aclient->num_seqs);
    Analysis_Client_info(aclient);
    return aclient;
}

static Analysis_Client *Analysis_Client_share(Analysis_Client *aclient){
    aclient->ref_count++;
    return aclient;
}

static void Analysis_Client_destroy(Analysis_Client *aclient){
    register gint i;
    register Sequence *seq;
    if(--aclient->ref_count)
        return;
    if(aclient->curr_query)
        Sequence_destroy(aclient->curr_query);
    for(i = 0; i < aclient->num_seqs; i++){
        seq = aclient->seq_cache[i];
        if(seq)
            Sequence_destroy(seq);
    }
    g_free(aclient->seq_cache);
    if(aclient->probe_fdb)
        FastaDB_close(aclient->probe_fdb);
    SocketClient_destroy(aclient->sc);
    Alphabet_destroy(aclient->server_alphabet);
    g_free(aclient);
    return;
}

static void Analysis_Client_set_probe_fdb(Analysis_Client *aclient,
                                          FastaDB *probe_fdb){
    g_assert(!aclient->probe_fdb);
    aclient->probe_fdb = FastaDB_dup(probe_fdb);
    return;
}

static void Analysis_Client_set_query(Analysis_Client *aclient,
                                      Sequence *seq){
    register gchar *seq_str = Sequence_get_str(seq);
    register gchar *msg = g_strdup_printf("set query %s", seq_str);
    register gchar
*reply = Analysis_Client_send(aclient, msg, "ok:", FALSE); register gchar **word; register gint len, checksum; if(strncmp(reply, "ok:", 3)) g_error("Could not set query [%s] on server", seq->id); word = g_strsplit(reply+4, " ", 4); len = atoi(word[0]); checksum = atoi(word[1]); if(seq->len != len) g_error("Query length mismatch on server %d %d", seq->len, len); if(Sequence_checksum(seq) != checksum) g_error("Query checksum mismatch on server %d %d", Sequence_checksum(seq), checksum); g_strfreev(word); g_free(reply); g_free(msg); g_free(seq_str); if(aclient->curr_query) Sequence_destroy(aclient->curr_query); aclient->curr_query = Sequence_share(seq); return; } static void Analysis_Client_revcomp_query(Analysis_Client *aclient){ register gchar *reply = Analysis_Client_send(aclient, "revcomp query", "ok: query strand revcomp", FALSE); register Sequence *curr_query; curr_query = aclient->curr_query; aclient->curr_query = Sequence_revcomp(aclient->curr_query); Sequence_destroy(curr_query); g_free(reply); return; } static void Analysis_Client_revcomp_target(Analysis_Client *aclient){ register gchar *reply = Analysis_Client_send(aclient, "revcomp target", "ok: target strand", FALSE); g_free(reply); return; } /**/ typedef struct { Analysis_Client *aclient; gint target_id; gint seq_len; } Analysis_Client_Key; static Analysis_Client_Key *Analysis_Client_Key_create(Analysis_Client *aclient, gint target_id, gint seq_len){ register Analysis_Client_Key *key = g_new(Analysis_Client_Key, 1); key->aclient = Analysis_Client_share(aclient); key->target_id = target_id; key->seq_len = seq_len; return key; } static void Analysis_Client_Key_destroy(Analysis_Client_Key *key){ Analysis_Client_destroy(key->aclient); g_free(key); return; } static gpointer Analysis_Client_SparseCache_get_func(gint pos, gpointer page_data, gpointer user_data){ return GINT_TO_POINTER((gint)((gchar*)page_data)[pos]); } static SparseCache_Page *Analysis_Client_SparseCache_fill_func(gint start, gpointer 
user_data){ register Analysis_Client_Key *key = user_data; register SparseCache_Page *page = g_new(SparseCache_Page, 1); register gint len = MIN(SparseCache_PAGE_SIZE, key->seq_len-start), page_len; register gchar *msg = g_strdup_printf("get subseq %d %d %d", key->target_id, start, len); register gchar *reply = Analysis_Client_send(key->aclient, msg, "subseq:", FALSE); if(strncmp(reply, "subseq:", 7)) g_error("Failed to get subseq for target (%d,%d,%d) [%s]", key->target_id, start, len, reply); page->get_func = Analysis_Client_SparseCache_get_func; page->copy_func = NULL; page_len = strlen(reply+8)-1; page->data = g_strndup(reply+8, page_len); page->data_size = sizeof(gchar)*page_len; g_free(msg); g_free(reply); FastaDB_SparseCache_compress(page, page_len); return page; } /* FIXME: move compression stuff to SeqPage in Sequence */ static void Analysis_Client_SparseCache_free_func(gpointer user_data){ register Analysis_Client_Key *key = user_data; Analysis_Client_Key_destroy(key); return; } static SparseCache *Analysis_Client_get_SparseCache( Analysis_Client *aclient, gint sequence_id, gint len){ register Analysis_Client_Key *key = Analysis_Client_Key_create(aclient, sequence_id, len); return SparseCache_create(len, Analysis_Client_SparseCache_fill_func, NULL, Analysis_Client_SparseCache_free_func, key); } static Sequence *Analysis_Client_get_Sequence(Analysis_Client *aclient, gint sequence_id, gboolean revcomp_target){ register gchar *msg, *reply, *id, *def; register SparseCache *cache; register Sequence *seq = aclient->seq_cache[sequence_id]; register gint len, checksum; register gchar **seqinfo_word; if(seq){ if(revcomp_target) return Sequence_revcomp(seq); return Sequence_share(seq); } msg = g_strdup_printf("get info %d", sequence_id); reply = Analysis_Client_send(aclient, msg, "seqinfo:", FALSE); /**/ if(strncmp(reply, "seqinfo:", 8)) g_error("Failed to set info for target [%d]", sequence_id); /* parse seqinfo for and [] */ seqinfo_word = g_strsplit(reply+9, " 
", 4); len = atoi(seqinfo_word[0]); checksum = atoi(seqinfo_word[1]); id = seqinfo_word[2]; def = seqinfo_word[3]; /* Strip any trailing newlines */ if(id[strlen(id)-1] == '\n') id[strlen(id)-1] = '\0'; if(def && (def[strlen(def)-1] == '\n')) def[strlen(def)-1] = '\0'; /**/ cache = Analysis_Client_get_SparseCache(aclient, sequence_id, len); seq = Sequence_create_extmem(id, def, len, (aclient->server_alphabet->type == Alphabet_Type_DNA) ?Sequence_Strand_FORWARD:Sequence_Strand_UNKNOWN, aclient->server_alphabet, cache); g_assert(!aclient->seq_cache[sequence_id]); aclient->seq_cache[sequence_id] = seq; g_strfreev(seqinfo_word); SparseCache_destroy(cache); g_free(reply); g_free(msg); if(revcomp_target) return Sequence_revcomp(seq); return Sequence_share(seq); } typedef enum { Analysis_Client_HSP_TOKEN_BEGIN_SET, Analysis_Client_HSP_TOKEN_END_SET, Analysis_Client_HSP_TOKEN_INT, Analysis_Client_HSP_TOKEN_FINISH } Analysis_Client_HSP_TOKEN; static Analysis_Client_HSP_TOKEN Analysis_Client_get_hsp_token(gchar *str, gint *pos, gint *intval){ register gint ch; gchar *endptr; g_assert(str); while((ch = str[(*pos)])){ switch(ch){ case 'h': if(strncmp(str+(*pos), "hspset:", 7)) g_error("Unexpected string in HSPset list"); (*pos) += 7; if(strncmp(str+(*pos), " empty\n", 7)){ return Analysis_Client_HSP_TOKEN_BEGIN_SET; } else { (*pos) += 7; break; } case ' ': (*pos)++; break; case '\n': (*pos)++; return Analysis_Client_HSP_TOKEN_END_SET; default: if(isdigit(ch)){ (*intval) = strtol(str+(*pos), &endptr, 10); (*pos) = endptr-str; return Analysis_Client_HSP_TOKEN_INT; } else { g_error("Unexpected character [%d] in HSPset list", ch); } break; } } return Analysis_Client_HSP_TOKEN_FINISH; } static void Analysis_Client_get_hsp_sets(Analysis_Client *aclient, Analysis *analysis, gboolean swap_chains, gboolean revcomp_target){ register gchar *reply = Analysis_Client_send(aclient, "get hsps", "hspset:", TRUE); gint pos = 0, intval = 0; register gboolean ok = TRUE; register 
Analysis_Client_HSP_TOKEN token; register gint target_id = -1, query_pos = -1, target_pos = -1, length; register Comparison *comparison = NULL; register Sequence *target = NULL; register Match_Type match_type = Match_Type_find(aclient->curr_query->alphabet->type, aclient->server_alphabet->type, FALSE); /* FIXME: use Match_Type_find with translate_both for codon alignments */ register HSPset *hsp_set = NULL; do { token = Analysis_Client_get_hsp_token(reply, &pos, &intval); switch(token){ case Analysis_Client_HSP_TOKEN_BEGIN_SET: break; case Analysis_Client_HSP_TOKEN_INT: if(target_id == -1){ target_id = intval; target = Analysis_Client_get_Sequence(aclient, target_id, revcomp_target); g_assert(!comparison); g_assert(aclient->curr_query); /* FIXME: temp : make work with other HSP types */ /* FIXME: should take necessary HSP params from server */ if(swap_chains) comparison = Comparison_create(analysis->comparison_param, target, aclient->curr_query); else comparison = Comparison_create(analysis->comparison_param, aclient->curr_query, target); /* FIXME: should ensure that the HSPset is created * without a horizon */ Sequence_destroy(target); target = NULL; } else if(query_pos == -1){ query_pos = intval; } else if(target_pos == -1){ target_pos = intval; } else { length = intval; g_assert(comparison); /* g_message("adding one [%d,%d,%d]", query_pos, target_pos, length); */ /* FIXME: need fix to work with for other match types */ switch(match_type){ case Match_Type_DNA2DNA: hsp_set = comparison->dna_hspset; break; case Match_Type_PROTEIN2PROTEIN: case Match_Type_PROTEIN2DNA: case Match_Type_DNA2PROTEIN: hsp_set = comparison->protein_hspset; break; default: g_error("Match_Type not supported [%s]", Match_Type_get_name(match_type)); } g_assert(hsp_set); if(swap_chains) HSPset_add_known_hsp(hsp_set, target_pos, query_pos, length); else HSPset_add_known_hsp(hsp_set, query_pos, target_pos, length); query_pos = target_pos = -1; } break; case Analysis_Client_HSP_TOKEN_END_SET: /* 
FIXME: needs to work for other hsp_set types */ Comparison_finalise(comparison); if(Comparison_has_hsps(comparison)){ #if 0 /* FIXME: move to use scan_query swap in report */ if(swap_chains) Comparison_swap(comparison); #endif /* 0 */ Analysis_report_func(comparison, analysis); } Comparison_destroy(comparison); comparison = NULL; target_id = -1; break; case Analysis_Client_HSP_TOKEN_FINISH: ok = FALSE; break; } } while(ok); g_assert(target_id == -1); /* format: { } */ /* tokens */ g_free(reply); return; } /* FIXME: only working for single hspset comparisons */ static void Analysis_Client_process_query(Analysis_Client *aclient, Analysis *analysis, Sequence *query, gboolean swap_chains, gboolean revcomp_target, gint priority){ Analysis_Client_set_query(aclient, query); Analysis_Client_get_hsp_sets(aclient, analysis, swap_chains, revcomp_target); /* Revcomp query if DNA */ if(aclient->curr_query->alphabet->type == Alphabet_Type_DNA){ Analysis_Client_revcomp_query(aclient); Analysis_Client_get_hsp_sets(aclient, analysis, swap_chains, revcomp_target); } return; } static void Analysis_Client_process(Analysis_Client *aclient, Analysis *analysis, gboolean swap_chains, gint priority){ register FastaDB_Seq *fdbs; /* FIXME: need to check for appropriate database type */ while((fdbs = FastaDB_next(aclient->probe_fdb, FastaDB_Mask_ALL))){ Analysis_Client_process_query(aclient, analysis, fdbs->seq, swap_chains, FALSE, priority); /* Revcomp target if protein vs DNA or translate_both */ if(((aclient->curr_query->alphabet->type == Alphabet_Type_PROTEIN) && (aclient->server_alphabet->type == Alphabet_Type_DNA)) || analysis->gam->translate_both){ Analysis_Client_revcomp_target(aclient); Analysis_Client_process_query(aclient, analysis, fdbs->seq, swap_chains, TRUE, priority); Analysis_Client_revcomp_target(aclient); } FastaDB_Seq_destroy(fdbs); } return; } /**/ static Analysis_Server *Analysis_Server_create(Analysis_Builder *ab, gchar *name, gint priority){ register Analysis_Server 
*as = g_new(Analysis_Server, 1); as->name = g_strdup(name); as->ab = ab; as->priority = priority; return as; } static void Analysis_Server_destroy(Analysis_Server *as){ g_free(as->name); g_free(as); return; } /**/ static Analysis_Builder *Analysis_Builder_create(GPtrArray *path_list, Analysis *analysis, gint verbosity){ register Analysis_Builder *ab; register gint i; register Analysis_Client *ac; register Analysis_Server *as; g_assert(path_list->len); ac = Analysis_Client_create(path_list->pdata[0], analysis->verbosity); if(!ac) return NULL; /**/ ab = g_new(Analysis_Builder, 1); ab->verbosity = verbosity; ab->server_list = g_ptr_array_new(); ab->server_type = ac->server_alphabet->type; ab->probe_fdb = NULL; ab->analysis = analysis; Analysis_Client_destroy(ac); for(i = 0; i < path_list->len; i++){ as = Analysis_Server_create(ab, path_list->pdata[i], i); g_ptr_array_add(ab->server_list, as); } return ab; } static void Analysis_Builder_destroy(Analysis_Builder *ab){ register gint i; register Analysis_Server *as; for(i = 0; i < ab->server_list->len; i++){ as = ab->server_list->pdata[i]; Analysis_Server_destroy(as); } g_ptr_array_free(ab->server_list, TRUE); if(ab->probe_fdb) FastaDB_close(ab->probe_fdb); g_free(ab); return; } static void Analysis_Server_run(gpointer data){ register Analysis_Server *server = data; register Analysis_Client *client = Analysis_Client_create(server->name, server->ab->analysis->verbosity); if(!client) g_error("Could not connect to server [%s]", server->name); Analysis_Client_set_param(client, server->ab->analysis->gam); Analysis_Client_set_probe_fdb(client, server->ab->probe_fdb); Analysis_Client_process(client, server->ab->analysis, server->ab->swap_chains, server->priority); /**/ Analysis_Client_destroy(client); return; } static void Analysis_Builder_process(Analysis_Builder *ab, Analysis *analysis, gboolean swap_chains){ register gint i; register Analysis_Server *as; ab->swap_chains = swap_chains; for(i = 0; i < ab->server_list->len; 
i++){ as = ab->server_list->pdata[i]; #ifdef USE_PTHREADS g_assert(analysis->job_queue); JobQueue_submit(analysis->job_queue, Analysis_Server_run, as, i); #else /* USE_PTHREADS */ Analysis_Server_run(as); #endif /* USE_PTHREADS */ } return; } static void Analysis_Builder_set_probe_fdb(Analysis_Builder *ab, FastaDB *probe_fdb){ g_assert(!ab->probe_fdb); ab->probe_fdb = FastaDB_share(probe_fdb); return; } /**/ static void Analysis_path_list_destroy(GPtrArray *path_list){ register gint i; register gchar *path; for(i = 0; i < path_list->len; i++){ path = path_list->pdata[i]; g_free(path); } g_ptr_array_free(path_list, TRUE); return; } static GPtrArray *Analysis_FOSN_expand_path_list(GPtrArray *path_list){ register gint i; register gchar *path, *npath; struct stat buf; register gboolean expanded_list = FALSE; register GPtrArray *npath_list = g_ptr_array_new(); register FILE *fp; register LineParse *lp; for(i = 0; i < path_list->len; i++){ path = path_list->pdata[i]; if((stat(path, &buf)) /* Cannot read file (ie. servername) */ || (S_ISDIR(buf.st_mode)) /* Is directory */ || (FastaDB_file_is_fasta(path))){ /* 1st non-whitespace char is '>' */ npath = g_strdup(path); g_ptr_array_add(npath_list, npath); } else { /* Assume to be fosn */ fp = fopen(path, "r"); if(!fp) g_error("Could not open FOSN file [%s]", path); lp = LineParse_create(fp); while(LineParse_line(lp) != EOF){ g_strstrip(lp->line->str); /* strip whitespace */ if(lp->line->str[0]){ npath = g_strdup(lp->line->str); g_ptr_array_add(npath_list, npath); expanded_list = TRUE; } } LineParse_destroy(lp); fclose(fp); } } if(!expanded_list){ for(i = 0; i < npath_list->len; i++){ npath = npath_list->pdata[i]; g_free(npath); } g_ptr_array_free(npath_list, TRUE); return NULL; } return npath_list; } /* If members of the path list are files * where the first non-whitespace character is not '>', * it is assumed to be a FOSN, and parsed to expand the path_list. 
*/ Analysis *Analysis_create( GPtrArray *query_path_list, Alphabet_Type query_type, gint query_chunk_id, gint query_chunk_total, GPtrArray *target_path_list, Alphabet_Type target_type, gint target_chunk_id, gint target_chunk_total, gint verbosity){ register Analysis *analysis = g_new0(Analysis, 1); register FastaDB *query_fdb = NULL, *target_fdb = NULL, *seeder_query_fdb, *seeder_target_fdb; register Match *match; Match *dna_match = NULL, *protein_match = NULL, *codon_match = NULL; register HSP_Param *dna_hsp_param, *protein_hsp_param, *codon_hsp_param; register Match_ArgumentSet *mas = Match_ArgumentSet_create(NULL); register gboolean use_horizon; register GPtrArray *expanded_query_path_list = NULL, *expanded_target_path_list = NULL; g_assert(query_path_list); g_assert(target_path_list); g_assert(query_path_list->len); g_assert(target_path_list->len); analysis->aas = Analysis_ArgumentSet_create(NULL); analysis->verbosity = verbosity; analysis->job_queue = JobQueue_create(analysis->aas->thread_count); /* Expand FOSN paths */ expanded_query_path_list = Analysis_FOSN_expand_path_list( query_path_list); expanded_target_path_list = Analysis_FOSN_expand_path_list( target_path_list); if(expanded_query_path_list) query_path_list = expanded_query_path_list; if(expanded_target_path_list) target_path_list = expanded_target_path_list; /**/ analysis->query_builder = Analysis_Builder_create(query_path_list, analysis, verbosity); analysis->target_builder = Analysis_Builder_create(target_path_list, analysis, verbosity); /**/ if(query_type == Alphabet_Type_UNKNOWN){ if(analysis->query_builder){ query_type = analysis->query_builder->server_type; } else { query_type = FastaDB_guess_type( (gchar*)query_path_list->pdata[0]); if(verbosity > 1) g_message("Guessed query type [%s]", Alphabet_Type_get_name(query_type)); } } if(target_type == Alphabet_Type_UNKNOWN){ if(analysis->target_builder){ target_type = analysis->target_builder->server_type; } else { target_type = FastaDB_guess_type( 
(gchar*)target_path_list->pdata[0]); if(verbosity > 1) g_message("Guessed target type [%s]", Alphabet_Type_get_name(target_type)); } } g_assert((query_type == Alphabet_Type_DNA) ||(query_type == Alphabet_Type_PROTEIN)); g_assert((target_type == Alphabet_Type_DNA) ||(target_type == Alphabet_Type_PROTEIN)); if(verbosity > 1) g_message("Creating analysis with query[%s] target[%s]", Alphabet_Type_get_name(query_type), Alphabet_Type_get_name(target_type)); analysis->gam = GAM_create(query_type, target_type, mas->dna_submat, mas->protein_submat, mas->translate, analysis->aas->use_exhaustive, verbosity); /**/ Analysis_find_matches(analysis, &dna_match, &protein_match, &codon_match); match = dna_match; if(!match) match = protein_match; if(!match) match = codon_match; g_assert(match); if(!analysis->query_builder) query_fdb = FastaDB_open_list_with_limit(query_path_list, match->query->alphabet, query_chunk_id, query_chunk_total); if(!analysis->target_builder) target_fdb = FastaDB_open_list_with_limit(target_path_list, match->target->alphabet, target_chunk_id, target_chunk_total); if(analysis->aas->use_exhaustive){ if(analysis->query_builder || analysis->target_builder) /* FIXME: ni */ g_error("Exhaustive alignment against server not implemented"); analysis->fasta_pipe = FastaPipe_create( query_fdb, target_fdb, Analysis_FastaPipe_Pair_init_func, Analysis_FastaPipe_Pair_prep_func, Analysis_FastaPipe_Pair_term_func, Analysis_FastaPipe_Pair_query_func, Analysis_FastaPipe_Pair_target_func, FastaDB_Mask_ALL, analysis->gam->translate_both, analysis->aas->use_revcomp); analysis->curr_query = NULL; } else { /* Not exhaustive */ use_horizon = (analysis->aas->use_bigseq || analysis->query_builder || analysis->target_builder)?FALSE:TRUE; dna_hsp_param = dna_match ? HSP_Param_create(dna_match, use_horizon) : NULL; protein_hsp_param = protein_match ? HSP_Param_create(protein_match, use_horizon) : NULL; codon_hsp_param = codon_match ? 
                        HSP_Param_create(codon_match, use_horizon) : NULL;
        analysis->comparison_param = Comparison_Param_create(
                query_type, target_type,
                dna_hsp_param, protein_hsp_param, codon_hsp_param);
        if(dna_hsp_param)
            HSP_Param_destroy(dna_hsp_param);
        if(protein_hsp_param)
            HSP_Param_destroy(protein_hsp_param);
        if(codon_hsp_param)
            HSP_Param_destroy(codon_hsp_param);
        /* Raise HSP thresholds to score if ungapped */
        if(!Model_Type_is_gapped(analysis->gam->gas->type)){
            if(analysis->comparison_param->dna_hsp_param
            && (analysis->comparison_param->dna_hsp_param->threshold
              < analysis->gam->gas->threshold))
                analysis->comparison_param->dna_hsp_param->threshold
                    = analysis->gam->gas->threshold;
            if(analysis->comparison_param->protein_hsp_param
            && (analysis->comparison_param->protein_hsp_param->threshold
              < analysis->gam->gas->threshold))
                analysis->comparison_param->protein_hsp_param->threshold
                    = analysis->gam->gas->threshold;
            if(analysis->comparison_param->codon_hsp_param
            && (analysis->comparison_param->codon_hsp_param->threshold
              < analysis->gam->gas->threshold))
                analysis->comparison_param->codon_hsp_param->threshold
                    = analysis->gam->gas->threshold;
            }
        /* Don't need HSP horizon for bigseq comparison */
        if(analysis->query_builder || analysis->target_builder){
            if(analysis->query_builder && analysis->target_builder)
                g_error("Server vs server comparison not implemented");
            analysis->fasta_pipe = NULL;
            if(analysis->query_builder){
                Analysis_Builder_set_probe_fdb(analysis->query_builder,
                                               target_fdb);
            } else {
                g_assert(analysis->target_builder);
                Analysis_Builder_set_probe_fdb(analysis->target_builder,
                                               query_fdb);
                }
        } else {
            if(analysis->aas->use_bigseq){
                analysis->bsam = BSAM_create(analysis->comparison_param,
                        analysis->aas->saturate_threshold, verbosity);
                analysis->fasta_pipe = FastaPipe_create(
                    query_fdb, target_fdb,
                    Analysis_FastaPipe_Pair_init_func,
                    Analysis_FastaPipe_Pair_prep_func,
                    Analysis_FastaPipe_Pair_term_func,
                    Analysis_FastaPipe_Pair_query_func,
                    Analysis_FastaPipe_Pair_target_func,
FastaDB_Mask_ALL, analysis->gam->translate_both, analysis->aas->use_revcomp); analysis->curr_query = NULL; } else { /* Use Seeder */ analysis->scan_query = Analysis_decide_scan_query(query_fdb, target_fdb, analysis->aas->force_scan); if(verbosity > 1) g_message("Applying FSM scan to [%s]", analysis->scan_query?"query":"target"); /* Swap paths and types * for query and target when scan on query */ if(analysis->scan_query){ seeder_query_fdb = target_fdb; seeder_target_fdb = query_fdb; } else { seeder_query_fdb = query_fdb; seeder_target_fdb = target_fdb; } analysis->curr_seeder = NULL; analysis->fasta_pipe = FastaPipe_create( seeder_query_fdb, seeder_target_fdb, Analysis_FastaPipe_Seeder_init_func, Analysis_FastaPipe_Seeder_prep_func, Analysis_FastaPipe_Seeder_term_func, Analysis_FastaPipe_Seeder_query_func, Analysis_FastaPipe_Seeder_target_func, FastaDB_Mask_ALL, analysis->gam->translate_both, analysis->aas->use_revcomp); } } } if(query_fdb) FastaDB_close(query_fdb); if(target_fdb) FastaDB_close(target_fdb); /**/ if(expanded_query_path_list) Analysis_path_list_destroy(expanded_query_path_list); if(expanded_target_path_list) Analysis_path_list_destroy(expanded_target_path_list); return analysis; } void Analysis_destroy(Analysis *analysis){ JobQueue_destroy(analysis->job_queue); if(analysis->fasta_pipe) FastaPipe_destroy(analysis->fasta_pipe); if(analysis->curr_query) FastaDB_Seq_destroy(analysis->curr_query); if(analysis->curr_seeder) Seeder_destroy(analysis->curr_seeder); if(analysis->bsam) BSAM_destroy(analysis->bsam); if(analysis->comparison_param) Comparison_Param_destroy(analysis->comparison_param); /* if(analysis->query_ac) Analysis_Client_destroy(analysis->query_ac); if(analysis->target_ac) Analysis_Client_destroy(analysis->target_ac); */ if(analysis->query_builder) Analysis_Builder_destroy(analysis->query_builder); if(analysis->target_builder) Analysis_Builder_destroy(analysis->target_builder); GAM_destroy(analysis->gam); g_free(analysis); return; } void 
Analysis_process(Analysis *analysis){
    if(analysis->query_builder){
        Analysis_Builder_process(analysis->query_builder, analysis, TRUE);
    } else if(analysis->target_builder){
        Analysis_Builder_process(analysis->target_builder, analysis, FALSE);
    } else {
        while(FastaPipe_process(analysis->fasta_pipe, analysis));
        }
    if(analysis->job_queue)
        JobQueue_complete(analysis->job_queue);
    GAM_report(analysis->gam);
    return;
    }

/**/

exonerate-2.4.0/src/hub/bsam.c0000644000175000017500000003107411162714301013064 00000000000000
/****************************************************************\
*                                                                *
*  BSAM: Big Sequence Alignment Manager                          *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#include <math.h>

#include "bsam.h"
#include "dejavu.h"
#include "comparison.h"

/**/

BSAM *BSAM_create(Comparison_Param *comparison_param,
                  gint saturate_threshold, gint verbosity){
    register BSAM *bsam = g_new(BSAM, 1);
    bsam->comparison_param = Comparison_Param_share(comparison_param);
    bsam->saturate_threshold = saturate_threshold;
    bsam->verbosity = verbosity;
    return bsam;
    }

void BSAM_destroy(BSAM *bsam){
    Comparison_Param_destroy(bsam->comparison_param);
    g_free(bsam);
    return;
    }

/**/

typedef struct {
    BSAM *bsam;
    Sequence *query;
    Sequence *target;
    HSPset *hspset;
    GArray *query_pos_list;
    GArray *target_pos_list;
    GArray *seed_list;
    gint wordlen;
    gint partition;
    gint query_expect;
    gint target_expect;
} BSAM_SeedSet;

static gint BSAM_get_expect(BSAM *bsam, Sequence *sequence, gint wordlen){
    register gint expect;
    register gdouble expectation;
    register gint alphabet_size = 0;
    switch(sequence->alphabet->type){
        case Alphabet_Type_DNA:
            alphabet_size = 4;
            break;
        case
             Alphabet_Type_PROTEIN:
            alphabet_size = 20;
            break;
        default:
            g_error("Unknown alphabet type [%s]",
                    Alphabet_Type_get_name(sequence->alphabet->type));
            break;
        }
    expectation = 1.0/pow(alphabet_size, wordlen);
    expect = (gint)((expectation * (sequence->len - wordlen+1))
           + bsam->saturate_threshold);
    return expect;
    }

static BSAM_SeedSet *BSAM_SeedSet_create(BSAM *bsam,
                                         Comparison *comparison,
                                         HSPset *hspset){
    register BSAM_SeedSet *seedset = g_new(BSAM_SeedSet, 1);
    seedset->bsam = bsam;
    seedset->query = Sequence_share(comparison->query);
    seedset->target = Sequence_share(comparison->target);
    seedset->query_pos_list = g_array_new(FALSE, FALSE, sizeof(gint));
    seedset->target_pos_list = g_array_new(FALSE, FALSE, sizeof(gint));
    seedset->seed_list = g_array_new(FALSE, FALSE, sizeof(gint));
    if(hspset->param->match->query->is_translated)
        seedset->partition = 3 * ((comparison->query->len / 3)+1);
    else
        seedset->partition = comparison->query->len;
    seedset->hspset = HSPset_share(hspset);
    seedset->query_expect = BSAM_get_expect(bsam, comparison->query,
                                seedset->hspset->param->wordlen);
    seedset->target_expect = BSAM_get_expect(bsam, comparison->target,
                                seedset->hspset->param->wordlen);
    return seedset;
    }

static void BSAM_SeedSet_destroy(BSAM_SeedSet *seedset){
    Sequence_destroy(seedset->query);
    Sequence_destroy(seedset->target);
    HSPset_destroy(seedset->hspset);
    g_array_free(seedset->query_pos_list, TRUE);
    g_array_free(seedset->target_pos_list, TRUE);
    g_array_free(seedset->seed_list, TRUE);
    g_free(seedset);
    return;
    }

static void BSAM_SeedSet_empty_current(BSAM_SeedSet *seedset){
    register gint i, j;
    gint qpos, tpos;
    if(seedset->query_pos_list->len){
        if(seedset->target_pos_list->len){
            for(i = 0; i < seedset->query_pos_list->len; i++){
                qpos = g_array_index(seedset->query_pos_list, gint, i);
                for(j = 0; j < seedset->target_pos_list->len; j++){
                    tpos = g_array_index(seedset->target_pos_list,
                                         gint, j);
                    /* If seed is preceded by a match, ignore.
* (Do not filter when using a saturate threshold * eg. poly-T at match start would prevent a seed). */ if((!seedset->bsam->saturate_threshold) && ((qpos > 0) && (tpos > 0) && (Sequence_get_symbol(seedset->query, qpos-1) == Sequence_get_symbol(seedset->target, tpos-1)))) continue; g_array_append_val(seedset->seed_list, qpos); g_array_append_val(seedset->seed_list, tpos); } } g_array_set_size(seedset->target_pos_list, 0); } g_array_set_size(seedset->query_pos_list, 0); } /* (Can't have target positions without query positions) */ return; } static void BSAM_DejaVu_traverse(gint first_pos, gint curr_pos, gint length, gchar *seq, gint len, gpointer user_data){ register BSAM_SeedSet *seedset = user_data; gint tpos; /* If first_pos is in target, ignore this word */ if(first_pos > seedset->partition) return; if(first_pos == curr_pos) /* New word */ BSAM_SeedSet_empty_current(seedset); if(curr_pos < seedset->partition){ if((!seedset->bsam->saturate_threshold) || (seedset->query_pos_list->len <= seedset->query_expect)) g_array_append_val(seedset->query_pos_list, curr_pos); } else { /* Collect 2nd list. (only if 1st list not empty) */ /* Apply saturate threshold here * by emptying both lists. 
         */
        if(seedset->bsam->saturate_threshold
        &&((seedset->query_pos_list->len
          * seedset->target_pos_list->len)
         > (seedset->query_expect * seedset->target_expect))){
            g_array_set_size(seedset->query_pos_list, 0);
            g_array_set_size(seedset->target_pos_list, 0);
            }
        /* FIXME: also need to apply saturate threshold
         *        to cap size of query_pos_list
         */
        if(seedset->query_pos_list->len){
            tpos = curr_pos - seedset->partition - 1;
            g_array_append_val(seedset->target_pos_list, tpos);
            }
        }
    return;
    }

/*
 *       <=>-<=>-<=>-<=>
 *  init:012-345-678-9AB
 * trans:0369-147A-258B-
 *
 * frame = (pos < len)?0:(pos < (len<<1))?1:2;
 * frame_start = (frame * (aa_len + 1));
 * codon = pos - frame_start
 * orig_pos = (codon * 3) + frame
 *
 */

static guint BSAM_SeedSet_aapos2dnapos(guint pos, gint dna_len){
    register gint frame, frame_start, codon, aa_len;
    register guint dna_pos;
    aa_len = dna_len / 3;
    if(pos < ((aa_len+1) << 1))
        if(pos < (aa_len+1))
            frame = 0;
        else
            frame = 1;
    else
        frame = 2;
    frame_start = (frame * (aa_len + 1));
    codon = pos - frame_start;
    dna_pos = (codon * 3) + frame;
    g_assert(dna_pos >= 0);
    g_assert(dna_pos < dna_len);
    return dna_pos;
    }

static void BSAM_SeedSet_build_hspset(BSAM *bsam, BSAM_SeedSet *seedset,
                                      Comparison *comparison){
    register gint i;
    guint pos;
    /* Convert query coordinates */
    if(seedset->hspset->param->match->query->is_translated){
        for(i = 0; i < seedset->seed_list->len; i+=2){
            pos = g_array_index(seedset->seed_list, guint, i);
            pos = BSAM_SeedSet_aapos2dnapos(pos, comparison->query->len);
            g_array_index(seedset->seed_list, guint, i) = pos;
            }
        }
    /* Convert target coordinates */
    if(seedset->hspset->param->match->target->is_translated){
        for(i = 0; i < seedset->seed_list->len; i+=2){
            pos = g_array_index(seedset->seed_list, guint, i+1);
            pos = BSAM_SeedSet_aapos2dnapos(pos, comparison->target->len);
            g_array_index(seedset->seed_list, guint, i+1) = pos;
            }
        }
    g_assert(!(seedset->seed_list->len & 1));
    if(seedset->seed_list->len){
        HSPset_seed_all_hsps(seedset->hspset,
(guint*)seedset->seed_list->data, (guint)(seedset->seed_list->len >> 1)); } return; } static void BSAM_BigSeq_add_sequence(GString *seq, Sequence *input, Translate *translate, gboolean is_translated, guchar *filter){ register gint i, j, max_aa_len; register Sequence *aa_seq; if(is_translated){ g_assert(translate); max_aa_len = input->len / 3; for(i = 0; i < 3; i++){ aa_seq = Sequence_translate(input, translate, i+1); Sequence_strcpy(aa_seq, seq->str+seq->len); for(j = aa_seq->len; j < max_aa_len; j++) g_string_append_c(seq, '-'); g_string_append_c(seq, '-'); Sequence_destroy(aa_seq); } } else { Sequence_strcpy(input, seq->str+seq->len); seq->len += input->len; } return; } static GString *BSAM_create_BigSeq(BSAM *bsam, Sequence *query, Sequence *target, BSAM_SeedSet *seedset){ register Match *match = seedset->hspset->param->match; register GString *bigseq = g_string_sized_new(query->len+target->len+1); if(bsam->verbosity > 1) g_message("Preparing BigSeq comparison of [%s] and [%s]", query->id, target->id); BSAM_BigSeq_add_sequence(bigseq, query, match->mas->translate, match->query->is_translated, match->query->alphabet->masked); g_assert(bigseq->len == seedset->partition); g_string_append_c(bigseq, '-'); BSAM_BigSeq_add_sequence(bigseq, target, match->mas->translate, match->target->is_translated, match->target->alphabet->masked); return bigseq; } static void BSAM_build_HSPset(BSAM *bsam, Comparison *comparison, HSPset *hspset){ register DejaVu *dejavu; register GString *bigseq; register BSAM_SeedSet *seedset = BSAM_SeedSet_create(bsam, comparison, hspset); register guchar *filter = Alphabet_get_filter_by_type( seedset->hspset->param->match->comparison_alphabet, Alphabet_Filter_Type_NON_AMBIG); /* Will include softmasking */ bigseq = BSAM_create_BigSeq(bsam, comparison->query, comparison->target, seedset); if(bsam->verbosity > 1) g_message("Finding BigSeq seeds"); dejavu = DejaVu_create(bigseq->str, bigseq->len); DejaVu_traverse(dejavu, hspset->param->wordlen, 
hspset->param->wordlen, BSAM_DejaVu_traverse, seedset, filter, bsam->verbosity); BSAM_SeedSet_empty_current(seedset); DejaVu_destroy(dejavu); g_string_free(bigseq, TRUE); if(bsam->verbosity > 1) g_message("Building BigSeq HSPs from [%d] seeds", (seedset->seed_list->len >> 1)); BSAM_SeedSet_build_hspset(bsam, seedset, comparison); BSAM_SeedSet_destroy(seedset); if(bsam->verbosity > 1) g_message("Building BigSeq alignment from [%d] hsps", hspset->hsp_list->len); return; } Comparison *BSAM_compare(BSAM *bsam, Sequence *query, Sequence *target){ register Comparison *comparison = Comparison_create(bsam->comparison_param, query, target); if(bsam->comparison_param->dna_hsp_param) BSAM_build_HSPset(bsam, comparison, comparison->dna_hspset); if(bsam->comparison_param->protein_hsp_param) BSAM_build_HSPset(bsam, comparison, comparison->protein_hspset); if(bsam->comparison_param->codon_hsp_param) BSAM_build_HSPset(bsam, comparison, comparison->codon_hspset); if(!Comparison_has_hsps(comparison)){ Comparison_destroy(comparison); return NULL; } return comparison; } /**/ exonerate-2.4.0/src/hub/bsam.test.c0000644000175000017500000000210111162714301014027 00000000000000/****************************************************************\ * * * BSAM: Big Sequence Alignment Manager * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/

#include "bsam.h"
#include "argument.h"

int Argument_main(Argument *arg){
    g_warning("Test [%s] not implemented", __FILE__);
    return 0;
    }

exonerate-2.4.0/src/hub/gam.h0000644000175000017500000001141611222232404012705 00000000000000
/****************************************************************\
*                                                                *
*  GAM: Gapped Alignment Manager                                 *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#ifndef INCLUDED_GAM_H
#define INCLUDED_GAM_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <glib.h>

#ifdef USE_PTHREADS
#include <pthread.h>
#endif /* USE_PTHREADS */

#include "sequence.h"
#include "c4.h"
#include "heuristic.h"
#include "hpair.h"
#include "sequence.h"
#include "hspset.h"
#include "submat.h"
#include "argument.h"
#include "translate.h"
#include "pqueue.h"
#include "modeltype.h"
#include "comparison.h"
#include "sdp.h"
#include "subopt.h"
#include "threadref.h"

typedef enum {
    GAM_Refinement_NONE,
    GAM_Refinement_REGION,
    GAM_Refinement_FULL,
    GAM_Refinement_TOTAL /* Just the total */
} GAM_Refinement;

typedef struct {
    Model_Type type;
    C4_Score threshold;
    gfloat percent_threshold;
    gboolean show_alignment;
    gboolean show_sugar;
    gboolean show_cigar;
    gboolean show_vulgar;
    gboolean show_query_gff;
    gboolean show_target_gff;
    gchar *ryo;
    gint best_n;
    gboolean use_subopt;
    gboolean use_gapped_extension;
    /**/
    GAM_Refinement refinement;
    gint refinement_boundary;
} GAM_ArgumentSet;

GAM_ArgumentSet *GAM_ArgumentSet_create(Argument *arg);

typedef struct {
    C4_Score score;
    glong pos;
    glong len;
} GAM_StoredResult;

typedef struct {
    gchar *query_id;
    PQueue *pq; /*
Contains GAM_StoredResult */ gint tie_count; /* For best_n */ C4_Score tie_score; /* For best_n */ } GAM_QueryResult; typedef struct { gchar *query_id; C4_Score threshold; } GAM_QueryInfo; typedef struct { ThreadRef *thread_ref; Alphabet_Type query_type; Alphabet_Type target_type; C4_Model *model; GPtrArray *match_list; Optimal *optimal; Heuristic *heuristic; SDP *sdp; Submat *dna_submat; Submat *protein_submat; Translate *translate; GAM_ArgumentSet *gas; GTree *bestn_tree; /* Contains GAM_QueryResult */ FILE *bestn_tmp_file; gint verbosity; gboolean translate_both; gboolean dual_match; GTree *percent_threshold_tree; /* Contains GAM_QueryInfo */ PQueueSet *pqueue_set; gint max_query_span; gint max_target_span; #ifdef USE_PTHREADS pthread_mutex_t gam_lock; #endif /* USE_PTHREADS */ } GAM; GAM *GAM_create(Alphabet_Type query_type, Alphabet_Type target_type, Submat *dna_submat, Submat *protein_submat, Translate *translate, gboolean use_exhaustive, gint verbosity); GAM *GAM_share(GAM *gam); void GAM_destroy(GAM *gam); void GAM_report(GAM *gam); typedef struct { gint ref_count; GAM *gam; Sequence *query; Sequence *target; GPtrArray *alignment_list; gpointer user_data; gpointer self_data; SubOpt *subopt; } GAM_Result; GAM_Result *GAM_Result_ungapped_create(GAM *gam, Comparison *comparison); /* Will return NULL when no alignments are produced */ GAM_Result *GAM_Result_heuristic_create(GAM *gam, Comparison *comparison); /* Will return NULL when no alignments are produced */ GAM_Result *GAM_Result_exhaustive_create(GAM *gam, Sequence *query, Sequence *target); GAM_Result *GAM_Result_share(GAM_Result *gam_result); void GAM_Result_destroy(GAM_Result *gam_result); void GAM_Result_submit(GAM_Result *gam_result); void GAM_lock(GAM *gam); void GAM_unlock(GAM *gam); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_GAM_H */ exonerate-2.4.0/src/hub/analysis.h0000644000175000017500000000701511222464342013774 
00000000000000
/****************************************************************\
*                                                                *
*  Analysis module for exonerate                                 *
*                                                                *
*  Guy St.C. Slater.. mailto:guy@ebi.ac.uk                       *
*  Copyright (C) 2000-2009. All Rights Reserved.                 *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#ifndef INCLUDED_ANALYSIS_H
#define INCLUDED_ANALYSIS_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <glib.h>

#include "fastapipe.h"
#include "gam.h"
#include "bsam.h"
#include "argument.h"
#include "submat.h"
#include "seeder.h"
#include "comparison.h"
#include "socket.h"
#include "jobqueue.h"

typedef struct {
    gboolean use_exhaustive;
    gboolean use_bigseq;
    gboolean use_revcomp;
    gchar *force_scan;
    gint saturate_threshold;
    gchar *custom_server_command;
    gint thread_count;
} Analysis_ArgumentSet;

Analysis_ArgumentSet *Analysis_ArgumentSet_create(Argument *arg);

/**/

typedef struct {
    gint verbosity;
    gint ref_count;
    Alphabet *server_alphabet;
    gboolean is_masked;
    guint64 num_seqs;
    guint64 max_seq_len;
    guint64 total_seq_len;
    /**/
    SocketClient *sc;
    FastaDB *probe_fdb;
    Sequence *curr_query;
    Sequence **seq_cache;
} Analysis_Client;

typedef struct {
    gchar *name;
    gint priority; /* used for job queues */
    struct Analysis_Builder *ab;
} Analysis_Server;

typedef struct Analysis_Builder {
    gint verbosity;
    FastaDB *probe_fdb;
    GPtrArray *server_list; /* Contains Analysis_Server objects */
    Alphabet_Type server_type;
    struct Analysis *analysis;
    gboolean swap_chains;
} Analysis_Builder;

typedef struct Analysis {
    FastaPipe *fasta_pipe;
    GAM *gam;
    BSAM *bsam;
    Analysis_ArgumentSet *aas;
    FastaDB_Seq *curr_query;
    Seeder *curr_seeder;
    gboolean scan_query;
    Comparison_Param *comparison_param;
    gint verbosity;
    /*
    Analysis_Client *query_ac;
    Analysis_Client *target_ac;
*/ Analysis_Builder *query_builder; Analysis_Builder *target_builder; JobQueue *job_queue; } Analysis; Analysis *Analysis_create( GPtrArray *query_path_list, Alphabet_Type query_type, gint query_chunk_id, gint query_chunk_total, GPtrArray *target_path_list, Alphabet_Type target_type, gint target_chunk_id, gint target_chunk_total, gint verbosity); void Analysis_destroy(Analysis *analysis); void Analysis_process(Analysis *analysis); #ifdef __cplusplus } #endif /* __cplusplus */ #endif /* INCLUDED_ANALYSIS_H */ exonerate-2.4.0/src/hub/bsam.h0000644000175000017500000000277611162714301013100 00000000000000/****************************************************************\ * * * BSAM: Big Sequence Alignment Manager * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/

#ifndef INCLUDED_BSAM_H
#define INCLUDED_BSAM_H

#ifdef __cplusplus
extern "C" {
#endif /* __cplusplus */

#include <glib.h>

#include "comparison.h"

/**/

typedef struct {
    Comparison_Param *comparison_param;
    gint saturate_threshold;
    gint verbosity;
} BSAM;

BSAM *BSAM_create(Comparison_Param *comparison_param,
                  gint saturate_threshold, gint verbosity);
void BSAM_destroy(BSAM *bsam);

Comparison *BSAM_compare(BSAM *bsam, Sequence *query, Sequence *target);

#ifdef __cplusplus
}
#endif /* __cplusplus */

#endif /* INCLUDED_BSAM_H */

exonerate-2.4.0/src/program/0000777000175000017500000000000011222703703012747 500000000000000
exonerate-2.4.0/src/program/Makefile.am0000644000175000017500000002101511222231251014710 00000000000000

bin_PROGRAMS = exonerate ipcress exonerate-server

INCLUDES = -I$(top_srcdir)/src/sequence \
           -I$(top_srcdir)/src/database \
           -I$(top_srcdir)/src/comparison \
           -I$(top_srcdir)/src/struct \
           -I$(top_srcdir)/src/c4 \
           -I$(top_srcdir)/src/bsdp \
           -I$(top_srcdir)/src/sdp \
           -I$(top_srcdir)/src/model \
           -I$(top_srcdir)/src/hub \
           -I$(top_srcdir)/src/general \
           -DHOSTTYPE="\"@host@\"" \
           -DSOURCE_ROOT_DIR="\"@source_root_dir@\"" \
           -DCUSTOM_GUINT64_FORMAT="\"@custom_guint64_format@\""

# Missing viterbi.o and scheduler.o so codegen versions may be substituted
C4_OBJECTS = $(top_srcdir)/src/c4/c4.o \
             $(top_srcdir)/src/c4/codegen.o \
             $(top_srcdir)/src/c4/cgutil.o \
             $(top_srcdir)/src/c4/opair.o \
             $(top_srcdir)/src/c4/alignment.o \
             $(top_srcdir)/src/c4/optimal.o \
             $(top_srcdir)/src/c4/layout.o \
             $(top_srcdir)/src/c4/region.o \
             $(top_srcdir)/src/c4/subopt.o \
             $(top_srcdir)/src/bsdp/bsdp.o \
             $(top_srcdir)/src/bsdp/sar.o \
             $(top_srcdir)/src/bsdp/heuristic.o \
             $(top_srcdir)/src/bsdp/hpair.o \
             $(top_srcdir)/src/sdp/boundary.o \
             $(top_srcdir)/src/sdp/sdp.o \
             $(top_srcdir)/src/sdp/lookahead.o \
             $(top_srcdir)/src/sdp/straceback.o \
             $(top_srcdir)/src/struct/matrix.o \
             $(top_srcdir)/src/struct/slist.o \
$(top_srcdir)/src/struct/pqueue.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/model/affine.o \ $(top_srcdir)/src/model/est2genome.o \ $(top_srcdir)/src/model/ner.o \ $(top_srcdir)/src/model/protein2dna.o \ $(top_srcdir)/src/model/protein2genome.o \ $(top_srcdir)/src/model/coding2coding.o \ $(top_srcdir)/src/model/coding2genome.o \ $(top_srcdir)/src/model/cdna2genome.o \ $(top_srcdir)/src/model/genome2genome.o \ $(top_srcdir)/src/model/ungapped.o \ $(top_srcdir)/src/model/intron.o \ $(top_srcdir)/src/model/frameshift.o \ $(top_srcdir)/src/model/phase.o \ $(top_srcdir)/src/model/modeltype.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/socket.o \ $(top_srcdir)/src/general/jobqueue.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o EXTRA_exonerate_SOURCES = c4_model_archive.a c4_model_archive.h BUILT_SOURCES = c4_model_archive.a BOOTSTRAPPER = $(top_srcdir)/src/model/bootstrapper c4_model_archive.h: $(BOOTSTRAPPER) $(BOOTSTRAPPER) --compiled no c4_model_archive.a: $(BOOTSTRAPPER) $(BOOTSTRAPPER) --compiled no # Special viterbi.o and scheduler.o for compiled model linking viterbi.o: $(top_srcdir)/src/c4/viterbi.c $(CC) $(CFLAGS) $(INCLUDES) -I. \ -DUSE_COMPILED_MODELS \ -c $(top_srcdir)/src/c4/viterbi.c scheduler.o: $(top_srcdir)/src/sdp/scheduler.c $(CC) $(CFLAGS) $(INCLUDES) -I. 
\ -DUSE_COMPILED_MODELS \ -c $(top_srcdir)/src/sdp/scheduler.c exonerate_COMPONENTS = @codegen_extra_ldadd@ \ $(top_srcdir)/src/comparison/wordhood.o \ $(top_srcdir)/src/comparison/hspset.o \ $(top_srcdir)/src/comparison/seeder.o \ $(top_srcdir)/src/comparison/comparison.o \ $(top_srcdir)/src/database/fastapipe.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/struct/dejavu.o \ $(top_srcdir)/src/struct/fsm.o \ $(top_srcdir)/src/struct/vfsm.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/hub/gam.o \ $(top_srcdir)/src/hub/bsam.o \ $(top_srcdir)/src/hub/analysis.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/threadref.o \ $(C4_OBJECTS) exonerate_SOURCES = exonerate.c exonerate_DEPENDENCIES = @codegen_extra_sources@ Makefile \ $(exonerate_COMPONENTS) exonerate_LDADD = $(exonerate_COMPONENTS) -lm # Remove built sources from distribution # (will be able to use nodist_exonerate_SOURCES in automake 1.5) # nodist_exonerate_SOURCES = c4_model_archive.a dist-hook: for file in $(EXTRA_exonerate_SOURCES) ; \ do rm -f $(distdir)/$$file ; done ; ipcress_SOURCES = ipcress.c ipcress_LDADD = $(top_srcdir)/src/comparison/pcr.o \ $(top_srcdir)/src/comparison/wordhood.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/struct/fsm.o \ $(top_srcdir)/src/struct/slist.o \ $(top_srcdir)/src/struct/matrix.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ -lm # Files to clear away exonerate_server_SOURCES = exonerate-server.c exonerate_server_LDADD = $(top_srcdir)/src/general/socket.o \ $(top_srcdir)/src/general/argument.o \ 
$(top_srcdir)/src/comparison/hspset.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/comparison/wordhood.o \ $(top_srcdir)/src/database/dataset.o \ $(top_srcdir)/src/database/index.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/struct/bitarray.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o \ $(top_srcdir)/src/struct/splaytree.o \ $(top_srcdir)/src/struct/noitree.o \ $(top_srcdir)/src/struct/vfsm.o \ $(top_srcdir)/src/struct/pqueue.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o \ -lm CLEANFILES = $(EXTRA_exonerate_SOURCES) @codegen_extra_sources@ \ @codegen_extra_ldadd@ MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/src/program/Makefile.in0000644000175000017500000004427011222703652014742 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE.
SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../.. ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ bin_PROGRAMS = exonerate ipcress exonerate-server INCLUDES = -I$(top_srcdir)/src/sequence -I$(top_srcdir)/src/database -I$(top_srcdir)/src/comparison -I$(top_srcdir)/src/struct -I$(top_srcdir)/src/c4 -I$(top_srcdir)/src/bsdp -I$(top_srcdir)/src/sdp -I$(top_srcdir)/src/model -I$(top_srcdir)/src/hub -I$(top_srcdir)/src/general -DHOSTTYPE="\"@host@\"" -DSOURCE_ROOT_DIR="\"@source_root_dir@\"" -DCUSTOM_GUINT64_FORMAT="\"@custom_guint64_format@\"" # Missing viterbi.o and scheduler.o so codegen
versions may be substituted C4_OBJECTS = $(top_srcdir)/src/c4/c4.o $(top_srcdir)/src/c4/codegen.o $(top_srcdir)/src/c4/cgutil.o $(top_srcdir)/src/c4/opair.o $(top_srcdir)/src/c4/alignment.o $(top_srcdir)/src/c4/optimal.o $(top_srcdir)/src/c4/layout.o $(top_srcdir)/src/c4/region.o $(top_srcdir)/src/c4/subopt.o $(top_srcdir)/src/bsdp/bsdp.o $(top_srcdir)/src/bsdp/sar.o $(top_srcdir)/src/bsdp/heuristic.o $(top_srcdir)/src/bsdp/hpair.o $(top_srcdir)/src/sdp/boundary.o $(top_srcdir)/src/sdp/sdp.o $(top_srcdir)/src/sdp/lookahead.o $(top_srcdir)/src/sdp/straceback.o $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/struct/slist.o $(top_srcdir)/src/struct/pqueue.o $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/rangetree.o $(top_srcdir)/src/sequence/sequence.o $(top_srcdir)/src/sequence/alphabet.o $(top_srcdir)/src/sequence/translate.o $(top_srcdir)/src/sequence/splice.o $(top_srcdir)/src/model/affine.o $(top_srcdir)/src/model/est2genome.o $(top_srcdir)/src/model/ner.o $(top_srcdir)/src/model/protein2dna.o $(top_srcdir)/src/model/protein2genome.o $(top_srcdir)/src/model/coding2coding.o $(top_srcdir)/src/model/coding2genome.o $(top_srcdir)/src/model/cdna2genome.o $(top_srcdir)/src/model/genome2genome.o $(top_srcdir)/src/model/ungapped.o $(top_srcdir)/src/model/intron.o $(top_srcdir)/src/model/frameshift.o $(top_srcdir)/src/model/phase.o $(top_srcdir)/src/model/modeltype.o $(top_srcdir)/src/general/argument.o $(top_srcdir)/src/general/lineparse.o $(top_srcdir)/src/general/socket.o $(top_srcdir)/src/general/jobqueue.o $(top_srcdir)/src/comparison/match.o $(top_srcdir)/src/sequence/submat.o $(top_srcdir)/src/sequence/codonsubmat.o EXTRA_exonerate_SOURCES = c4_model_archive.a c4_model_archive.h BUILT_SOURCES = c4_model_archive.a BOOTSTRAPPER = $(top_srcdir)/src/model/bootstrapper exonerate_COMPONENTS = @codegen_extra_ldadd@ $(top_srcdir)/src/comparison/wordhood.o $(top_srcdir)/src/comparison/hspset.o $(top_srcdir)/src/comparison/seeder.o 
$(top_srcdir)/src/comparison/comparison.o $(top_srcdir)/src/database/fastapipe.o $(top_srcdir)/src/database/fastadb.o $(top_srcdir)/src/struct/dejavu.o $(top_srcdir)/src/struct/fsm.o $(top_srcdir)/src/struct/vfsm.o $(top_srcdir)/src/struct/sparsecache.o $(top_srcdir)/src/hub/gam.o $(top_srcdir)/src/hub/bsam.o $(top_srcdir)/src/hub/analysis.o $(top_srcdir)/src/general/compoundfile.o $(top_srcdir)/src/general/threadref.o $(C4_OBJECTS) exonerate_SOURCES = exonerate.c exonerate_DEPENDENCIES = @codegen_extra_sources@ Makefile $(exonerate_COMPONENTS) exonerate_LDADD = $(exonerate_COMPONENTS) -lm ipcress_SOURCES = ipcress.c ipcress_LDADD = $(top_srcdir)/src/comparison/pcr.o $(top_srcdir)/src/comparison/wordhood.o $(top_srcdir)/src/database/fastadb.o $(top_srcdir)/src/sequence/alphabet.o $(top_srcdir)/src/sequence/sequence.o $(top_srcdir)/src/sequence/splice.o $(top_srcdir)/src/sequence/translate.o $(top_srcdir)/src/sequence/submat.o $(top_srcdir)/src/sequence/codonsubmat.o $(top_srcdir)/src/struct/fsm.o $(top_srcdir)/src/struct/slist.o $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/sparsecache.o $(top_srcdir)/src/general/lineparse.o $(top_srcdir)/src/general/argument.o $(top_srcdir)/src/general/compoundfile.o -lm # Files to clear away exonerate_server_SOURCES = exonerate-server.c exonerate_server_LDADD = $(top_srcdir)/src/general/socket.o $(top_srcdir)/src/general/argument.o $(top_srcdir)/src/comparison/hspset.o $(top_srcdir)/src/comparison/match.o $(top_srcdir)/src/comparison/wordhood.o $(top_srcdir)/src/database/dataset.o $(top_srcdir)/src/database/index.o $(top_srcdir)/src/database/fastadb.o $(top_srcdir)/src/struct/bitarray.o $(top_srcdir)/src/struct/sparsecache.o $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/struct/splaytree.o $(top_srcdir)/src/struct/noitree.o $(top_srcdir)/src/struct/vfsm.o $(top_srcdir)/src/struct/pqueue.o $(top_srcdir)/src/struct/rangetree.o $(top_srcdir)/src/struct/recyclebin.o 
$(top_srcdir)/src/sequence/sequence.o $(top_srcdir)/src/sequence/alphabet.o $(top_srcdir)/src/sequence/translate.o $(top_srcdir)/src/sequence/splice.o $(top_srcdir)/src/sequence/submat.o $(top_srcdir)/src/sequence/codonsubmat.o $(top_srcdir)/src/general/compoundfile.o $(top_srcdir)/src/general/lineparse.o $(top_srcdir)/src/general/threadref.o -lm CLEANFILES = $(EXTRA_exonerate_SOURCES) @codegen_extra_sources@ @codegen_extra_ldadd@ MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = PROGRAMS = $(bin_PROGRAMS) DEFS = @DEFS@ -I. -I$(srcdir) CPPFLAGS = @CPPFLAGS@ LDFLAGS = @LDFLAGS@ LIBS = @LIBS@ exonerate_OBJECTS = exonerate.o exonerate_LDFLAGS = ipcress_OBJECTS = ipcress.o ipcress_DEPENDENCIES = $(top_srcdir)/src/comparison/pcr.o \ $(top_srcdir)/src/comparison/wordhood.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o $(top_srcdir)/src/struct/fsm.o \ $(top_srcdir)/src/struct/slist.o $(top_srcdir)/src/struct/matrix.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o ipcress_LDFLAGS = exonerate_server_OBJECTS = exonerate-server.o exonerate_server_DEPENDENCIES = $(top_srcdir)/src/general/socket.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/comparison/hspset.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/comparison/wordhood.o \ $(top_srcdir)/src/database/dataset.o $(top_srcdir)/src/database/index.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/struct/bitarray.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o $(top_srcdir)/src/struct/splaytree.o \
$(top_srcdir)/src/struct/noitree.o $(top_srcdir)/src/struct/vfsm.o \ $(top_srcdir)/src/struct/pqueue.o $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/sequence/splice.o $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/general/threadref.o exonerate_server_LDFLAGS = CFLAGS = @CFLAGS@ COMPILE = $(CC) $(DEFS) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) CCLD = $(CC) LINK = $(CCLD) $(AM_CFLAGS) $(CFLAGS) $(LDFLAGS) -o $@ DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best SOURCES = $(exonerate_SOURCES) $(EXTRA_exonerate_SOURCES) $(ipcress_SOURCES) $(exonerate_server_SOURCES) OBJECTS = $(exonerate_OBJECTS) $(ipcress_OBJECTS) $(exonerate_server_OBJECTS) all: all-redirect .SUFFIXES: .SUFFIXES: .S .c .o .s $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps src/program/Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status mostlyclean-binPROGRAMS: clean-binPROGRAMS: -test -z "$(bin_PROGRAMS)" || rm -f $(bin_PROGRAMS) distclean-binPROGRAMS: maintainer-clean-binPROGRAMS: install-binPROGRAMS: $(bin_PROGRAMS) @$(NORMAL_INSTALL) $(mkinstalldirs) $(DESTDIR)$(bindir) @list='$(bin_PROGRAMS)'; for p in $$list; do \ if test -f $$p; then \ echo " $(INSTALL_PROGRAM) $$p $(DESTDIR)$(bindir)/`echo $$p|sed 's/$(EXEEXT)$$//'|sed '$(transform)'|sed 's/$$/$(EXEEXT)/'`"; \ $(INSTALL_PROGRAM) $$p $(DESTDIR)$(bindir)/`echo $$p|sed 's/$(EXEEXT)$$//'|sed '$(transform)'|sed 's/$$/$(EXEEXT)/'`; \ else :; fi; \ done 
uninstall-binPROGRAMS: @$(NORMAL_UNINSTALL) list='$(bin_PROGRAMS)'; for p in $$list; do \ rm -f $(DESTDIR)$(bindir)/`echo $$p|sed 's/$(EXEEXT)$$//'|sed '$(transform)'|sed 's/$$/$(EXEEXT)/'`; \ done .c.o: $(COMPILE) -c $< .s.o: $(COMPILE) -c $< .S.o: $(COMPILE) -c $< mostlyclean-compile: -rm -f *.o core *.core clean-compile: distclean-compile: -rm -f *.tab.c maintainer-clean-compile: exonerate: $(exonerate_OBJECTS) $(exonerate_DEPENDENCIES) @rm -f exonerate $(LINK) $(exonerate_LDFLAGS) $(exonerate_OBJECTS) $(exonerate_LDADD) $(LIBS) ipcress: $(ipcress_OBJECTS) $(ipcress_DEPENDENCIES) @rm -f ipcress $(LINK) $(ipcress_LDFLAGS) $(ipcress_OBJECTS) $(ipcress_LDADD) $(LIBS) exonerate-server: $(exonerate_server_OBJECTS) $(exonerate_server_DEPENDENCIES) @rm -f exonerate-server $(LINK) $(exonerate_server_LDFLAGS) $(exonerate_server_OBJECTS) $(exonerate_server_LDADD) $(LIBS) tags: TAGS ID: $(HEADERS) $(SOURCES) $(LISP) list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ here=`pwd` && cd $(srcdir) \ && mkid -f$$here/ID $$unique $(LISP) TAGS: $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) $(LISP) tags=; \ here=`pwd`; \ list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(ETAGS_ARGS)$$unique$(LISP)$$tags" \ || (cd $(srcdir) && etags -o $$here/TAGS $(ETAGS_ARGS) $$tags $$unique $(LISP)) mostlyclean-tags: clean-tags: distclean-tags: -rm -f TAGS ID maintainer-clean-tags: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = src/program distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done $(MAKE) $(AM_MAKEFLAGS) top_distdir="$(top_distdir)" 
distdir="$(distdir)" dist-hook info-am: info: info-am dvi-am: dvi: dvi-am check-am: all-am check: check-am installcheck-am: installcheck: installcheck-am install-exec-am: install-binPROGRAMS install-exec: install-exec-am install-data-am: install-data: install-data-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-am uninstall-am: uninstall-binPROGRAMS uninstall: uninstall-am all-am: Makefile $(PROGRAMS) all-redirect: all-am install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: $(mkinstalldirs) $(DESTDIR)$(bindir) mostlyclean-generic: clean-generic: -test -z "$(CLEANFILES)" || rm -f $(CLEANFILES) distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(BUILT_SOURCES)$(MAINTAINERCLEANFILES)" || rm -f $(BUILT_SOURCES) $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-binPROGRAMS mostlyclean-compile \ mostlyclean-tags mostlyclean-generic mostlyclean: mostlyclean-am clean-am: clean-binPROGRAMS clean-compile clean-tags clean-generic \ mostlyclean-am clean: clean-am distclean-am: distclean-binPROGRAMS distclean-compile distclean-tags \ distclean-generic clean-am distclean: distclean-am maintainer-clean-am: maintainer-clean-binPROGRAMS \ maintainer-clean-compile maintainer-clean-tags \ maintainer-clean-generic distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." 
maintainer-clean: maintainer-clean-am .PHONY: mostlyclean-binPROGRAMS distclean-binPROGRAMS clean-binPROGRAMS \ maintainer-clean-binPROGRAMS uninstall-binPROGRAMS install-binPROGRAMS \ mostlyclean-compile distclean-compile clean-compile \ maintainer-clean-compile tags mostlyclean-tags distclean-tags \ clean-tags maintainer-clean-tags distdir info-am info dvi-am dvi check \ check-am installcheck-am installcheck install-exec-am install-exec \ install-data-am install-data install-am install uninstall-am uninstall \ all-redirect all-am all installdirs mostlyclean-generic \ distclean-generic clean-generic maintainer-clean-generic clean \ mostlyclean distclean maintainer-clean c4_model_archive.h: $(BOOTSTRAPPER) $(BOOTSTRAPPER) --compiled no c4_model_archive.a: $(BOOTSTRAPPER) $(BOOTSTRAPPER) --compiled no # Special viterbi.o and scheduler.o for compiled model linking viterbi.o: $(top_srcdir)/src/c4/viterbi.c $(CC) $(CFLAGS) $(INCLUDES) -I. \ -DUSE_COMPILED_MODELS \ -c $(top_srcdir)/src/c4/viterbi.c scheduler.o: $(top_srcdir)/src/sdp/scheduler.c $(CC) $(CFLAGS) $(INCLUDES) -I. \ -DUSE_COMPILED_MODELS \ -c $(top_srcdir)/src/sdp/scheduler.c # Remove built sources from distribution # (will be able to use nodist_exonerate_SOURCES in automake 1.5) # nodist_exonerate_SOURCES = c4_model_archive.a dist-hook: for file in $(EXTRA_exonerate_SOURCES) ; \ do rm -f $(distdir)/$$file ; done ; # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: exonerate-2.4.0/src/program/exonerate.c0000644000175000017500000001256111162714320015026 00000000000000/****************************************************************\ * * * exonerate : a generic sequence comparison tool * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. 
See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "argument.h" #include "analysis.h" #include "gam.h" #include "optimal.h" #include "codegen.h" #include "fastadb.h" #include "est2genome.h" #include "affine.h" #include "ner.h" #include "intron.h" #include "heuristic.h" #include "protein2dna.h" #include "frameshift.h" #include "alphabet.h" #include "hspset.h" #include "match.h" #include "alignment.h" #include "sar.h" #include "sdp.h" #include "splice.h" int Argument_main(Argument *arg){ register Analysis *analysis; register ArgumentSet *as_input = ArgumentSet_create("Sequence Input Options"); GPtrArray *query_path_list, *target_path_list; Alphabet_Type query_type, target_type; gint query_chunk_id, target_chunk_id, query_chunk_total, target_chunk_total, verbosity; /**/ ArgumentSet_add_option(as_input, 'q', "query", "path", "Specify query sequences as a fasta format file", NULL, NULL, &query_path_list); ArgumentSet_add_option(as_input, 't', "target", "path", "Specify target sequences as a fasta format file", NULL, NULL, &target_path_list); /**/ ArgumentSet_add_option(as_input, 'Q', "querytype", "alphabet type", "Specify query alphabet type", "unknown", Alphabet_Argument_parse_alphabet_type, &query_type); ArgumentSet_add_option(as_input, 'T', "targettype", "alphabet type", "Specify target alphabet type", "unknown", Alphabet_Argument_parse_alphabet_type, &target_type); /**/ ArgumentSet_add_option(as_input, '\0', "querychunkid", NULL, "Specify query job number", "0", Argument_parse_int, &query_chunk_id); ArgumentSet_add_option(as_input, '\0', "targetchunkid", NULL, "Specify target job number", "0", Argument_parse_int, &target_chunk_id); ArgumentSet_add_option(as_input, '\0', "querychunktotal", NULL, "Specify total number of query jobs", "0", Argument_parse_int, &query_chunk_total); 
ArgumentSet_add_option(as_input, '\0', "targetchunktotal", NULL, "Specify total number of target jobs", "0", Argument_parse_int, &target_chunk_total); /**/ ArgumentSet_add_option(as_input, 'V', "verbose", "level", "Show search progress", "1", Argument_parse_int, &verbosity); /**/ Argument_absorb_ArgumentSet(arg, as_input); Translate_ArgumentSet_create(arg); Analysis_ArgumentSet_create(arg); FastaDB_ArgumentSet_create(arg); GAM_ArgumentSet_create(arg); Viterbi_ArgumentSet_create(arg); Codegen_ArgumentSet_create(arg); Heuristic_ArgumentSet_create(arg); SDP_ArgumentSet_create(arg); BSDP_ArgumentSet_create(arg); /**/ Sequence_ArgumentSet_create(arg); Match_ArgumentSet_create(arg); Seeder_ArgumentSet_create(arg); Affine_ArgumentSet_create(arg); NER_ArgumentSet_create(arg); Intron_ArgumentSet_create(arg); Frameshift_ArgumentSet_create(arg); Alphabet_ArgumentSet_create(arg); HSPset_ArgumentSet_create(arg); Alignment_ArgumentSet_create(arg); SAR_ArgumentSet_create(arg); Splice_ArgumentSet_create(arg); /**/ Argument_process(arg, "exonerate", "A generic sequence comparison tool\n" "Guy St.C. Slater. guy@ebi.ac.uk. 2000-2009.\n", "\n" "Examples of use:\n" "\n" "1. Ungapped alignment of any DNA or protein sequences:\n" " exonerate queries.fa targets.fa\n" "2. Gapped alignment of Mouse proteins to Fugu proteins:\n" " exonerate --model affine:local mouse.fa fugu.fa\n" "3. Find top 10 matches of each EST to a genome:\n" " exonerate --model est2genome --bestn 10 est.fa genome.fa\n" "4. Find proteins with at least a 50% match to a genome:\n" " exonerate --model protein2genome --percent 50 p.fa g.fa\n" "5. Perform a full Smith-Waterman-Gotoh alignment:\n" " exonerate --model affine:local --exhaustive yes a.fa b.fa\n" "6. Many more combinations are possible. 
To find out more:\n" " exonerate --help\n" " man exonerate\n" "\n"); if(verbosity > 0) Argument_info(arg); /**/ analysis = Analysis_create(query_path_list, query_type, query_chunk_id, query_chunk_total, target_path_list, target_type, target_chunk_id, target_chunk_total, verbosity); Analysis_process(analysis); Analysis_destroy(analysis); /**/ if(verbosity > 0) g_print("-- completed exonerate analysis\n"); return 0; } /**/ exonerate-2.4.0/src/program/ipcress.c0000644000175000017500000003341411162714320014504 00000000000000/****************************************************************\ * * * ipcress : in-silico PCR experiment simulation system * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include #include #include <stdlib.h> /* For atoi() */ #include <string.h> /* For strlen() */ #include "argument.h" #include "pcr.h" #include "fastadb.h" #include "lineparse.h" /* Input Format: GO9892 CAAAGCTCTGTCCCAAAAAA TCTCTGCCCCTTTTACAGTG 90 100 Output Format: ...AAAAAAAAAAAAAAAAA.........................# forward |||||||||||||||||--> AAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAA # primers <--||||||||||||||| ...........................AAAAAAAAAAAAAAA...# revcomp ipcress: seq_id experiment_id product_length \ 3 {primer_5:[A|B] pos_5, mismatch_5} \ # 3 {primer_3:[A|B] pos_3, mismatch_3} \ # 3 description: forward, revcomp, single_A, single_B # 1 */ typedef struct { FILE *input_fp; LineParse *input_lp; gint mismatches; gint seed_length; gint memory_limit; gboolean display_pretty; gboolean display_products; } Ipcress; typedef enum { Ipcress_Type_FORWARD, Ipcress_Type_REVCOMP, Ipcress_Type_SINGLE_A, Ipcress_Type_SINGLE_B } Ipcress_Type; static gchar
*Ipcress_Type_get_description(Ipcress_Type ipcress_type){ register gchar *description = NULL; switch(ipcress_type){ case Ipcress_Type_FORWARD: description = "forward"; break; case Ipcress_Type_REVCOMP: description = "revcomp"; break; case Ipcress_Type_SINGLE_A: description = "single_A"; break; case Ipcress_Type_SINGLE_B: description = "single_B"; break; default: g_error("Unknown ipcress type [%d]", ipcress_type); break; } return description; } static Ipcress_Type Ipcress_Type_create(PCR_Probe *probe_a, PCR_Probe *probe_b){ register Ipcress_Type ipcress_type; register PCR_Primer *primer_a = probe_a->pcr_primer, *primer_b = probe_b->pcr_primer; register PCR_Experiment *pcr_experiment = primer_a->pcr_experiment; if((primer_a == pcr_experiment->primer_a) && (primer_b == pcr_experiment->primer_b)) ipcress_type = Ipcress_Type_FORWARD; else ipcress_type = Ipcress_Type_REVCOMP; if(primer_a == primer_b){ if(primer_a == pcr_experiment->primer_a) ipcress_type = Ipcress_Type_SINGLE_A; else ipcress_type = Ipcress_Type_SINGLE_B; } return ipcress_type; } static gboolean Ipcress_report_product(Sequence *sequence, PCR_Match *match_a, PCR_Match *match_b, gint product_length, gpointer user_data){ register PCR_Probe *probe_a = match_a->pcr_probe, *probe_b = match_b->pcr_probe; register PCR_Primer *primer_a = probe_a->pcr_primer, *primer_b = probe_b->pcr_primer; register PCR_Experiment *pcr_experiment = primer_a->pcr_experiment; register Ipcress *ipcress = user_data; register gint i; register Ipcress_Type ipcress_type = Ipcress_Type_create(probe_a, probe_b); register gchar *description, *s; register Sequence *subseq, *seq; g_assert(probe_a->strand == Sequence_Strand_FORWARD); g_assert(probe_b->strand == Sequence_Strand_REVCOMP); g_assert(pcr_experiment == primer_b->pcr_experiment); description = Ipcress_Type_get_description(ipcress_type); if(ipcress->display_pretty){ g_print("\n"); g_print("Ipcress result\n"); g_print("--------------\n"); g_print(" Experiment: %s\n", 
pcr_experiment->id); g_print(" Primers: %c %c\n", (primer_a == pcr_experiment->primer_a)?'A':'B', (primer_b == pcr_experiment->primer_a)?'A':'B'); g_print(" Target: %s%s%s\n", sequence->id, sequence->def?" ":"", sequence->def?sequence->def:""); g_print(" Matches: %d/%d %d/%d\n", primer_a->length-match_a->mismatch, primer_a->length, primer_b->length-match_b->mismatch, primer_b->length); g_print(" Product: %d bp (range %d-%d)\n", product_length, pcr_experiment->min_product_len, pcr_experiment->max_product_len); g_print("Result type: %s\n", description); g_print("\n"); /**/ s = Sequence_get_substr(sequence, match_a->position, primer_a->length); g_print("...%s.......", s); /* Start of top line */ g_free(s); for(i = 0; i < primer_b->length; i++) g_print("."); g_print("... # forward\n"); g_print(" "); /* Start of upper match line */ for(i = 0; i < primer_a->length; i++) if(primer_a->forward[i] == Sequence_get_symbol(sequence, match_a->position+i)) g_print("|"); else g_print(" "); g_print("-->\n"); g_print("5'-%.*s-3' ", /* Start of primer line */ primer_a->length, primer_a->forward); Sequence_reverse_in_place(primer_b->forward, primer_b->length); g_print("3'-%.*s-5' # primers\n", primer_b->length, primer_b->forward); Sequence_reverse_in_place(primer_b->forward, primer_b->length); g_print(" "); /* Start of lower match line */ for(i = 0; i < primer_a->length; i++) g_print(" "); g_print(" <--"); for(i = 0; i < primer_b->length; i++) if(primer_b->revcomp[i] == Sequence_get_symbol(sequence, match_b->position+i)) g_print("|"); else g_print(" "); g_print("\n"); g_print("..."); /* Start of base line */ for(i = 0; i < primer_a->length; i++) g_print("."); g_print("......."); seq = Sequence_filter(sequence, Alphabet_Filter_Type_COMPLEMENT); s = Sequence_get_substr(seq, match_b->position, primer_b->length); g_print("%s", s); Sequence_destroy(seq); g_free(s); g_print("... 
# revcomp\n"); g_print("--\n"); } g_print("ipcress: %s %s %d %c %d %d %c %d %d %s\n", sequence->id, pcr_experiment->id, product_length, /**/ (primer_a == pcr_experiment->primer_a)?'A':'B', match_a->position, match_a->mismatch, /**/ (primer_b == pcr_experiment->primer_a)?'A':'B', match_b->position, match_b->mismatch, /**/ description); if(ipcress->display_products){ subseq = Sequence_subseq(sequence, match_a->position, product_length); /* Should report product on the forward strand */ if(ipcress_type == Ipcress_Type_REVCOMP) seq = Sequence_revcomp(subseq); else seq = Sequence_share(subseq); Sequence_destroy(subseq); g_print(">%s_product_%d seq %s start %d length %d\n", pcr_experiment->id, ++pcr_experiment->product_count, sequence->id, match_a->position, product_length); Sequence_print_fasta_block(seq, stdout); Sequence_destroy(seq); } return FALSE; } static Ipcress *Ipcress_create(gchar *ipcress_path, gint mismatches, gint seed_length, gint memory_limit, gboolean display_pretty, gboolean display_products){ register Ipcress *ipcress = g_new(Ipcress, 1); ipcress->input_fp = fopen(ipcress_path, "r"); if(!ipcress->input_fp) g_error("Could not open ipcress file [%s]", ipcress_path); ipcress->input_lp = LineParse_create(ipcress->input_fp); ipcress->mismatches = mismatches; ipcress->seed_length = seed_length; ipcress->memory_limit = memory_limit; ipcress->display_pretty = display_pretty; ipcress->display_products = display_products; return ipcress; } static PCR *Ipcress_prepare_PCR(Ipcress *ipcress){ register PCR *pcr = PCR_create(Ipcress_report_product, ipcress, ipcress->mismatches, ipcress->seed_length); register gsize memory_usage; register gint experiment_count = 0, word_count; register gchar *id, *primer_a, *primer_b; register gint min_product_len, max_product_len; /* Read as many experiments as possible into memory */ while((word_count = LineParse_word(ipcress->input_lp)) != EOF){ if(word_count == 0) continue; if(word_count != 5) g_error("Bad line in ipcress file 
with [%d] fields", word_count); id = ipcress->input_lp->word->pdata[0]; primer_a = ipcress->input_lp->word->pdata[1]; primer_b = ipcress->input_lp->word->pdata[2]; g_strup(primer_a); g_strup(primer_b); min_product_len = atoi(ipcress->input_lp->word->pdata[3]); max_product_len = atoi(ipcress->input_lp->word->pdata[4]); memory_usage = PCR_add_experiment(pcr, id, primer_a, primer_b, min_product_len, max_product_len); experiment_count++; /* memory_limit is in Mb: widen to gsize before shifting to avoid gint overflow */ if(memory_usage > (((gsize)ipcress->memory_limit) << 20)){ break; } } if(!experiment_count){ PCR_destroy(pcr); return NULL; } g_message("Loaded [%d] experiments", experiment_count); PCR_prepare(pcr); return pcr; } static void Ipcress_analyse(Ipcress *ipcress, GPtrArray *sequence_path_list){ register PCR *pcr; register Alphabet *alphabet = Alphabet_create(Alphabet_Type_DNA, FALSE); register FastaDB *fdb = FastaDB_open_list(sequence_path_list, alphabet); register FastaDB_Seq *fdbs; register Sequence *seq; /* While not at the end of the input file */ while((pcr = Ipcress_prepare_PCR(ipcress))){ /* Analyse each sequence in DB */ while((fdbs = FastaDB_next(fdb, FastaDB_Mask_ALL))){ seq = Sequence_filter(fdbs->seq, Alphabet_Filter_Type_UNMASKED); PCR_simulate(pcr, seq); Sequence_destroy(seq); FastaDB_Seq_destroy(fdbs); } FastaDB_rewind(fdb); PCR_destroy(pcr); } FastaDB_close(fdb); Alphabet_destroy(alphabet); return; } static void Ipcress_destroy(Ipcress *ipcress){ LineParse_destroy(ipcress->input_lp); fclose(ipcress->input_fp); g_free(ipcress); return; } int Argument_main(Argument *arg){ gchar *ipcress_path; GPtrArray *sequence_path_list; gint mismatches, memory_limit, seed_length; gboolean display_pretty, display_products; /**/ register ArgumentSet *as_input = ArgumentSet_create("File input options"); register ArgumentSet *as_parameter = ArgumentSet_create("Ipcress parameters"); register Ipcress *ipcress; /**/ ArgumentSet_add_option(as_input, 'i', "input", "ipcress file", "Primer data in IPCRESS file format", NULL, Argument_parse_string,
&ipcress_path); ArgumentSet_add_option(as_input, 's', "sequence", "fasta paths", "Fasta format sequence database", NULL, NULL, &sequence_path_list); ArgumentSet_add_option(as_parameter, 'm', "mismatch", "mismatches", "number of mismatches allowed per primer", "0", Argument_parse_int, &mismatches); ArgumentSet_add_option(as_parameter, 'M', "memory", "Mb", "Memory limit for FSM data", "32", Argument_parse_int, &memory_limit); ArgumentSet_add_option(as_parameter, 'p', "pretty", NULL, "Include \'pretty\' output", "TRUE", Argument_parse_boolean, &display_pretty); ArgumentSet_add_option(as_parameter, 'S', "seed", NULL, "Seed length (use zero for full length)", "12", Argument_parse_int, &seed_length); ArgumentSet_add_option(as_parameter, 'P', "products", NULL, "Report PCR products", "FALSE", Argument_parse_boolean, &display_products); Argument_absorb_ArgumentSet(arg, as_input); Argument_absorb_ArgumentSet(arg, as_parameter); Argument_process(arg, "ipcress", "In-silico PCR experiment simulation system\n" "Guy St.C. Slater. guy@ebi.ac.uk June 2001.\n", NULL); ipcress = Ipcress_create(ipcress_path, mismatches, seed_length, memory_limit, display_pretty, display_products); Ipcress_analyse(ipcress, sequence_path_list); Ipcress_destroy(ipcress); g_print("-- completed ipcress analysis\n"); return 0; } exonerate-2.4.0/src/program/exonerate-server.c0000644000175000017500000011212511222476340016333 00000000000000/****************************************************************\ * * * The exonerate server * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include #include #include #include #include <ctype.h> /* For isspace() */ #include "argument.h" #include "socket.h" #include "dataset.h" #include "index.h" #include "hspset.h" typedef struct { Dataset *dataset; Index *index; gint verbosity; } Exonerate_Server; static void Exonerate_Server_memory_usage(Exonerate_Server *exonerate_server){ register guint64 dataset_memory, index_memory; dataset_memory = Dataset_memory_usage(exonerate_server->dataset); index_memory = exonerate_server->index ? (Index_memory_usage(exonerate_server->index) - dataset_memory) : 0; if(exonerate_server->verbosity > 0) g_message("Memory usage: dataset: %d Mb, index: %d Mb, Total %d Mb", (gint)(dataset_memory >> 20), (gint)(index_memory >> 20), (gint)((dataset_memory+index_memory) >> 20)); return; } static Exonerate_Server *Exonerate_Server_create(gchar *input_path, gboolean preload, gint verbosity){ register Exonerate_Server *exonerate_server = g_new0(Exonerate_Server, 1); if(verbosity > 0) g_message("Starting server ..."); if(Dataset_check_filetype(input_path)){ exonerate_server->dataset = Dataset_read(input_path); } else if(Index_check_filetype(input_path)){ exonerate_server->index = Index_open(input_path); exonerate_server->dataset = Dataset_share(exonerate_server->index->dataset); if(preload) Index_preload_index(exonerate_server->index); } else { g_error("Unknown filetype for input file [%s]", input_path); } if(preload) Dataset_preload_seqs(exonerate_server->dataset); exonerate_server->verbosity = verbosity; if(verbosity >= 1){ Dataset_info(exonerate_server->dataset); /* The index is absent when serving a plain dataset file */ if(exonerate_server->index) Index_info(exonerate_server->index); } return exonerate_server; } static void Exonerate_Server_destroy(Exonerate_Server *exonerate_server){ if(exonerate_server->index) Index_destroy(exonerate_server->index); /* Release the server's single dataset reference exactly once */ Dataset_destroy(exonerate_server->dataset); g_free(exonerate_server); return; } /**/ typedef struct { Alphabet 
*query_alphabet; Sequence_Strand query_strand; gboolean query_is_masked; gboolean revcomp_query; gboolean revcomp_target; Sequence *query; HSP_Param *hsp_param; /**/ gint seed_repeat; gint dna_hsp_threshold; gint protein_hsp_threshold; gint codon_hsp_threshold; gint dna_word_limit; gint protein_word_limit; gint codon_word_limit; gint dna_hsp_dropoff; gint protein_hsp_dropoff; gint codon_hsp_dropoff; gint geneseed_threshold; gint geneseed_repeat; gint max_query_span; gint max_target_span; } Exonerate_Server_Connection; static Exonerate_Server_Connection *Exonerate_Server_Connection_create(void){ register Exonerate_Server_Connection *connection = g_new(Exonerate_Server_Connection, 1); register HSPset_ArgumentSet *has = HSPset_ArgumentSet_create(NULL); connection->query = NULL; connection->hsp_param = NULL; connection->query_alphabet = NULL; connection->query_strand = Sequence_Strand_UNKNOWN; connection->query_is_masked = FALSE; connection->revcomp_query = FALSE; connection->revcomp_target = FALSE; /**/ connection->seed_repeat = has->seed_repeat; connection->dna_hsp_threshold = has->dna_hsp_threshold; connection->protein_hsp_threshold = has->protein_hsp_threshold; connection->codon_hsp_threshold = has->codon_hsp_threshold; connection->dna_word_limit = has->dna_word_limit; connection->protein_word_limit = has->protein_word_limit; connection->codon_word_limit = has->codon_word_limit; connection->dna_hsp_dropoff = has->dna_hsp_dropoff; connection->protein_hsp_dropoff = has->protein_hsp_dropoff; connection->codon_hsp_dropoff = has->codon_hsp_dropoff; connection->geneseed_threshold = has->geneseed_threshold; connection->geneseed_repeat = has->geneseed_repeat; connection->max_query_span = 0; connection->max_target_span = 0; return connection; } static void Exonerate_Server_Connection_destroy( Exonerate_Server_Connection *connection){ if(connection->query_alphabet) Alphabet_destroy(connection->query_alphabet); if(connection->hsp_param) 
HSP_Param_destroy(connection->hsp_param); if(connection->query) Sequence_destroy(connection->query); g_free(connection); return; } /**/ static gpointer Exonerate_Server_Connection_open(gpointer user_data){ return Exonerate_Server_Connection_create(); } static void Exonerate_Server_Connection_close(gpointer connection_data, gpointer user_data){ register Exonerate_Server_Connection *server_connection = connection_data; Exonerate_Server_Connection_destroy(server_connection); return; } static void Exonerate_Server_Connection_revcomp_query( Exonerate_Server_Connection *connection){ register Sequence *rc_seq; g_assert(connection->query); rc_seq = Sequence_revcomp(connection->query); Sequence_destroy(connection->query); connection->query = rc_seq; connection->revcomp_query = connection->revcomp_query?FALSE:TRUE; return; } static void Exonerate_Server_Connection_revcomp_target( Exonerate_Server_Connection *connection){ connection->revcomp_target = connection->revcomp_target?FALSE:TRUE; return; } static GPtrArray *Exonerate_Server_get_word_list(gchar *msg){ register GPtrArray *word_list = g_ptr_array_new(); register gchar *prev, *ptr; for(ptr = msg; isspace(*ptr); ptr++); /* skip start */ prev = ptr; while(*ptr){ if(isspace(*ptr)){ *ptr = '\0'; do { ptr++; } while(isspace(*ptr)); if(!*ptr) break; g_ptr_array_add(word_list, prev); /* add a word */ prev = ptr; } ptr++; } if(prev != ptr) g_ptr_array_add(word_list, prev); /* add final word */ return word_list; } static gchar *Exonerate_Server_help(void){ return g_strdup_printf( "exonerate-server commands:\n" " help : print this message\n" " version : show version information\n" " exit : disconnect from server\n" " dbinfo : show database info\n" " : \n" "\n" " lookup : get internal from external identifier\n" " get info : get sequence info \n" " : []\n" " get seq : get sequence\n" " get subseq : get subsequence\n" "\n" " set query : set query sequence\n" " get hsps : get hsps against current query\n" " : { } \n" "\n" " revcomp 
\n" " set param \n" "\n" "\n" " valid parameters:\n" " seedrepeat\n" "\n" " dnahspthreshold\n" " proteinhspthreshold\n" " codonhspthreshold\n" "\n" " dnawordlimit\n" " proteinwordlimit\n" " codonwordlimit\n" "\n" " dnahspdropoff\n" " proteinhspdropoff\n" " codonhspdropoff\n" "\n" " geneseedthreshold\n" " geneseedrepeat\n" " maxqueryspan\n" " maxtargetspan\n" "--\n"); } static gchar *Exonerate_Server_get_info(Dataset *dataset, gint num){ register Dataset_Sequence *ds; register Sequence *seq; register gchar *reply; if((num >= 0) && (num < dataset->seq_list->len)){ ds = dataset->seq_list->pdata[num]; seq = Dataset_get_sequence(dataset, num); reply = g_strdup_printf("seqinfo: %d %d %s%s%s\n", (gint)ds->key->length, (gint)ds->gcg_checksum, ds->id, seq->def?" ":"", seq->def?seq->def:""); Sequence_destroy(seq); } else { reply = g_strdup_printf("error: sequence num out of range [%d]\n", num); } return reply; } static gchar *Exonerate_Server_get_seq(Dataset *dataset, gint num){ register Sequence *seq; register gchar *str, *reply; if((num >= 0) && (num < dataset->seq_list->len)){ seq = Dataset_get_sequence(dataset, num); str = Sequence_get_str(seq); Sequence_destroy(seq); reply = g_strdup_printf("seq: %s\n", str); g_free(str); } else { reply = g_strdup_printf("error: sequence num out of range [%d]\n", num); } return reply; } static gchar *Exonerate_Server_get_subseq(Dataset *dataset, gint num, gint start, gint len){ register gchar *reply, *str; register Dataset_Sequence *ds; register Sequence *seq, *subseq; if((num >= 0) && (num < dataset->seq_list->len)){ ds = dataset->seq_list->pdata[num]; if(len <= 0){ /* Zero-length subsequences are rejected too */ reply = g_strdup_printf("error: subseq len (%d) must be > 0\n", len); } else if((start >= 0) && ((start+len) <= ds->key->length)){ seq = Dataset_get_sequence(dataset, num); subseq = Sequence_subseq(seq, start, len); Sequence_destroy(seq); str = Sequence_get_str(subseq); Sequence_destroy(subseq); reply = g_strdup_printf("subseq: %s\n", str); g_free(str); } else { reply = 
g_strdup_printf("error: subsequence beyond seq len [%d]\n", ds->key->length); } } else { reply = g_strdup_printf("error: sequence num out of range [%d]\n", num); } return reply; } static gchar *Exonerate_Server_get_hsps(Exonerate_Server *exonerate_server, Exonerate_Server_Connection *connection){ register gint i, j, hsp_total = 0; register GPtrArray *index_hsp_set_list; register Index_HSPset *index_hsp_set; register gchar *reply; register GString *str; register HSP *hsp; g_assert(connection->hsp_param); g_assert(connection->query); if(connection->revcomp_target && (connection->hsp_param->match->type != Match_Type_PROTEIN2DNA)) return g_strdup_printf( "error: revcomp target only available for protein2dna matches"); if(connection->geneseed_threshold > 0){ if(connection->geneseed_threshold < connection->hsp_param->threshold) return g_strdup_printf( "error: geneseed threshold must be >= hsp threshold"); index_hsp_set_list = Index_get_HSPsets_geneseed(exonerate_server->index, connection->hsp_param, connection->query, connection->revcomp_target, connection->geneseed_threshold, connection->geneseed_repeat, connection->max_query_span, connection->max_target_span); } else { index_hsp_set_list = Index_get_HSPsets(exonerate_server->index, connection->hsp_param, connection->query, connection->revcomp_target); } if(index_hsp_set_list){ str = g_string_sized_new(1024); for(i = 0; i < index_hsp_set_list->len; i++){ index_hsp_set = index_hsp_set_list->pdata[i]; g_assert(index_hsp_set->hsp_set->is_finalised); g_string_sprintfa(str, "hspset: %d", index_hsp_set->target_id); /**/ hsp_total += index_hsp_set->hsp_set->hsp_list->len; for(j = 0; j < index_hsp_set->hsp_set->hsp_list->len; j++){ hsp = index_hsp_set->hsp_set->hsp_list->pdata[j]; g_string_sprintfa(str, " %d %d %d", hsp->query_start, hsp->target_start, hsp->length); } g_string_sprintfa(str, "\n"); Index_HSPset_destroy(index_hsp_set); } if(exonerate_server->verbosity > 1) g_message("served [%d] HSPsets containing [%d] hsps", 
index_hsp_set_list->len, hsp_total); g_ptr_array_free(index_hsp_set_list, TRUE); reply = str->str; g_assert(reply); g_string_free(str, FALSE); } else { reply = g_strdup_printf("hspset: empty\n"); } return reply; } static Sequence *Exonerate_Server_get_query(Index *index, Exonerate_Server_Connection *connection, gchar *query){ register Alphabet_Type alphabet_type; register Match_Type match_type; register Match *match; if(!connection->query_alphabet){ alphabet_type = Alphabet_Type_guess(query); if(alphabet_type == Alphabet_Type_DNA){ connection->query_strand = Sequence_Strand_FORWARD; } else { g_assert(alphabet_type == Alphabet_Type_PROTEIN); connection->query_strand = Sequence_Strand_UNKNOWN; if((index->dataset->alphabet->type == Alphabet_Type_DNA) && (!(index->header->type & 1))){ g_message("Cannot use protein query with untranslated DNA index"); return NULL; } } connection->query_alphabet = Alphabet_create(alphabet_type, connection->query_is_masked); match_type = Match_Type_find(alphabet_type, index->dataset->alphabet->type, FALSE); /* FIXME: use Match_Type_find with translate_both for codon alignments */ match = Match_find(match_type); g_assert(match); connection->hsp_param = HSP_Param_create(match, FALSE); connection->hsp_param->seed_repeat = connection->seed_repeat; /**/ HSP_Param_set_dna_hsp_threshold(connection->hsp_param, connection->dna_hsp_threshold); HSP_Param_set_protein_hsp_threshold(connection->hsp_param, connection->protein_hsp_threshold); HSP_Param_set_codon_hsp_threshold(connection->hsp_param, connection->codon_hsp_threshold); /**/ HSP_Param_set_dna_word_limit(connection->hsp_param, connection->dna_word_limit); HSP_Param_set_protein_word_limit(connection->hsp_param, connection->protein_word_limit); HSP_Param_set_codon_word_limit(connection->hsp_param, connection->codon_word_limit); /**/ HSP_Param_set_dna_hsp_dropoff(connection->hsp_param, connection->dna_hsp_dropoff); HSP_Param_set_protein_hsp_dropoff(connection->hsp_param, 
connection->protein_hsp_dropoff); HSP_Param_set_codon_hsp_dropoff(connection->hsp_param, connection->codon_hsp_dropoff); /**/ g_assert(connection->hsp_param); HSP_Param_set_wordlen(connection->hsp_param, index->header->word_length); } g_assert(connection->query_alphabet); return Sequence_create("query", NULL, query, 0, connection->query_strand, connection->query_alphabet); } static gchar *Exonerate_Server_set_param_seedrepeat( Exonerate_Server_Connection *connection, GPtrArray *word_list){ register gint seed_repeat = atoi(word_list->pdata[3]); if(seed_repeat < 1) return g_strdup_printf("error: seedrepeat must be > 0\n"); connection->seed_repeat = seed_repeat; if(connection->hsp_param) connection->hsp_param->seed_repeat = seed_repeat; return g_strdup_printf("ok: set\n"); } /**/ static gchar *Exonerate_Server_set_param_dnahspthreshold( Exonerate_Server_Connection *connection, GPtrArray *word_list){ register gint dnahspthreshold = atoi(word_list->pdata[3]); if(dnahspthreshold < 1) return g_strdup_printf("error: dnahspthreshold must be > 0\n"); connection->dna_hsp_threshold = dnahspthreshold; if(connection->hsp_param) HSP_Param_set_dna_hsp_threshold(connection->hsp_param, dnahspthreshold); return g_strdup_printf("ok: set\n"); } static gchar *Exonerate_Server_set_param_proteinhspthreshold( Exonerate_Server_Connection *connection, GPtrArray *word_list){ register gint proteinhspthreshold = atoi(word_list->pdata[3]); if(proteinhspthreshold < 1) return g_strdup_printf("error: proteinhspthreshold must be > 0\n"); connection->protein_hsp_threshold = proteinhspthreshold; if(connection->hsp_param) HSP_Param_set_protein_hsp_threshold(connection->hsp_param, proteinhspthreshold); return g_strdup_printf("ok: set\n"); } static gchar *Exonerate_Server_set_param_codonhspthreshold( Exonerate_Server_Connection *connection, GPtrArray *word_list){ register gint codonhspthreshold = atoi(word_list->pdata[3]); if(codonhspthreshold < 1) return g_strdup_printf("error: codonhspthreshold must be 
> 0\n"); connection->codon_hsp_threshold = codonhspthreshold; if(connection->hsp_param) HSP_Param_set_codon_hsp_threshold(connection->hsp_param, codonhspthreshold); return g_strdup_printf("ok: set\n"); } /**/ static gchar *Exonerate_Server_set_param_dnawordlimit( Exonerate_Server_Connection *connection, GPtrArray *word_list){ register gint dnawordlimit = atoi(word_list->pdata[3]); if(dnawordlimit < 0) return g_strdup_printf("error: dnawordlimit must be >= 0\n"); connection->dna_word_limit = dnawordlimit; if(connection->hsp_param) HSP_Param_set_dna_word_limit(connection->hsp_param, dnawordlimit); return g_strdup_printf("ok: set\n"); } static gchar *Exonerate_Server_set_param_proteinwordlimit( Exonerate_Server_Connection *connection, GPtrArray *word_list){ register gint proteinwordlimit = atoi(word_list->pdata[3]); if(proteinwordlimit < 0) return g_strdup_printf("error: proteinwordlimit must be >= 0\n"); connection->protein_word_limit = proteinwordlimit; if(connection->hsp_param) HSP_Param_set_protein_word_limit(connection->hsp_param, proteinwordlimit); return g_strdup_printf("ok: set\n"); } static gchar *Exonerate_Server_set_param_codonwordlimit( Exonerate_Server_Connection *connection, GPtrArray *word_list){ register gint codonwordlimit = atoi(word_list->pdata[3]); if(codonwordlimit < 0) return g_strdup_printf("error: codonwordlimit must be >= 0\n"); connection->codon_word_limit = codonwordlimit; if(connection->hsp_param) HSP_Param_set_codon_word_limit(connection->hsp_param, codonwordlimit); return g_strdup_printf("ok: set\n"); } /**/ static gchar *Exonerate_Server_set_param_dnahspdropoff( Exonerate_Server_Connection *connection, GPtrArray *word_list){ register gint dnahspdropoff = atoi(word_list->pdata[3]); if(dnahspdropoff < 0) return g_strdup_printf("error: dnahspdropoff must be >= 0\n"); connection->dna_hsp_dropoff = dnahspdropoff; if(connection->hsp_param) HSP_Param_set_dna_hsp_dropoff(connection->hsp_param, dnahspdropoff); return g_strdup_printf("ok: set\n"); 
} static gchar *Exonerate_Server_set_param_proteinhspdropoff( Exonerate_Server_Connection *connection, GPtrArray *word_list){ register gint proteinhspdropoff = atoi(word_list->pdata[3]); if(proteinhspdropoff < 0) return g_strdup_printf("error: proteinhspdropoff must be >= 0\n"); connection->protein_hsp_dropoff = proteinhspdropoff; if(connection->hsp_param) HSP_Param_set_protein_hsp_dropoff(connection->hsp_param, proteinhspdropoff); return g_strdup_printf("ok: set\n"); } static gchar *Exonerate_Server_set_param_codonhspdropoff( Exonerate_Server_Connection *connection, GPtrArray *word_list){ register gint codonhspdropoff = atoi(word_list->pdata[3]); if(codonhspdropoff < 0) return g_strdup_printf("error: codonhspdropoff must be >= 0\n"); connection->codon_hsp_dropoff = codonhspdropoff; if(connection->hsp_param) HSP_Param_set_codon_hsp_dropoff(connection->hsp_param, codonhspdropoff); return g_strdup_printf("ok: set\n"); } /**/ static gchar *Exonerate_Server_set_param_geneseedthreshold( Exonerate_Server_Connection *connection, GPtrArray *word_list){ register gint geneseed_threshold = atoi(word_list->pdata[3]); if(geneseed_threshold < 0) return g_strdup_printf("error: geneseed_threshold must be >= 0\n"); connection->geneseed_threshold = geneseed_threshold; return g_strdup_printf("ok: set\n"); } static gchar *Exonerate_Server_set_param_geneseedrepeat( Exonerate_Server_Connection *connection, GPtrArray *word_list){ register gint geneseed_repeat = atoi(word_list->pdata[3]); if(geneseed_repeat <= 1) return g_strdup_printf("error: geneseed_repeat must be > 1\n"); connection->geneseed_repeat = geneseed_repeat; return g_strdup_printf("ok: set\n"); } static gchar *Exonerate_Server_set_param_max_query_span( Exonerate_Server_Connection *connection, GPtrArray *word_list){ register gint max_query_span= atoi(word_list->pdata[3]); if(max_query_span < 0) return g_strdup_printf("error: max_query_span must be >= 0\n"); connection->max_query_span = max_query_span; return 
g_strdup_printf("ok: set\n"); } static gchar *Exonerate_Server_set_param_max_target_span( Exonerate_Server_Connection *connection, GPtrArray *word_list){ register gint max_target_span= atoi(word_list->pdata[3]); if(max_target_span < 0) return g_strdup_printf("error: max_target_span must be >= 0\n"); connection->max_target_span = max_target_span; return g_strdup_printf("ok: set\n"); } static gchar *Exonerate_Server_set_param(Exonerate_Server_Connection *connection, GPtrArray *word_list){ register gchar *reply = NULL; register gchar *name = word_list->pdata[2]; if(!strcmp(name, "seedrepeat")){ reply = Exonerate_Server_set_param_seedrepeat(connection, word_list); } else if(!strcmp(name, "dnahspthreshold")){ reply = Exonerate_Server_set_param_dnahspthreshold(connection, word_list); } else if(!strcmp(name, "proteinhspthreshold")){ reply = Exonerate_Server_set_param_proteinhspthreshold(connection, word_list); } else if(!strcmp(name, "codonhspthreshold")){ reply = Exonerate_Server_set_param_codonhspthreshold(connection, word_list); } else if(!strcmp(name, "dnawordlimit")){ reply = Exonerate_Server_set_param_dnawordlimit(connection, word_list); } else if(!strcmp(name, "proteinwordlimit")){ reply = Exonerate_Server_set_param_proteinwordlimit(connection, word_list); } else if(!strcmp(name, "codonwordlimit")){ reply = Exonerate_Server_set_param_codonwordlimit(connection, word_list); } else if(!strcmp(name, "dnahspdropoff")){ reply = Exonerate_Server_set_param_dnahspdropoff(connection, word_list); } else if(!strcmp(name, "proteinhspdropoff")){ reply = Exonerate_Server_set_param_proteinhspdropoff(connection, word_list); } else if(!strcmp(name, "codonhspdropoff")){ reply = Exonerate_Server_set_param_codonhspdropoff(connection, word_list); } else if(!strcmp(name, "geneseedthreshold")){ reply = Exonerate_Server_set_param_geneseedthreshold(connection, word_list); } else if(!strcmp(name, "geneseedrepeat")){ reply = Exonerate_Server_set_param_geneseedrepeat(connection, word_list); } 
else if(!strcmp(name, "maxqueryspan")){ reply = Exonerate_Server_set_param_max_query_span(connection, word_list); } else if(!strcmp(name, "maxtargetspan")){ reply = Exonerate_Server_set_param_max_target_span(connection, word_list); } else { reply = g_strdup_printf( "warning: set param %s ignored by server\n", name); } g_assert(reply); return reply; } static gboolean Exonerate_Server_process(gchar *msg, gchar **reply, gpointer connection_data, gpointer user_data){ register gint msg_len = strlen(msg); register gboolean keep_connection = TRUE; register Exonerate_Server *server = user_data; register GPtrArray *word_list; register gchar *word, *id, *item, *query; register gint start, len, num; register Exonerate_Server_Connection *connection = connection_data; g_assert(msg); if(server->verbosity >= 3) g_print("Message: server received command [%s]\n", msg); /* Strip a single trailing newline, if present */ if((msg_len > 0) && (msg[msg_len-1] == '\n')) msg[--msg_len] = '\0'; word_list = Exonerate_Server_get_word_list(msg); (*reply) = NULL; if(word_list->len){ word = word_list->pdata[0]; if(!strcmp(word, "help")){ (*reply) = Exonerate_Server_help(); } else if(!strcmp(word, "version")){ (*reply) = g_strdup_printf("version: exonerate-server %s\n", VERSION); } else if(!strcmp(word, "exit")){ (*reply) = NULL; keep_connection = FALSE; } else if(!strcmp(word, "dbinfo")){ (*reply) = g_strdup_printf("dbinfo: %s %s" " %" CUSTOM_GUINT64_FORMAT " %" CUSTOM_GUINT64_FORMAT " %" CUSTOM_GUINT64_FORMAT "\n", (server->dataset->header->type & 1)?"dna":"protein", (server->dataset->header->type & (1<<1)) ?"softmasked":"unmasked", server->dataset->header->number_of_seqs, server->dataset->header->max_seq_len, server->dataset->header->total_seq_len); } else if(!strcmp(word, "lookup")){ if(word_list->len == 2){ id = word_list->pdata[1]; num = Dataset_lookup_id(server->dataset, id); if(num == -1) (*reply) = g_strdup_printf("error: id not found\n"); else (*reply) = g_strdup_printf("lookup: %d\n", num); } else { (*reply) = g_strdup_printf("error: usage: lookup 
\n"); } } else if(!strcmp(word, "get")){ if(word_list->len >= 2){ item = word_list->pdata[1]; if(!strcmp(item, "info")){ if(word_list->len == 3){ num = atoi(word_list->pdata[2]); (*reply) = Exonerate_Server_get_info(server->dataset, num); } else { (*reply) = g_strdup_printf( "error: usage: get info \n"); } } else if(!strcmp(item, "seq")){ if(word_list->len == 3){ num = atoi(word_list->pdata[2]); (*reply) = Exonerate_Server_get_seq(server->dataset, num); } else { (*reply) = g_strdup_printf( "error: usage: get seq \n"); } } else if(!strcmp(item, "subseq")){ if(word_list->len == 5){ num = atoi(word_list->pdata[2]); start = atoi(word_list->pdata[3]); len = atoi(word_list->pdata[4]); (*reply) = Exonerate_Server_get_subseq(server->dataset, num, start, len); } else { (*reply) = g_strdup_printf( "error: usage: get subseq \n"); } } else if(!strcmp(item, "hsps")){ if(connection->query){ if(server->index){ (*reply) = Exonerate_Server_get_hsps(server, connection); } else { (*reply) = g_strdup_printf("error: no index for hsps\n"); } } else { (*reply) = g_strdup_printf("error: query not set\n"); } } else { (*reply) = g_strdup_printf("error: Unknown get command\n"); } } else { (*reply) = g_strdup_printf("error: get what ?\n"); } } else if(!strcmp(word, "set")){ if(word_list->len >= 2){ item = word_list->pdata[1]; if(!strcmp(item, "query")){ if(word_list->len == 3){ query = word_list->pdata[2]; if(connection->query) Sequence_destroy(connection->query); connection->query = Exonerate_Server_get_query( server->index, connection, query); connection->revcomp_query = FALSE; if(connection->query) (*reply) = g_strdup_printf("ok: %d %d\n", connection->query->len, Sequence_checksum(connection->query)); else (*reply) = g_strdup_printf("error: bad query\n"); } else { (*reply) = g_strdup_printf( "error: usage: set query \n"); } } else if(!strcmp(item, "param")){ /* The set_param handlers read the value from pdata[3], so four words are required */ if(word_list->len < 4){ (*reply) = g_strdup_printf( "error: usage: set param \n"); } else { (*reply) = 
Exonerate_Server_set_param(connection, word_list); } } else { (*reply) = g_strdup_printf("error: Unknown set command\n"); } } else { (*reply) = g_strdup_printf("error: set what ?\n"); } } else if(!strcmp(word, "revcomp")){ if(word_list->len == 2){ item = word_list->pdata[1]; if(!strcmp(item, "query")){ /* Without a query there is no query_alphabet to inspect */ if(!connection->query){ (*reply) = g_strdup_printf("error: query not set\n"); } else if(connection->query_alphabet->type == Alphabet_Type_DNA){ Exonerate_Server_Connection_revcomp_query(connection); (*reply) = g_strdup_printf( "ok: query strand %s\n", connection->revcomp_query?"revcomp":"forward"); } else { (*reply) = g_strdup_printf( "error: cannot revcomp non-DNA query\n"); } } else if(!strcmp(item, "target")){ Exonerate_Server_Connection_revcomp_target(connection); (*reply) = g_strdup_printf( "ok: target strand %s\n", connection->revcomp_target?"revcomp":"forward"); } else { (*reply) = g_strdup_printf("error: Unknown revcomp command\n"); } } else { (*reply) = g_strdup_printf("error: revcomp what ?\n"); } } else { (*reply) = g_strdup_printf("error: Unknown command: [%s]\n", msg); } } g_ptr_array_free(word_list, TRUE); if((server->verbosity >= 3) && (*reply)){ if((server->verbosity == 3) && (strlen(*reply) >= 80)) g_print("Message: server returned reply [%.*s]\n", 80, (*reply)); else g_print("Message: server returned reply [%s]\n", (*reply)); } return keep_connection; } static void run_server(gint port, gchar *input_path, gboolean preload, gint max_connections, gint verbosity){ register Exonerate_Server *exonerate_server = Exonerate_Server_create(input_path, preload, verbosity); register SocketServer *ss = SocketServer_create(port, max_connections, Exonerate_Server_process, Exonerate_Server_Connection_open, Exonerate_Server_Connection_close, exonerate_server); Exonerate_Server_memory_usage(exonerate_server); if(verbosity > 0) g_message("listening on port [%d] ...", port); while(SocketServer_listen(ss)); SocketServer_destroy(ss); Exonerate_Server_destroy(exonerate_server); 
return; } int Argument_main(Argument *arg){ gint port, max_connections, verbosity; gchar *input_path; gboolean preload; register ArgumentSet *as = ArgumentSet_create("Exonerate Server options"); ArgumentSet_add_option(as, '\0', "port", "port", "Port number to run server on", "12886", Argument_parse_int, &port); ArgumentSet_add_option(as, '\0', "input", "path", "Path to input file (.esd or .esi)", NULL, Argument_parse_string, &input_path); ArgumentSet_add_option(as, '\0', "preload", NULL, "Preload index and sequence data", "TRUE", Argument_parse_boolean, &preload); ArgumentSet_add_option(as, '\0', "maxconnections", "threads", "Maximum concurrent server connections", "4", Argument_parse_int, &max_connections); ArgumentSet_add_option(as, 'V', "verbosity", "level", "Set server verbosity level", "1", Argument_parse_int, &verbosity); Argument_absorb_ArgumentSet(arg, as); /**/ Match_ArgumentSet_create(arg); HSPset_ArgumentSet_create(arg); /**/ Argument_process(arg, "exonerate-server", "Exonerate Server.\n", "Guy St.C. Slater. 
guy@ebi.ac.uk June 2006\n"); run_server(port, input_path, preload, max_connections, verbosity); g_message("-- server exiting"); return 0; } /**/ exonerate-2.4.0/src/util/0000777000175000017500000000000011222703704012256 500000000000000exonerate-2.4.0/src/util/Makefile.am0000644000175000017500000000622111222231313014217 00000000000000 bin_PROGRAMS = @installed_util_list@ EXTRA_PROGRAMS = fastaclean fastaclip fastachecksum fastacomposition \ fastadiff fastaexplode fastafetch fastahardmask \ fastaindex fastalength fastanrdb fastaoverlap \ fastareformat fastaremove fastarevcomp fastasoftmask \ fastasort fastasplit fastasubseq fastatranslate \ fastavalidcds fasta2esd fastaannotatecdna \ esd2esi INCLUDES = -I$(top_srcdir)/src/sequence \ -I$(top_srcdir)/src/database \ -I$(top_srcdir)/src/struct \ -I$(top_srcdir)/src/comparison \ -I$(top_srcdir)/src/general LDADD = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o \ -lm fastaclean_SOURCES = fastaclean.c fastaclip_SOURCES = fastaclip.c fastachecksum_SOURCES = fastachecksum.c fastacomposition_SOURCES = fastacomposition.c fastadiff_SOURCES = fastadiff.c fastaexplode_SOURCES = fastaexplode.c fastafetch_SOURCES = fastafetch.c fastahardmask_SOURCES = fastahardmask.c fastaindex_SOURCES = fastaindex.c fastalength_SOURCES = fastalength.c fastanrdb_SOURCES = fastanrdb.c fastaoverlap_SOURCES = fastaoverlap.c fastareformat_SOURCES = fastareformat.c fastaremove_SOURCES = fastaremove.c fastarevcomp_SOURCES = fastarevcomp.c fastasoftmask_SOURCES = fastasoftmask.c fastasort_SOURCES = fastasort.c fastasplit_SOURCES = fastasplit.c fastasubseq_SOURCES = fastasubseq.c fastatranslate_SOURCES = 
fastatranslate.c fasta2esd_SOURCES = fasta2esd.c fastaannotatecdna_SOURCES = fastaannotatecdna.c esd2esi_SOURCES = esd2esi.c fasta2esd_LDADD = $(top_srcdir)/src/database/dataset.o \ $(top_srcdir)/src/struct/bitarray.o \ $(LDADD) esd2esi_LDADD = $(top_srcdir)/src/database/dataset.o \ $(top_srcdir)/src/database/index.o \ $(top_srcdir)/src/general/threadref.o \ $(top_srcdir)/src/struct/bitarray.o \ $(top_srcdir)/src/struct/vfsm.o \ $(top_srcdir)/src/struct/pqueue.o \ $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/rangetree.o \ $(top_srcdir)/src/struct/noitree.o \ $(top_srcdir)/src/struct/splaytree.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/comparison/wordhood.o \ $(top_srcdir)/src/comparison/hspset.o \ $(top_srcdir)/src/comparison/match.o \ $(LDADD) EXTRA_fastavalidcds_SOURCES = fastavalidcds.c # Files to clear away MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/src/util/Makefile.in0000644000175000017500000007014411222703652014247 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. 
SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../.. ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ bin_PROGRAMS = @installed_util_list@ EXTRA_PROGRAMS = fastaclean fastaclip fastachecksum fastacomposition fastadiff fastaexplode fastafetch fastahardmask fastaindex fastalength fastanrdb fastaoverlap fastareformat fastaremove fastarevcomp fastasoftmask fastasort fastasplit fastasubseq fastatranslate fastavalidcds fasta2esd fastaannotatecdna esd2esi INCLUDES = -I$(top_srcdir)/src/sequence -I$(top_srcdir)/src/database -I$(top_srcdir)/src/struct -I$(top_srcdir)/src/comparison 
-I$(top_srcdir)/src/general LDADD = $(top_srcdir)/src/sequence/sequence.o $(top_srcdir)/src/sequence/alphabet.o $(top_srcdir)/src/sequence/splice.o $(top_srcdir)/src/sequence/translate.o $(top_srcdir)/src/database/fastadb.o $(top_srcdir)/src/general/argument.o $(top_srcdir)/src/general/compoundfile.o $(top_srcdir)/src/general/lineparse.o $(top_srcdir)/src/struct/sparsecache.o $(top_srcdir)/src/struct/matrix.o -lm fastaclean_SOURCES = fastaclean.c fastaclip_SOURCES = fastaclip.c fastachecksum_SOURCES = fastachecksum.c fastacomposition_SOURCES = fastacomposition.c fastadiff_SOURCES = fastadiff.c fastaexplode_SOURCES = fastaexplode.c fastafetch_SOURCES = fastafetch.c fastahardmask_SOURCES = fastahardmask.c fastaindex_SOURCES = fastaindex.c fastalength_SOURCES = fastalength.c fastanrdb_SOURCES = fastanrdb.c fastaoverlap_SOURCES = fastaoverlap.c fastareformat_SOURCES = fastareformat.c fastaremove_SOURCES = fastaremove.c fastarevcomp_SOURCES = fastarevcomp.c fastasoftmask_SOURCES = fastasoftmask.c fastasort_SOURCES = fastasort.c fastasplit_SOURCES = fastasplit.c fastasubseq_SOURCES = fastasubseq.c fastatranslate_SOURCES = fastatranslate.c fasta2esd_SOURCES = fasta2esd.c fastaannotatecdna_SOURCES = fastaannotatecdna.c esd2esi_SOURCES = esd2esi.c fasta2esd_LDADD = $(top_srcdir)/src/database/dataset.o $(top_srcdir)/src/struct/bitarray.o $(LDADD) esd2esi_LDADD = $(top_srcdir)/src/database/dataset.o $(top_srcdir)/src/database/index.o $(top_srcdir)/src/general/threadref.o $(top_srcdir)/src/struct/bitarray.o $(top_srcdir)/src/struct/vfsm.o $(top_srcdir)/src/struct/pqueue.o $(top_srcdir)/src/struct/recyclebin.o $(top_srcdir)/src/struct/rangetree.o $(top_srcdir)/src/struct/noitree.o $(top_srcdir)/src/struct/splaytree.o $(top_srcdir)/src/sequence/submat.o $(top_srcdir)/src/sequence/codonsubmat.o $(top_srcdir)/src/comparison/wordhood.o $(top_srcdir)/src/comparison/hspset.o $(top_srcdir)/src/comparison/match.o $(LDADD) EXTRA_fastavalidcds_SOURCES = fastavalidcds.c # Files to clear 
away MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = PROGRAMS = $(bin_PROGRAMS) DEFS = @DEFS@ -I. -I$(srcdir) CPPFLAGS = @CPPFLAGS@ LDFLAGS = @LDFLAGS@ LIBS = @LIBS@ fastaclean_OBJECTS = fastaclean.o fastaclean_LDADD = $(LDADD) fastaclean_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastaclean_LDFLAGS = fastaclip_OBJECTS = fastaclip.o fastaclip_LDADD = $(LDADD) fastaclip_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastaclip_LDFLAGS = fastachecksum_OBJECTS = fastachecksum.o fastachecksum_LDADD = $(LDADD) fastachecksum_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastachecksum_LDFLAGS = fastacomposition_OBJECTS = fastacomposition.o fastacomposition_LDADD = $(LDADD) fastacomposition_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ 
$(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastacomposition_LDFLAGS = fastadiff_OBJECTS = fastadiff.o fastadiff_LDADD = $(LDADD) fastadiff_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastadiff_LDFLAGS = fastaexplode_OBJECTS = fastaexplode.o fastaexplode_LDADD = $(LDADD) fastaexplode_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastaexplode_LDFLAGS = fastafetch_OBJECTS = fastafetch.o fastafetch_LDADD = $(LDADD) fastafetch_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastafetch_LDFLAGS = fastahardmask_OBJECTS = fastahardmask.o fastahardmask_LDADD = $(LDADD) fastahardmask_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o 
\ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastahardmask_LDFLAGS = fastaindex_OBJECTS = fastaindex.o fastaindex_LDADD = $(LDADD) fastaindex_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastaindex_LDFLAGS = fastalength_OBJECTS = fastalength.o fastalength_LDADD = $(LDADD) fastalength_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastalength_LDFLAGS = fastanrdb_OBJECTS = fastanrdb.o fastanrdb_LDADD = $(LDADD) fastanrdb_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastanrdb_LDFLAGS = fastaoverlap_OBJECTS = fastaoverlap.o fastaoverlap_LDADD = $(LDADD) fastaoverlap_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ 
$(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastaoverlap_LDFLAGS = fastareformat_OBJECTS = fastareformat.o fastareformat_LDADD = $(LDADD) fastareformat_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastareformat_LDFLAGS = fastaremove_OBJECTS = fastaremove.o fastaremove_LDADD = $(LDADD) fastaremove_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastaremove_LDFLAGS = fastarevcomp_OBJECTS = fastarevcomp.o fastarevcomp_LDADD = $(LDADD) fastarevcomp_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastarevcomp_LDFLAGS = fastasoftmask_OBJECTS = fastasoftmask.o fastasoftmask_LDADD = $(LDADD) 
fastasoftmask_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastasoftmask_LDFLAGS = fastasort_OBJECTS = fastasort.o fastasort_LDADD = $(LDADD) fastasort_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastasort_LDFLAGS = fastasplit_OBJECTS = fastasplit.o fastasplit_LDADD = $(LDADD) fastasplit_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastasplit_LDFLAGS = fastasubseq_OBJECTS = fastasubseq.o fastasubseq_LDADD = $(LDADD) fastasubseq_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastasubseq_LDFLAGS = fastatranslate_OBJECTS = 
fastatranslate.o fastatranslate_LDADD = $(LDADD) fastatranslate_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastatranslate_LDFLAGS = fastavalidcds_SOURCES = fastavalidcds.c fastavalidcds_OBJECTS = fastavalidcds.o fastavalidcds_LDADD = $(LDADD) fastavalidcds_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastavalidcds_LDFLAGS = fasta2esd_OBJECTS = fasta2esd.o fasta2esd_DEPENDENCIES = $(top_srcdir)/src/database/dataset.o \ $(top_srcdir)/src/struct/bitarray.o \ $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fasta2esd_LDFLAGS = fastaannotatecdna_OBJECTS = fastaannotatecdna.o fastaannotatecdna_LDADD = $(LDADD) fastaannotatecdna_DEPENDENCIES = $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ 
$(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o fastaannotatecdna_LDFLAGS = esd2esi_OBJECTS = esd2esi.o esd2esi_DEPENDENCIES = $(top_srcdir)/src/database/dataset.o \ $(top_srcdir)/src/database/index.o \ $(top_srcdir)/src/general/threadref.o \ $(top_srcdir)/src/struct/bitarray.o $(top_srcdir)/src/struct/vfsm.o \ $(top_srcdir)/src/struct/pqueue.o $(top_srcdir)/src/struct/recyclebin.o \ $(top_srcdir)/src/struct/rangetree.o $(top_srcdir)/src/struct/noitree.o \ $(top_srcdir)/src/struct/splaytree.o \ $(top_srcdir)/src/sequence/submat.o \ $(top_srcdir)/src/sequence/codonsubmat.o \ $(top_srcdir)/src/comparison/wordhood.o \ $(top_srcdir)/src/comparison/hspset.o \ $(top_srcdir)/src/comparison/match.o \ $(top_srcdir)/src/sequence/sequence.o \ $(top_srcdir)/src/sequence/alphabet.o \ $(top_srcdir)/src/sequence/splice.o \ $(top_srcdir)/src/sequence/translate.o \ $(top_srcdir)/src/database/fastadb.o \ $(top_srcdir)/src/general/argument.o \ $(top_srcdir)/src/general/compoundfile.o \ $(top_srcdir)/src/general/lineparse.o \ $(top_srcdir)/src/struct/sparsecache.o \ $(top_srcdir)/src/struct/matrix.o esd2esi_LDFLAGS = CFLAGS = @CFLAGS@ COMPILE = $(CC) $(DEFS) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) CCLD = $(CC) LINK = $(CCLD) $(AM_CFLAGS) $(CFLAGS) $(LDFLAGS) -o $@ DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best SOURCES = $(fastaclean_SOURCES) $(fastaclip_SOURCES) $(fastachecksum_SOURCES) $(fastacomposition_SOURCES) $(fastadiff_SOURCES) $(fastaexplode_SOURCES) $(fastafetch_SOURCES) $(fastahardmask_SOURCES) $(fastaindex_SOURCES) $(fastalength_SOURCES) $(fastanrdb_SOURCES) $(fastaoverlap_SOURCES) $(fastareformat_SOURCES) $(fastaremove_SOURCES) $(fastarevcomp_SOURCES) $(fastasoftmask_SOURCES) $(fastasort_SOURCES) $(fastasplit_SOURCES) $(fastasubseq_SOURCES) 
$(fastatranslate_SOURCES) fastavalidcds.c $(EXTRA_fastavalidcds_SOURCES) $(fasta2esd_SOURCES) $(fastaannotatecdna_SOURCES) $(esd2esi_SOURCES) OBJECTS = $(fastaclean_OBJECTS) $(fastaclip_OBJECTS) $(fastachecksum_OBJECTS) $(fastacomposition_OBJECTS) $(fastadiff_OBJECTS) $(fastaexplode_OBJECTS) $(fastafetch_OBJECTS) $(fastahardmask_OBJECTS) $(fastaindex_OBJECTS) $(fastalength_OBJECTS) $(fastanrdb_OBJECTS) $(fastaoverlap_OBJECTS) $(fastareformat_OBJECTS) $(fastaremove_OBJECTS) $(fastarevcomp_OBJECTS) $(fastasoftmask_OBJECTS) $(fastasort_OBJECTS) $(fastasplit_OBJECTS) $(fastasubseq_OBJECTS) $(fastatranslate_OBJECTS) fastavalidcds.o $(fasta2esd_OBJECTS) $(fastaannotatecdna_OBJECTS) $(esd2esi_OBJECTS) all: all-redirect .SUFFIXES: .SUFFIXES: .S .c .o .s $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps src/util/Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status mostlyclean-binPROGRAMS: clean-binPROGRAMS: -test -z "$(bin_PROGRAMS)" || rm -f $(bin_PROGRAMS) distclean-binPROGRAMS: maintainer-clean-binPROGRAMS: install-binPROGRAMS: $(bin_PROGRAMS) @$(NORMAL_INSTALL) $(mkinstalldirs) $(DESTDIR)$(bindir) @list='$(bin_PROGRAMS)'; for p in $$list; do \ if test -f $$p; then \ echo " $(INSTALL_PROGRAM) $$p $(DESTDIR)$(bindir)/`echo $$p|sed 's/$(EXEEXT)$$//'|sed '$(transform)'|sed 's/$$/$(EXEEXT)/'`"; \ $(INSTALL_PROGRAM) $$p $(DESTDIR)$(bindir)/`echo $$p|sed 's/$(EXEEXT)$$//'|sed '$(transform)'|sed 's/$$/$(EXEEXT)/'`; \ else :; fi; \ done uninstall-binPROGRAMS: @$(NORMAL_UNINSTALL) list='$(bin_PROGRAMS)'; for p in $$list; do \ rm -f $(DESTDIR)$(bindir)/`echo $$p|sed 's/$(EXEEXT)$$//'|sed '$(transform)'|sed 's/$$/$(EXEEXT)/'`; \ done .c.o: $(COMPILE) -c $< .s.o: $(COMPILE) -c $< .S.o: $(COMPILE) -c $< mostlyclean-compile: -rm -f *.o core *.core clean-compile: distclean-compile: -rm -f *.tab.c 
maintainer-clean-compile: fastaclean: $(fastaclean_OBJECTS) $(fastaclean_DEPENDENCIES) @rm -f fastaclean $(LINK) $(fastaclean_LDFLAGS) $(fastaclean_OBJECTS) $(fastaclean_LDADD) $(LIBS) fastaclip: $(fastaclip_OBJECTS) $(fastaclip_DEPENDENCIES) @rm -f fastaclip $(LINK) $(fastaclip_LDFLAGS) $(fastaclip_OBJECTS) $(fastaclip_LDADD) $(LIBS) fastachecksum: $(fastachecksum_OBJECTS) $(fastachecksum_DEPENDENCIES) @rm -f fastachecksum $(LINK) $(fastachecksum_LDFLAGS) $(fastachecksum_OBJECTS) $(fastachecksum_LDADD) $(LIBS) fastacomposition: $(fastacomposition_OBJECTS) $(fastacomposition_DEPENDENCIES) @rm -f fastacomposition $(LINK) $(fastacomposition_LDFLAGS) $(fastacomposition_OBJECTS) $(fastacomposition_LDADD) $(LIBS) fastadiff: $(fastadiff_OBJECTS) $(fastadiff_DEPENDENCIES) @rm -f fastadiff $(LINK) $(fastadiff_LDFLAGS) $(fastadiff_OBJECTS) $(fastadiff_LDADD) $(LIBS) fastaexplode: $(fastaexplode_OBJECTS) $(fastaexplode_DEPENDENCIES) @rm -f fastaexplode $(LINK) $(fastaexplode_LDFLAGS) $(fastaexplode_OBJECTS) $(fastaexplode_LDADD) $(LIBS) fastafetch: $(fastafetch_OBJECTS) $(fastafetch_DEPENDENCIES) @rm -f fastafetch $(LINK) $(fastafetch_LDFLAGS) $(fastafetch_OBJECTS) $(fastafetch_LDADD) $(LIBS) fastahardmask: $(fastahardmask_OBJECTS) $(fastahardmask_DEPENDENCIES) @rm -f fastahardmask $(LINK) $(fastahardmask_LDFLAGS) $(fastahardmask_OBJECTS) $(fastahardmask_LDADD) $(LIBS) fastaindex: $(fastaindex_OBJECTS) $(fastaindex_DEPENDENCIES) @rm -f fastaindex $(LINK) $(fastaindex_LDFLAGS) $(fastaindex_OBJECTS) $(fastaindex_LDADD) $(LIBS) fastalength: $(fastalength_OBJECTS) $(fastalength_DEPENDENCIES) @rm -f fastalength $(LINK) $(fastalength_LDFLAGS) $(fastalength_OBJECTS) $(fastalength_LDADD) $(LIBS) fastanrdb: $(fastanrdb_OBJECTS) $(fastanrdb_DEPENDENCIES) @rm -f fastanrdb $(LINK) $(fastanrdb_LDFLAGS) $(fastanrdb_OBJECTS) $(fastanrdb_LDADD) $(LIBS) fastaoverlap: $(fastaoverlap_OBJECTS) $(fastaoverlap_DEPENDENCIES) @rm -f fastaoverlap $(LINK) $(fastaoverlap_LDFLAGS) 
$(fastaoverlap_OBJECTS) $(fastaoverlap_LDADD) $(LIBS) fastareformat: $(fastareformat_OBJECTS) $(fastareformat_DEPENDENCIES) @rm -f fastareformat $(LINK) $(fastareformat_LDFLAGS) $(fastareformat_OBJECTS) $(fastareformat_LDADD) $(LIBS) fastaremove: $(fastaremove_OBJECTS) $(fastaremove_DEPENDENCIES) @rm -f fastaremove $(LINK) $(fastaremove_LDFLAGS) $(fastaremove_OBJECTS) $(fastaremove_LDADD) $(LIBS) fastarevcomp: $(fastarevcomp_OBJECTS) $(fastarevcomp_DEPENDENCIES) @rm -f fastarevcomp $(LINK) $(fastarevcomp_LDFLAGS) $(fastarevcomp_OBJECTS) $(fastarevcomp_LDADD) $(LIBS) fastasoftmask: $(fastasoftmask_OBJECTS) $(fastasoftmask_DEPENDENCIES) @rm -f fastasoftmask $(LINK) $(fastasoftmask_LDFLAGS) $(fastasoftmask_OBJECTS) $(fastasoftmask_LDADD) $(LIBS) fastasort: $(fastasort_OBJECTS) $(fastasort_DEPENDENCIES) @rm -f fastasort $(LINK) $(fastasort_LDFLAGS) $(fastasort_OBJECTS) $(fastasort_LDADD) $(LIBS) fastasplit: $(fastasplit_OBJECTS) $(fastasplit_DEPENDENCIES) @rm -f fastasplit $(LINK) $(fastasplit_LDFLAGS) $(fastasplit_OBJECTS) $(fastasplit_LDADD) $(LIBS) fastasubseq: $(fastasubseq_OBJECTS) $(fastasubseq_DEPENDENCIES) @rm -f fastasubseq $(LINK) $(fastasubseq_LDFLAGS) $(fastasubseq_OBJECTS) $(fastasubseq_LDADD) $(LIBS) fastatranslate: $(fastatranslate_OBJECTS) $(fastatranslate_DEPENDENCIES) @rm -f fastatranslate $(LINK) $(fastatranslate_LDFLAGS) $(fastatranslate_OBJECTS) $(fastatranslate_LDADD) $(LIBS) fastavalidcds: $(fastavalidcds_OBJECTS) $(fastavalidcds_DEPENDENCIES) @rm -f fastavalidcds $(LINK) $(fastavalidcds_LDFLAGS) $(fastavalidcds_OBJECTS) $(fastavalidcds_LDADD) $(LIBS) fasta2esd: $(fasta2esd_OBJECTS) $(fasta2esd_DEPENDENCIES) @rm -f fasta2esd $(LINK) $(fasta2esd_LDFLAGS) $(fasta2esd_OBJECTS) $(fasta2esd_LDADD) $(LIBS) fastaannotatecdna: $(fastaannotatecdna_OBJECTS) $(fastaannotatecdna_DEPENDENCIES) @rm -f fastaannotatecdna $(LINK) $(fastaannotatecdna_LDFLAGS) $(fastaannotatecdna_OBJECTS) $(fastaannotatecdna_LDADD) $(LIBS) esd2esi: $(esd2esi_OBJECTS) 
$(esd2esi_DEPENDENCIES) @rm -f esd2esi $(LINK) $(esd2esi_LDFLAGS) $(esd2esi_OBJECTS) $(esd2esi_LDADD) $(LIBS) tags: TAGS ID: $(HEADERS) $(SOURCES) $(LISP) list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ here=`pwd` && cd $(srcdir) \ && mkid -f$$here/ID $$unique $(LISP) TAGS: $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) $(LISP) tags=; \ here=`pwd`; \ list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(ETAGS_ARGS)$$unique$(LISP)$$tags" \ || (cd $(srcdir) && etags -o $$here/TAGS $(ETAGS_ARGS) $$tags $$unique $(LISP)) mostlyclean-tags: clean-tags: distclean-tags: -rm -f TAGS ID maintainer-clean-tags: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = src/util distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done info-am: info: info-am dvi-am: dvi: dvi-am check-am: all-am check: check-am installcheck-am: installcheck: installcheck-am install-exec-am: install-binPROGRAMS install-exec: install-exec-am install-data-am: install-data: install-data-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-am uninstall-am: uninstall-binPROGRAMS uninstall: uninstall-am all-am: Makefile $(PROGRAMS) all-redirect: all-am install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: $(mkinstalldirs) $(DESTDIR)$(bindir) mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: 
mostlyclean-binPROGRAMS mostlyclean-compile \ mostlyclean-tags mostlyclean-generic mostlyclean: mostlyclean-am clean-am: clean-binPROGRAMS clean-compile clean-tags clean-generic \ mostlyclean-am clean: clean-am distclean-am: distclean-binPROGRAMS distclean-compile distclean-tags \ distclean-generic clean-am distclean: distclean-am maintainer-clean-am: maintainer-clean-binPROGRAMS \ maintainer-clean-compile maintainer-clean-tags \ maintainer-clean-generic distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." maintainer-clean: maintainer-clean-am .PHONY: mostlyclean-binPROGRAMS distclean-binPROGRAMS clean-binPROGRAMS \ maintainer-clean-binPROGRAMS uninstall-binPROGRAMS install-binPROGRAMS \ mostlyclean-compile distclean-compile clean-compile \ maintainer-clean-compile tags mostlyclean-tags distclean-tags \ clean-tags maintainer-clean-tags distdir info-am info dvi-am dvi check \ check-am installcheck-am installcheck install-exec-am install-exec \ install-data-am install-data install-am install uninstall-am uninstall \ all-redirect all-am all installdirs mostlyclean-generic \ distclean-generic clean-generic maintainer-clean-generic clean \ mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: exonerate-2.4.0/src/util/fastaclean.c0000644000175000017500000000551011162714343014444 00000000000000/****************************************************************\ * * * fastaclean : a utility to clean fasta format files * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
*
*                                                                *
\****************************************************************/

#include <stdio.h> /* For stdout */

#include "argument.h"
#include "fastadb.h"

typedef struct {
    gboolean clean_protein;
    gboolean clean_acgtn;
} FastaClean_Info;

/* Called once per sequence: filter the symbols and print the result */
static gboolean fasta_clean_traverse_func(FastaDB_Seq *fdbs,
                                          gpointer user_data){
    register FastaClean_Info *fci = user_data;
    register Alphabet_Filter_Type filter_type;
    register Sequence *seq;
    if(fci->clean_acgtn)
        filter_type = Alphabet_Filter_Type_CLEAN_ACGTN;
    else
        filter_type = Alphabet_Filter_Type_CLEAN;
    seq = Sequence_filter(fdbs->seq, filter_type);
    Sequence_print_fasta(seq, stdout, FALSE);
    Sequence_destroy(seq);
    return FALSE;
    }

int Argument_main(Argument *arg){
    register FastaDB *fdb;
    register ArgumentSet *as
           = ArgumentSet_create("Sequence Input Options");
    gchar *query_path;
    register Alphabet *alphabet;
    FastaClean_Info fci;
    ArgumentSet_add_option(as, 'f', "fasta", "path",
        "Fasta input file", NULL,
        Argument_parse_string, &query_path);
    ArgumentSet_add_option(as, 'p', "protein", NULL,
        "Clean protein database", "FALSE",
        Argument_parse_boolean, &fci.clean_protein);
    ArgumentSet_add_option(as, 'a', "acgtn", NULL,
        "Only allow [ACGTN] nucleotide symbols", "FALSE",
        Argument_parse_boolean, &fci.clean_acgtn);
    Argument_absorb_ArgumentSet(arg, as);
    Argument_process(arg, "fastaclean",
        "A utility to clean fasta format file symbols\n"
        "Guy St.C. Slater. guy@ebi.ac.uk. 2000-2003.\n", NULL);
    if(fci.clean_protein)
        alphabet = Alphabet_create(Alphabet_Type_PROTEIN, FALSE);
    else
        alphabet = Alphabet_create(Alphabet_Type_DNA, FALSE);
    fdb = FastaDB_open(query_path, alphabet);
    FastaDB_traverse(fdb, FastaDB_Mask_ALL,
                     fasta_clean_traverse_func, &fci);
    FastaDB_close(fdb);
    Alphabet_destroy(alphabet);
    return 0;
    }

exonerate-2.4.0/src/util/fastaclip.c
/****************************************************************\
*                                                                *
*  fastaclip : clip terminal Ns from fasta format sequences     *
*                                                                *
*  Guy St.C. Slater..
mailto:guy@ebi.ac.uk                     *
*  Copyright (C) 2000-2009. All Rights Reserved.                *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#include <stdio.h> /* For stdout */

#include "argument.h"
#include "fastadb.h"

static gboolean fasta_clip_traverse_func(FastaDB_Seq *fdbs,
                                         gpointer user_data){
    register gint clip_char = GPOINTER_TO_INT(user_data);
    register gint i, start_clip, end_clip, clip_len;
    register Sequence *clip_seq;
    /* Count the leading clip characters */
    for(i = 0; i < fdbs->seq->len; i++)
        if(Sequence_get_symbol(fdbs->seq, i) != clip_char)
            break;
    start_clip = i;
    /* Count the trailing clip characters */
    for(i = fdbs->seq->len-1; i >= 0; i--)
        if(Sequence_get_symbol(fdbs->seq, i) != clip_char)
            break;
    end_clip = fdbs->seq->len-i-1;
    /* NB: clip_len is negative when every symbol equals clip_char */
    clip_len = fdbs->seq->len-start_clip-end_clip;
    clip_seq = Sequence_subseq(fdbs->seq, start_clip, clip_len);
    Sequence_print_fasta(clip_seq, stdout, FALSE);
    Sequence_destroy(clip_seq);
    return FALSE;
    }

int Argument_main(Argument *arg){
    register FastaDB *fdb;
    register ArgumentSet *as
           = ArgumentSet_create("Sequence Input Options");
    gchar *query_path;
    gchar clip_char;
    ArgumentSet_add_option(as, 'f', "fasta", "path",
        "Fasta input file", NULL,
        Argument_parse_string, &query_path);
    ArgumentSet_add_option(as, 'c', "clip", "path",
        "Clip character", "N",
        Argument_parse_char, &clip_char);
    Argument_absorb_ArgumentSet(arg, as);
    Argument_process(arg, "fastaclip",
        "A utility to clip terminal Ns from fasta format sequences\n"
        "Guy St.C. Slater. guy@ebi.ac.uk.
2000-2003.\n", NULL);
    fdb = FastaDB_open(query_path, NULL);
    FastaDB_traverse(fdb, FastaDB_Mask_ALL, fasta_clip_traverse_func,
                     GINT_TO_POINTER((gint)clip_char));
    FastaDB_close(fdb);
    return 0;
    }

exonerate-2.4.0/src/util/fastachecksum.c
/****************************************************************\
*                                                                *
*  fastachecksum : a utility to print GCG checksums              *
*                                                                *
*  Guy St.C. Slater..   mailto:guy@ebi.ac.uk                     *
*  Copyright (C) 2000-2009. All Rights Reserved.                *
*                                                                *
*  This source code is distributed under the terms of the        *
*  GNU General Public License, version 3. See the file COPYING   *
*  or http://www.gnu.org/licenses/gpl.txt for details            *
*                                                                *
*  If you use this code, please keep this notice intact.         *
*                                                                *
\****************************************************************/

#include "argument.h"
#include "fastadb.h"

static gboolean fasta_checksum_traverse_func(FastaDB_Seq *fdbs,
                                             gpointer user_data){
    g_print("%d %d %s\n", Sequence_checksum(fdbs->seq),
            fdbs->seq->len, fdbs->seq->id);
    return FALSE;
    }

int Argument_main(Argument *arg){
    register FastaDB *fdb;
    register ArgumentSet *as
           = ArgumentSet_create("Sequence Input Options");
    gchar *query_path;
    ArgumentSet_add_option(as, 'f', "fasta", "path",
        "Fasta input file", NULL,
        Argument_parse_string, &query_path);
    Argument_absorb_ArgumentSet(arg, as);
    Argument_process(arg, "fastachecksum",
        "A utility to report GCG checksums for fasta sequences\n"
        "Guy St.C. Slater. guy@ebi.ac.uk. 2000-2004.\n", NULL);
    fdb = FastaDB_open(query_path, NULL);
    FastaDB_traverse(fdb, FastaDB_Mask_ID
                         |FastaDB_Mask_LEN
                         |FastaDB_Mask_SEQ,
                     fasta_checksum_traverse_func, NULL);
    FastaDB_close(fdb);
    return 0;
    }

exonerate-2.4.0/src/util/fastacomposition.c
/****************************************************************\
*                                                                *
*  fastacomposition : a utility to dump sequence composition     *
*                                                                *
*  Guy St.C. Slater..
mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include <string.h> /* For memset() */ #include <ctype.h> /* For toupper() */ #include "argument.h" #include "fastadb.h" static void fasta_composition_report(gchar *name, glong *count, gboolean ignore_case){ register gint i; g_print("%s", name); if(ignore_case){ /* Report each letter once, as lower case, even when only its upper case form occurs */ for(i = 0; i < 256; i++){ if(islower(i)){ if(count[i] + count[toupper(i)]) g_print(" %c %ld", i, count[i] + count[toupper(i)]); } else { if(count[i] && !isupper(i)) g_print(" %c %ld", i, count[i]); } } } else { for(i = 0; i < 256; i++) if(count[i]) g_print(" %c %ld", i, count[i]); } g_print("\n"); return; } typedef struct { glong count[256]; gboolean report_separately; gboolean ignore_case; } FastaComposition_Data; static gboolean fasta_composition_traverse_func(FastaDB_Seq *fdbs, gpointer user_data){ register gint i; register FastaComposition_Data *fcd = user_data; register gchar *str = Sequence_get_str(fdbs->seq); /* count[] holds glong, so clear the whole array, not sizeof(gint)*256 bytes */ if(fcd->report_separately) memset(fcd->count, 0, sizeof(glong)*256); for(i = 0; i < fdbs->seq->len; i++) fcd->count[(guchar)str[i]]++; g_free(str); if(fcd->report_separately) fasta_composition_report(fdbs->seq->id, fcd->count, fcd->ignore_case); return FALSE; } int Argument_main(Argument *arg){ register FastaDB *fdb; register ArgumentSet *as = ArgumentSet_create("Sequence Input Options"); gchar *query_path; FastaComposition_Data fcd = {{0},FALSE, FALSE}; ArgumentSet_add_option(as, 'f', "fasta", "path", "Fasta input file", NULL, Argument_parse_string, &query_path); ArgumentSet_add_option(as, 'i', "ignorecase", NULL, "Ignore sequence case", "FALSE", Argument_parse_boolean, &fcd.ignore_case); ArgumentSet_add_option(as, 's', "separate", NULL, "Report composition 
for each sequence separately", "FALSE", Argument_parse_boolean, &fcd.report_separately); Argument_absorb_ArgumentSet(arg, as); Argument_process(arg, "fastacomposition", "A utility to report sequence composition\n" "Guy St.C. Slater. guy@ebi.ac.uk. 2000-2003.\n", NULL); fdb = FastaDB_open(query_path, NULL); FastaDB_traverse(fdb, FastaDB_Mask_ID|FastaDB_Mask_SEQ, fasta_composition_traverse_func, &fcd); if(!fcd.report_separately) fasta_composition_report(query_path, fcd.count, fcd.ignore_case); FastaDB_close(fdb); return 0; } exonerate-2.4.0/src/util/fastadiff.c0000644000175000017500000001350011162714344014271 00000000000000/****************************************************************\ * * * fastadiff : a utility to compare fasta sequence files * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include <string.h> /* For strcmp() */ #include <ctype.h> /* For toupper() */ #include "argument.h" #include "fastadb.h" static gint fasta_diff_ambig_strcmp(const gchar *s1, const gchar *s2){ register gchar c1, c2; while(*s1 && *s2){ c1 = toupper(*s1); c2 = toupper(*s2); if((c1 != 'N') && (c1 != 'X') && (c2 != 'N') && (c2 != 'X')){ if(*s1 != *s2) return *s1 - *s2; } s1++; s2++; } return *s1 - *s2; } static gint fasta_diff_ambig_strcasecmp(const gchar *s1, const gchar *s2){ register gchar c1, c2; while(*s1 && *s2){ c1 = toupper(*s1); c2 = toupper(*s2); if((c1 != 'N') && (c1 != 'X') && (c2 != 'N') && (c2 != 'X')){ if(c1 != c2) return c1 - c2; } s1++; s2++; } return *s1 - *s2; } typedef gint (*fasta_diff_compare_func)(const gchar *s1, const gchar *s2); static gint fasta_diff(gchar *first_path, gchar *second_path, gboolean allow_ambiguity, gboolean ignore_case, gboolean check_ids){ register FastaDB *fdb_a, *fdb_b; register FastaDB_Seq *fdbs_a = NULL, *fdbs_b = NULL; register gint ret_val = 0; register FastaDB_Mask mask = FastaDB_Mask_ID|FastaDB_Mask_SEQ; register fasta_diff_compare_func comp = NULL; register gchar *seq_a, *seq_b; if(ignore_case){ if(allow_ambiguity){ comp = fasta_diff_ambig_strcasecmp; } else { comp = g_strcasecmp; } } else { if(allow_ambiguity){ comp = fasta_diff_ambig_strcmp; } else { comp = strcmp; } } fdb_a = FastaDB_open(first_path, NULL); fdb_b = FastaDB_open(second_path, NULL); while((fdbs_a = FastaDB_next(fdb_a, mask))){ fdbs_b = FastaDB_next(fdb_b, mask); if(!fdbs_b){ g_print("fastadiff: %s: id %s absent\n", second_path, fdbs_a->seq->id); ret_val = 1; break; } /* Check ids */ if((check_ids) && (strcmp(fdbs_a->seq->id, fdbs_b->seq->id))){ g_print("fastadiff: id mismatch: %s %s\n", fdbs_a->seq->id, fdbs_b->seq->id); ret_val = 1; break; } /* Check length */ if(fdbs_a->seq->len != fdbs_b->seq->len){ g_print("fastadiff: length mismatch: %s(%d) %s(%d)\n", fdbs_a->seq->id, fdbs_a->seq->len, 
fdbs_b->seq->id, fdbs_b->seq->len); ret_val = 1; break; } /* Check seqs */ seq_a = Sequence_get_str(fdbs_a->seq); seq_b = Sequence_get_str(fdbs_b->seq); if(comp(seq_a, seq_b)){ g_print("fastadiff: sequence mismatch: %s %s\n", fdbs_a->seq->id, fdbs_b->seq->id); ret_val = 1; break; } g_free(seq_a); g_free(seq_b); FastaDB_Seq_destroy(fdbs_a); FastaDB_Seq_destroy(fdbs_b); fdbs_a = fdbs_b = NULL; } if(fdbs_a) FastaDB_Seq_destroy(fdbs_a); if(fdbs_b) FastaDB_Seq_destroy(fdbs_b); if(ret_val == 0){ fdbs_b = FastaDB_next(fdb_b, mask); if(fdbs_b){ g_print("fastadiff: %s: id %s absent\n", first_path, fdbs_b->seq->id); ret_val = 1; FastaDB_Seq_destroy(fdbs_b); } } FastaDB_close(fdb_a); FastaDB_close(fdb_b); return ret_val; } int Argument_main(Argument *arg){ register ArgumentSet *as = ArgumentSet_create("Sequence Input Options"); gchar *first_path, *second_path; gboolean allow_ambiguity, ignore_case, check_ids; ArgumentSet_add_option(as, '1', "first", "path", "First fasta input file", NULL, Argument_parse_string, &first_path); ArgumentSet_add_option(as, '2', "second", "path", "Second fasta input file", NULL, Argument_parse_string, &second_path); ArgumentSet_add_option(as, 'i', "ignorecase", NULL, "Ignore sequence case", "FALSE", Argument_parse_boolean, &ignore_case); ArgumentSet_add_option(as, 'a', "ambiguity", NULL, "Allow sequence ambiguity with N or X", "FALSE", Argument_parse_boolean, &allow_ambiguity); ArgumentSet_add_option(as, 'c', "checkids", NULL, "Check ids for consistency", "TRUE", Argument_parse_boolean, &check_ids); Argument_absorb_ArgumentSet(arg, as); Argument_process(arg, "fastadiff", "Compare fasta format sequences (assumes same order)\n" "Guy St.C. Slater. guy@ebi.ac.uk. 
2000-2003.\n", NULL); return fasta_diff(first_path, second_path, allow_ambiguity, ignore_case, check_ids); } exonerate-2.4.0/src/util/fastaexplode.c0000644000175000017500000000507011162714344015024 00000000000000/****************************************************************\ * * * fastaexplode : break a fasta file into individual sequences * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include #include "argument.h" #include "fastadb.h" static gboolean fasta_explode_traverse_func(FastaDB_Seq *fdbs, gpointer user_data){ register gchar *dir_path = user_data; register gchar *output_path = g_strconcat(dir_path, G_DIR_SEPARATOR_S, fdbs->seq->id, ".fa", NULL); register FILE *fp = fopen(output_path, "r"); if(fp){ fclose(fp); g_error("File [%s] already exists", output_path); } fp = fopen(output_path, "w"); if(!fp) g_error("Could not open [%s] to write output", output_path); FastaDB_Seq_print(fdbs, fp, FastaDB_Mask_ID |FastaDB_Mask_DEF |FastaDB_Mask_SEQ); g_free(output_path); fclose(fp); return FALSE; } int Argument_main(Argument *arg){ register FastaDB *fdb; register ArgumentSet *as = ArgumentSet_create("Sequence Input Options"); gchar *query_path, *dir_path; ArgumentSet_add_option(as, 'f', "fasta", "path", "Fasta input file", NULL, Argument_parse_string, &query_path); ArgumentSet_add_option(as, 'd', "directory", "path", "Output file directory", ".", Argument_parse_string, &dir_path); Argument_absorb_ArgumentSet(arg, as); Argument_process(arg, "fastaexplode", "Split a fasta file up into individual sequences\n" "Guy St.C. Slater. guy@ebi.ac.uk. 
2000-2003.\n", NULL); fdb = FastaDB_open(query_path, NULL); FastaDB_traverse(fdb, FastaDB_Mask_ALL, fasta_explode_traverse_func, dir_path); FastaDB_close(fdb); return 0; } exonerate-2.4.0/src/util/fastafetch.c0000644000175000017500000001530211200067375014452 00000000000000/****************************************************************\ * * * fastafetch : fetch a sequence from a fasta format file * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include #include #include #include #include "argument.h" #include "fastadb.h" static gint fasta_index_seek_to_char(FILE *fp, gchar c){ register gint ch; while((ch = getc(fp)) != EOF){ if(ch == c) break; } return ch; } static gint fasta_index_read_to_char(FILE *fp, gchar c, GString *str){ register gint ch; g_string_truncate(str, 0); while((ch = getc(fp)) != EOF){ if(ch == c) break; g_string_append_c(str, ch); } return ch; } static CompoundFile_Pos fasta_index_lookup(FastaDB *fdb, FILE *index_fp, gchar *id){ register long left, right, pos; CompoundFile_Pos_scan_type result = -1; register GString *curr = g_string_sized_new(100); register gint cond; left = 0; fseek(index_fp, 0, SEEK_END); right = ftell(index_fp); if(right == -1) g_error("Could not seek to index end [%s]", strerror(errno)); while(left < right){ /* Binary search */ pos = (left+right)>>1; fseek(index_fp, pos, SEEK_SET); /* Move between l&r */ /* Move to next line (unless at beginning of index) */ if(pos && (fasta_index_seek_to_char(index_fp, '\n') == EOF)){ right = pos; continue; } if(fasta_index_read_to_char(index_fp, ' ', curr) == EOF){ right = pos; continue; } cond = strcmp(id, curr->str); if(cond == 
0){ /* Disco */ fasta_index_read_to_char(index_fp, '\n', curr); /* Convert to position in result */ CompoundFile_Pos_scan(curr->str, &result); break; } if(cond > 0){ if((!pos) || (left == (pos-1))) break; left = pos - 1; /* Higher */ } else { right = pos; /* Lower */ } } g_string_free(curr, TRUE); return (CompoundFile_Pos)result; } static FastaDB_Seq *fasta_index_fetch(FastaDB *fdb, FILE *index_fp, FastaDB_Mask mask, gchar *id, gboolean be_silent){ register CompoundFile_Pos pos = fasta_index_lookup(fdb, index_fp, id); register FastaDB_Seq *fdbs; if(pos == -1){ if(be_silent) return NULL; g_error("Could not find identifier [%s] (missing -F ?)", id); } fdbs = FastaDB_fetch(fdb, mask, pos); g_assert(!strcmp(fdbs->seq->id, id)); return fdbs; } /**/ static gboolean read_next_line(FILE *fp, GString *s){ register gint ch; g_string_truncate(s, 0); while((ch = getc(fp)) != EOF){ if(ch == '\n'){ return TRUE; } else { g_string_append_c(s, ch); } } return s->len?TRUE:FALSE; } static GPtrArray *get_query_list(gchar *query, gboolean use_fosn){ register GPtrArray *query_list = g_ptr_array_new(); register FILE *fp = NULL; register GString *id; if(!g_strcasecmp(query, "stdin")){ fp = stdin; } else { if(use_fosn) fp = fopen(query, "r"); } if(fp){ id = g_string_sized_new(64); while(read_next_line(fp, id)) g_ptr_array_add(query_list, g_strdup(id->str)); fclose(fp); g_string_free(id, TRUE); } else { g_ptr_array_add(query_list, g_strdup(query)); } return query_list; } static void fetch_sequences(gchar *fasta_path, gchar *index_path, gboolean use_fosn, gboolean be_silent, gchar *query){ register GPtrArray *query_list = get_query_list(query, use_fosn); register gint i; register FastaDB *fdb = FastaDB_open(fasta_path, NULL); register FILE *index_fp = fopen(index_path, "r"); register FastaDB_Seq *fdbs; register FastaDB_Mask mask = FastaDB_Mask_ID | FastaDB_Mask_DEF | FastaDB_Mask_SEQ; if(!index_fp) g_error("Could not open fasta index [%s]", index_path); for(i = 0; i < query_list->len; i++){ 
fdbs = fasta_index_fetch(fdb, index_fp, mask, query_list->pdata[i], be_silent); if(fdbs){ FastaDB_Seq_print(fdbs, stdout, mask); FastaDB_Seq_destroy(fdbs); } } for(i = 0; i < query_list->len; i++) g_free(query_list->pdata[i]); g_ptr_array_free(query_list, TRUE); FastaDB_close(fdb); fclose(index_fp); return; } int Argument_main(Argument *arg){ register ArgumentSet *as = ArgumentSet_create("Sequence Input Options"); gchar *fasta_path, *index_path, *query; gboolean use_fosn, be_silent; ArgumentSet_add_option(as, 'f', "fasta", "path", "Fasta input file", NULL, Argument_parse_string, &fasta_path); ArgumentSet_add_option(as, 'i', "index", "path", "Index file path", NULL, Argument_parse_string, &index_path); ArgumentSet_add_option(as, 'F', "fosn", NULL, "Interpret query as FOSN (file of sequence names)", "FALSE", Argument_parse_boolean, &use_fosn); ArgumentSet_add_option(as, 'q', "query", "name", "Fetch query < id | fosn | stdin >", NULL, Argument_parse_string, &query); ArgumentSet_add_option(as, 's', "silent", NULL, "Silently skip ids which cannot be found", "FALSE", Argument_parse_boolean, &be_silent); Argument_absorb_ArgumentSet(arg, as); Argument_process(arg, "fastafetch", "Fetch a sequence from a fasta file\n" "Guy St.C. Slater. guy@ebi.ac.uk. 2000-2003.\n", NULL); fetch_sequences(fasta_path, index_path, use_fosn, be_silent, query); return 0; } exonerate-2.4.0/src/util/fastahardmask.c0000644000175000017500000000402311162714344015153 00000000000000/****************************************************************\ * * * fastahardmask : convert lower case chars to 'N's * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include #include "argument.h" #include "fastadb.h" static gboolean fasta_hard_mask_traverse_func(FastaDB_Seq *fdbs, gpointer user_data){ register Sequence *s = Sequence_filter(fdbs->seq, Alphabet_Filter_Type_MASKED); Sequence_print_fasta(s, stdout, FALSE); Sequence_destroy(s); return FALSE; } int Argument_main(Argument *arg){ register FastaDB *fdb; register ArgumentSet *as = ArgumentSet_create("Sequence Input Options"); gchar *query_path; ArgumentSet_add_option(as, 'f', "fasta", "path", "Fasta input file", NULL, Argument_parse_string, &query_path); Argument_absorb_ArgumentSet(arg, as); Argument_process(arg, "fastahardmask", "A utility to convert lower fasta sequence symbols to Ns\n" "Guy St.C. Slater. guy@ebi.ac.uk. 2000-2003.\n", NULL); fdb = FastaDB_open(query_path, NULL); FastaDB_traverse(fdb, FastaDB_Mask_ALL, fasta_hard_mask_traverse_func, NULL); FastaDB_close(fdb); return 0; } exonerate-2.4.0/src/util/fastaindex.c0000644000175000017500000000556111162714344014500 00000000000000/****************************************************************\ * * * fastaindex : index a fasta format file * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include #include #include #include "argument.h" #include "fastadb.h" static int Fasta_Index_compare(const void *a, const void *b){ register FastaDB_Seq **fdbs_a = (FastaDB_Seq**)a, **fdbs_b = (FastaDB_Seq**)b; register gint result = strcmp((*fdbs_a)->seq->id, (*fdbs_b)->seq->id); if(!result) g_error("Duplicate fasta identifier [%s]", (*fdbs_a)->seq->id); return result; } static void fasta_index_build(gchar *fasta_path, gchar *index_path){ register FILE *fp; register FastaDB_Seq **fdbs_list; register gint i; guint total = 0; fp = fopen(index_path, "r"); if(fp){ fclose(fp); g_error("Index path [%s] already exists", index_path); } fp = fopen(index_path, "w"); if(!fp) g_error("Could not write fasta index to [%s]", index_path); fdbs_list = FastaDB_all(fasta_path, NULL, FastaDB_Mask_ID, &total); qsort(fdbs_list, total, sizeof(FastaDB_Seq*), Fasta_Index_compare); for(i = 0; i < total; i++){ fprintf(fp, "%s ", fdbs_list[i]->seq->id); CompoundFile_Pos_print(fp, fdbs_list[i]->location->pos); fprintf(fp, "\n"); } FastaDB_Seq_all_destroy(fdbs_list); fclose(fp); return; } int Argument_main(Argument *arg){ register ArgumentSet *as = ArgumentSet_create("Sequence Input Options"); gchar *fasta_path, *index_path; ArgumentSet_add_option(as, 'f', "fasta", "path", "Fasta input file", NULL, Argument_parse_string, &fasta_path); ArgumentSet_add_option(as, 'i', "index", "path", "Index file path", NULL, Argument_parse_string, &index_path); Argument_absorb_ArgumentSet(arg, as); Argument_process(arg, "fastaindex", "Index a fasta file\n" "Guy St.C. Slater. guy@ebi.ac.uk. 2000-2003.\n", NULL); fasta_index_build(fasta_path, index_path); return 0; } exonerate-2.4.0/src/util/fastalength.c0000644000175000017500000000355311162714344014651 00000000000000/****************************************************************\ * * * fastalength : a utility to print fasta sequence length * * * * Guy St.C. Slater.. 
mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "argument.h" #include "fastadb.h" static gboolean fasta_length_traverse_func(FastaDB_Seq *fdbs, gpointer user_data){ g_print("%d %s\n", fdbs->seq->len, fdbs->seq->id); return FALSE; } int Argument_main(Argument *arg){ register FastaDB *fdb; register ArgumentSet *as = ArgumentSet_create("Sequence Input Options"); gchar *query_path; ArgumentSet_add_option(as, 'f', "fasta", "path", "Fasta input file", NULL, Argument_parse_string, &query_path); Argument_absorb_ArgumentSet(arg, as); Argument_process(arg, "fastalength", "A utility to report fasta sequence lengths\n" "Guy St.C. Slater. guy@ebi.ac.uk. 2000-2003.\n", NULL); fdb = FastaDB_open(query_path, NULL); FastaDB_traverse(fdb, FastaDB_Mask_ID|FastaDB_Mask_LEN, fasta_length_traverse_func, NULL); FastaDB_close(fdb); return 0; } exonerate-2.4.0/src/util/fastanrdb.c0000644000175000017500000002017211162714344014311 00000000000000/****************************************************************\ * * * fastanrdb : create non-redundant versions of fasta database * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include #include <string.h> /* For strcmp() */ #include <ctype.h> /* For toupper() */ #include "argument.h" #include "fastadb.h" /**/ typedef struct { gint checksum; gint length; FastaDB_Key *key; FastaDB_Seq *seq; } NRDB_Data; static NRDB_Data *NRDB_Data_create(FastaDB_Seq *fdbs){ register NRDB_Data *nd = g_new(NRDB_Data, 1); nd->length = fdbs->seq->len; nd->checksum = Sequence_checksum(fdbs->seq); nd->key = FastaDB_Seq_get_key(fdbs); nd->seq = NULL; return nd; } static void NRDB_Data_destroy(NRDB_Data *nd){ if(nd->key) FastaDB_Key_destroy(nd->key); if(nd->seq) FastaDB_Seq_destroy(nd->seq); g_free(nd); return; } static int NRDB_Data_sort_checksum_function(const void *a, const void *b){ register NRDB_Data **nd_a = (NRDB_Data**)a, **nd_b = (NRDB_Data**)b; return (*nd_a)->checksum-(*nd_b)->checksum; } /**/ typedef int (*NRDB_CompFunc)(const char *s1, const char *s2); typedef struct { GPtrArray *nrdb_data_list; gboolean ignore_case; gboolean check_revcomp; NRDB_CompFunc comp; } NRDB_Info; static gboolean fasta_nrdb_traverse_func(FastaDB_Seq *fdbs, gpointer user_data){ register NRDB_Info *ni = user_data; register NRDB_Data *nd = NRDB_Data_create(fdbs); register FastaDB_Seq *revcomp_fdbs; register NRDB_Data *revcomp_nd; register gchar *seq, *rseq; g_ptr_array_add(ni->nrdb_data_list, nd); if(ni->check_revcomp){ revcomp_fdbs = FastaDB_Seq_revcomp(fdbs); /* Check against palindromes */ seq = Sequence_get_str(fdbs->seq); rseq = Sequence_get_str(revcomp_fdbs->seq); if(ni->comp(seq, rseq)){ revcomp_nd = NRDB_Data_create(revcomp_fdbs); g_ptr_array_add(ni->nrdb_data_list, revcomp_nd); } FastaDB_Seq_destroy(revcomp_fdbs); g_free(seq); g_free(rseq); } return FALSE; } /**/ static void NRDB_Data_report_redundant_set(GPtrArray *redundant_set){ register gint i; register GPtrArray *forward_list = g_ptr_array_new(), *reverse_list = g_ptr_array_new(); register FastaDB_Seq *fdbs, *first_forward = NULL; register GString *merge_def; 
register gchar *curr_def; register FastaDB_Mask mask = FastaDB_Mask_ID | FastaDB_Mask_DEF | FastaDB_Mask_SEQ; for(i = 0; i < redundant_set->len; i++){ fdbs = redundant_set->pdata[i]; if(fdbs->seq->strand == Sequence_Strand_REVCOMP){ g_ptr_array_add(reverse_list, fdbs); } else { if(first_forward){ g_ptr_array_add(forward_list, fdbs); } else { first_forward = fdbs; } } } if(first_forward && ((forward_list->len+1) >= reverse_list->len)){ curr_def = first_forward->seq->def; merge_def = g_string_sized_new(64); for(i = 0; i < forward_list->len; i++){ fdbs = forward_list->pdata[i]; g_string_append_c(merge_def, ' '); g_string_append(merge_def, fdbs->seq->id); } for(i = 0; i < reverse_list->len; i++){ fdbs = reverse_list->pdata[i]; g_string_append_c(merge_def, ' '); g_string_append(merge_def, fdbs->seq->id); g_string_append(merge_def, ".revcomp"); } first_forward->seq->def = merge_def->str; FastaDB_Seq_print(first_forward, stdout, mask); g_string_free(merge_def, TRUE); first_forward->seq->def = curr_def; } g_ptr_array_free(forward_list, TRUE); g_ptr_array_free(reverse_list, TRUE); return; } static void NRDB_Data_merge_and_print(NRDB_Info *ni, FastaDB *fdb){ register gint i, j; register NRDB_Data *nd_a, *nd_b; register FastaDB_Seq *fdbs_a, *fdbs_b; register GPtrArray *redundant_set = g_ptr_array_new(); register gchar *seq_a, *seq_b; for(i = 0; i < ni->nrdb_data_list->len; i++){ nd_a = ni->nrdb_data_list->pdata[i]; if(!nd_a) /* Already used */ continue; fdbs_a = FastaDB_Key_get_seq(nd_a->key, FastaDB_Mask_ALL); g_ptr_array_add(redundant_set, fdbs_a); for(j = i+1; j < ni->nrdb_data_list->len; j++){ nd_b = ni->nrdb_data_list->pdata[j]; if(!nd_b) /* Already used */ continue; if(nd_a->checksum != nd_b->checksum) break; if(nd_a->length != nd_b->length) continue; if(!nd_b->seq){ /* Not available from previous checksum collision */ nd_b->seq = FastaDB_Key_get_seq(nd_b->key, FastaDB_Mask_ALL); } fdbs_b = nd_b->seq; seq_a = Sequence_get_str(fdbs_a->seq); seq_b = 
Sequence_get_str(fdbs_b->seq); if(!ni->comp(seq_a, seq_b)){ g_ptr_array_add(redundant_set, FastaDB_Seq_share(fdbs_b)); NRDB_Data_destroy(nd_b); ni->nrdb_data_list->pdata[j] = NULL; } g_free(seq_a); g_free(seq_b); } NRDB_Data_destroy(nd_a); ni->nrdb_data_list->pdata[i] = NULL; NRDB_Data_report_redundant_set(redundant_set); for(j = 0; j < redundant_set->len; j++) FastaDB_Seq_destroy(redundant_set->pdata[j]); g_ptr_array_set_size(redundant_set, 0); } return; } /**/ int Argument_main(Argument *arg){ register FastaDB *fdb; register ArgumentSet *as = ArgumentSet_create("Sequence Input Options"); register Alphabet *alphabet; NRDB_Info ni; gchar *query_path; ArgumentSet_add_option(as, 'f', "fasta", "path", "Fasta input file", NULL, Argument_parse_string, &query_path); ArgumentSet_add_option(as, 'i', "ignorecase", NULL, "Ignore sequence case", "FALSE", Argument_parse_boolean, &ni.ignore_case); ArgumentSet_add_option(as, 'r', "revcomp", NULL, "Check for revcomp duplicates", "FALSE", Argument_parse_boolean, &ni.check_revcomp); Argument_absorb_ArgumentSet(arg, as); Argument_process(arg, "fastanrdb", "A utility to create a non-redundant fasta sequence database\n" "Guy St.C. Slater. guy@ebi.ac.uk. 
2000-2003.\n", NULL); ni.nrdb_data_list = g_ptr_array_new(); if(ni.ignore_case) ni.comp = strcasecmp; else ni.comp = strcmp; if(ni.check_revcomp) alphabet = Alphabet_create(Alphabet_Type_DNA, FALSE); else alphabet = Alphabet_create(Alphabet_Type_UNKNOWN, FALSE); fdb = FastaDB_open(query_path, alphabet); FastaDB_traverse(fdb, FastaDB_Mask_ALL, fasta_nrdb_traverse_func, &ni); qsort(ni.nrdb_data_list->pdata, ni.nrdb_data_list->len, sizeof(gpointer), NRDB_Data_sort_checksum_function); NRDB_Data_merge_and_print(&ni, fdb); /* nrdb_data_list will be already cleared */ g_ptr_array_free(ni.nrdb_data_list, TRUE); FastaDB_close(fdb); Alphabet_destroy(alphabet); return 0; } exonerate-2.4.0/src/util/fastaoverlap.c0000644000175000017500000000636111162714344015040 00000000000000/****************************************************************\ * * * fastaoverlap: generate overlapping sequences * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "argument.h" #include "fastadb.h" typedef struct { gint chunk_size; gint jump_size; } FastaOverlap_Info; static gboolean fasta_overlap_traverse_func(FastaDB_Seq *fdbs, gpointer user_data){ register FastaOverlap_Info *foi = user_data; register gint region_start = 0, region_end = foi->chunk_size; register Sequence *nfe; if(region_end > fdbs->seq->len) region_end = fdbs->seq->len; nfe = Sequence_subseq(fdbs->seq, region_start, region_end - region_start); Sequence_print_fasta(nfe, stdout, FALSE); Sequence_destroy(nfe); while(region_end < fdbs->seq->len){ region_start += foi->jump_size; if(region_start > fdbs->seq->len) break; region_end += foi->jump_size; if(region_end > fdbs->seq->len) region_end = fdbs->seq->len; nfe = Sequence_subseq(fdbs->seq, region_start, region_end - region_start); Sequence_print_fasta(nfe, stdout, FALSE); Sequence_destroy(nfe); } return FALSE; } int Argument_main(Argument *arg){ register FastaDB *fdb; register ArgumentSet *as = ArgumentSet_create("Sequence Input Options"); gchar *query_path; FastaOverlap_Info foi; ArgumentSet_add_option(as, 'f', "fasta", "path", "Fasta input file", NULL, Argument_parse_string, &query_path); ArgumentSet_add_option(as, 'c', "chunk", NULL, "Maximum size of each chunk", "100000", Argument_parse_int, &foi.chunk_size); ArgumentSet_add_option(as, 'j', "jump", NULL, "Jump between each chunk", "90000", Argument_parse_int, &foi.jump_size); Argument_absorb_ArgumentSet(arg, as); Argument_process(arg, "fastaoverlap", "Generate overlapping fasta sequences\n" "Guy St.C. Slater. guy@ebi.ac.uk. 
2000-2003.\n", NULL); fdb = FastaDB_open(query_path, NULL); if(foi.chunk_size < 1) g_error("Chunk size (%d) must be greater than zero", foi.chunk_size); if(foi.jump_size < 1) g_error("Jump size (%d) must be greater than zero", foi.jump_size); FastaDB_traverse(fdb, FastaDB_Mask_ALL, fasta_overlap_traverse_func, &foi); FastaDB_close(fdb); return 0; } exonerate-2.4.0/src/util/fastareformat.c0000644000175000017500000000371111162714344015203 00000000000000/****************************************************************\ * * * fastareformat : a utility to reformat fasta format sequences * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "argument.h" #include "fastadb.h" static gboolean fasta_reformat_traverse_func(FastaDB_Seq *fdbs, gpointer user_data){ FastaDB_Seq_print(fdbs, stdout, FastaDB_Mask_ID |FastaDB_Mask_DEF |FastaDB_Mask_SEQ); return FALSE; } int Argument_main(Argument *arg){ register FastaDB *fdb; register ArgumentSet *as = ArgumentSet_create("Sequence Input Options"); gchar *query_path; ArgumentSet_add_option(as, 'f', "fasta", "path", "Fasta input file", NULL, Argument_parse_string, &query_path); Argument_absorb_ArgumentSet(arg, as); Argument_process(arg, "fastareformat", "A utility to reformat fasta sequence files\n" "Guy St.C. Slater. guy@ebi.ac.uk. 
2000-2003.\n", NULL); fdb = FastaDB_open(query_path, NULL); FastaDB_traverse(fdb, FastaDB_Mask_ALL, fasta_reformat_traverse_func, NULL); FastaDB_close(fdb); return 0; } exonerate-2.4.0/src/util/fastaremove.c0000644000175000017500000000675711162714344014676 00000000000000/****************************************************************\ * * * fastaremove : remove sequences from a fasta format file * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include <string.h> /* For strcmp () */ #include <ctype.h> /* For isspace() */ #include "argument.h" #include "fastadb.h" static gboolean fasta_remove_traverse_func(FastaDB_Seq *fdbs, gpointer user_data){ register GTree *id_tree = user_data; if(!g_tree_lookup(id_tree, fdbs->seq->id)) FastaDB_Seq_print(fdbs, stdout, FastaDB_Mask_ID |FastaDB_Mask_DEF |FastaDB_Mask_SEQ); return FALSE; } static void id_tree_prepare(FILE *fp, GTree *id_tree, GStringChunk *id_alloc){ register gint ch; register GString *id = g_string_sized_new(64); register gchar *temp; while((ch = getc(fp)) != EOF){ if(ch == '\n'){ temp = g_string_chunk_insert(id_alloc, id->str); g_tree_insert(id_tree, temp, temp); g_string_truncate(id, 0); } else { if(!isspace(ch)) g_string_append_c(id, ch); } } if(id->len){ /* Keep a final id which lacks a trailing newline */ temp = g_string_chunk_insert(id_alloc, id->str); g_tree_insert(id_tree, temp, temp); } g_string_free(id, TRUE); return; } static gint id_tree_comp_func(gconstpointer a, gconstpointer b){ return strcmp((gchar*)a, (gchar*)b); } int Argument_main(Argument *arg){ register FastaDB *fdb; register GTree *id_tree = g_tree_new(id_tree_comp_func); register GStringChunk *id_alloc = g_string_chunk_new(4096); register FILE *remove_fp; register ArgumentSet *as = ArgumentSet_create("Sequence Input Options"); gchar *query_path, 
*remove_path; ArgumentSet_add_option(as, 'f', "fasta", "path", "Fasta input file", NULL, Argument_parse_string, &query_path); ArgumentSet_add_option(as, 'r', "remove", "path | stdin", "List of sequences to remove", NULL, Argument_parse_string, &remove_path); Argument_absorb_ArgumentSet(arg, as); Argument_process(arg, "fastaremove", "Remove sequences from a fasta file\n" "Guy St.C. Slater. guy@ebi.ac.uk. 2000-2003.\n", NULL); if(!g_strcasecmp(remove_path, "stdin")){ remove_fp = stdin; } else { remove_fp = fopen(remove_path, "r"); if(!remove_fp) g_error("Could not open list for removal [%s]", remove_path); } id_tree_prepare(remove_fp, id_tree, id_alloc); fclose(remove_fp); fdb = FastaDB_open(query_path, NULL); FastaDB_traverse(fdb, FastaDB_Mask_ALL, fasta_remove_traverse_func, id_tree); FastaDB_close(fdb); g_tree_destroy(id_tree); g_string_chunk_free(id_alloc); return 0; } exonerate-2.4.0/src/util/fastarevcomp.c0000644000175000017500000000444311162714344015042 00000000000000/****************************************************************\ * * * fastarevcomp : a utility to revcomp fasta format sequences * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "argument.h" #include "fastadb.h" static gboolean fasta_revcomp_traverse_func(FastaDB_Seq *fdbs, gpointer user_data){ register FastaDB_Seq *revcomp_fdbs = FastaDB_Seq_revcomp(fdbs); FastaDB_Seq_print(revcomp_fdbs, stdout, FastaDB_Mask_ID |FastaDB_Mask_DEF |FastaDB_Mask_SEQ); FastaDB_Seq_destroy(revcomp_fdbs); return FALSE; } int Argument_main(Argument *arg){ register FastaDB *fdb; register ArgumentSet *as = ArgumentSet_create("Sequence Input Options"); register Alphabet *alphabet; gchar *query_path; ArgumentSet_add_option(as, 'f', "fasta", "path", "Fasta input file", NULL, Argument_parse_string, &query_path); Argument_absorb_ArgumentSet(arg, as); Argument_process(arg, "fastarevcomp", "A utility to reverse complement fasta sequence files\n" "Guy St.C. Slater. guy@ebi.ac.uk. 2000-2003.\n", NULL); alphabet = Alphabet_create(Alphabet_Type_DNA, FALSE); fdb = FastaDB_open(query_path, alphabet); FastaDB_traverse(fdb, FastaDB_Mask_ALL, fasta_revcomp_traverse_func, NULL); Alphabet_destroy(alphabet); FastaDB_close(fdb); return 0; } /* FIXME: should avoid FastaDB_traverse * to allow the revcomp to be done in place. */ exonerate-2.4.0/src/util/fastasoftmask.c0000644000175000017500000001073211162714344015214 00000000000000/****************************************************************\ * * * fastasoftmask: merge masked and unmasked files to softmask * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include <string.h> /* For strcmp() */ #include <ctype.h> /* For tolower() */ #include "argument.h" #include "fastadb.h" static void fasta_softmask_merge(Sequence *unmasked, Sequence *masked){ register Sequence *softmask_seq; register gchar *sm = g_new(gchar, unmasked->len+1); register gint i; /* ms: masked residues (N/X mark masked positions), us: unmasked residues */ register gchar *ms = Sequence_get_str(masked), *us = Sequence_get_str(unmasked); sm[unmasked->len] = '\0'; for(i = 0; i < unmasked->len; i++){ if((ms[i] == 'N') || (ms[i] == 'n') || (ms[i] == 'X') || (ms[i] == 'x')){ sm[i] = tolower(us[i]); } else { sm[i] = us[i]; } } softmask_seq = Sequence_create(unmasked->id, unmasked->def, sm, unmasked->len, Sequence_Strand_UNKNOWN, masked->alphabet); g_free(ms); g_free(us); g_free(sm); Sequence_print_fasta(softmask_seq, stdout, FALSE); Sequence_destroy(softmask_seq); return; } static void fasta_softmask(gchar *unmasked_path, gchar *masked_path){ register Alphabet *alphabet_unmasked, *alphabet_masked; register FastaDB *fdb_unmasked, *fdb_masked; register FastaDB_Seq *fdbs_unmasked = NULL, *fdbs_masked = NULL; register FastaDB_Mask mask = FastaDB_Mask_ID |FastaDB_Mask_DEF |FastaDB_Mask_SEQ; alphabet_unmasked = Alphabet_create(Alphabet_Type_UNKNOWN, FALSE); alphabet_masked = Alphabet_create(Alphabet_Type_UNKNOWN, TRUE); fdb_unmasked = FastaDB_open(unmasked_path, alphabet_unmasked); fdb_masked = FastaDB_open(masked_path, alphabet_masked); while((fdbs_unmasked = FastaDB_next(fdb_unmasked, mask))){ fdbs_masked = FastaDB_next(fdb_masked, mask); if(!fdbs_masked) g_error("Could not find masked to match [%s]", fdbs_unmasked->seq->id); if(strcmp(fdbs_unmasked->seq->id, fdbs_masked->seq->id)) g_error("ID mismatch unmasked:[%s] masked:[%s]", fdbs_unmasked->seq->id, fdbs_masked->seq->id); if(fdbs_unmasked->seq->len != fdbs_masked->seq->len) g_error("Length mismatch: unmasked:%s(%d) masked:%s(%d)\n", fdbs_unmasked->seq->id, fdbs_unmasked->seq->len, fdbs_masked->seq->id, fdbs_masked->seq->len);
fasta_softmask_merge(fdbs_unmasked->seq, fdbs_masked->seq); FastaDB_Seq_destroy(fdbs_unmasked); FastaDB_Seq_destroy(fdbs_masked); } fdbs_masked = FastaDB_next(fdb_masked, mask); if(fdbs_masked) g_error("Could not find unmasked to match [%s]", fdbs_masked->seq->id); FastaDB_close(fdb_unmasked); FastaDB_close(fdb_masked); Alphabet_destroy(alphabet_unmasked); Alphabet_destroy(alphabet_masked); return; } int Argument_main(Argument *arg){ register ArgumentSet *as = ArgumentSet_create("Sequence Input Options"); gchar *unmasked_path, *masked_path; ArgumentSet_add_option(as, 'u', "unmasked", "path", "First fasta input file", NULL, Argument_parse_string, &unmasked_path); ArgumentSet_add_option(as, 'm', "masked", "path", "Second fasta input file", NULL, Argument_parse_string, &masked_path); Argument_absorb_ArgumentSet(arg, as); Argument_process(arg, "fastasoftmask", "Merge masked and unmasked files as softmasked sequences\n" "Guy St.C. Slater. guy@ebi.ac.uk. 2000-2003.\n", NULL); fasta_softmask(unmasked_path, masked_path); return 0; } exonerate-2.4.0/src/util/fastasort.c0000644000175000017500000002470211162714344014356 00000000000000/****************************************************************\ * * * fastasort : a utility for sorting fasta format databases * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include <stdlib.h> /* For qsort(), exit() */ #include <string.h> /* For strcmp() */ #include <ctype.h> /* For toupper() */ #include "argument.h" #include "fastadb.h" /**/ typedef enum { Sort_KeyMask_ID = (1<<0), Sort_KeyMask_LEN = (1<<1), Sort_KeyMask_SEQ = (1<<2), Sort_KeyMask_FORWARD = (1<<3), Sort_KeyMask_REVERSE = (1<<4) } Sort_KeyMask; typedef struct { FastaDB_Key *key; union { gchar *id; gint len; gchar *seq_prefix; } key_data; } Sort_Data; static Sort_Data *Sort_Data_create(FastaDB_Seq *fdbs, Sort_KeyMask sort_key_mask){ register Sort_Data *sd = g_new(Sort_Data, 1); if(sort_key_mask & Sort_KeyMask_ID){ sd->key_data.id = g_strdup(fdbs->seq->id); } else if(sort_key_mask & Sort_KeyMask_LEN){ sd->key_data.len = fdbs->seq->len; } else if(sort_key_mask & Sort_KeyMask_SEQ){ sd->key_data.seq_prefix = Sequence_get_substr(fdbs->seq, 0, 16); } sd->key = FastaDB_Seq_get_key(fdbs); return sd; } static void Sort_Data_destroy(Sort_Data *sd, Sort_KeyMask sort_key_mask){ if(sort_key_mask & Sort_KeyMask_ID){ g_free(sd->key_data.id); } else if(sort_key_mask & Sort_KeyMask_SEQ){ g_free(sd->key_data.seq_prefix); } FastaDB_Key_destroy(sd->key); g_free(sd); return; } typedef int (*FastaSort_CompFunc)(const void *a, const void *b); static int FastaSort_compare_id_forward(const void *a, const void *b){ register Sort_Data **sd_a = (Sort_Data**)a, **sd_b = (Sort_Data**)b; register gint ret_val = strcmp((*sd_a)->key_data.id, (*sd_b)->key_data.id); if(!ret_val) g_error("Duplicate key [%s]", (*sd_a)->key_data.id); return ret_val; } static int FastaSort_compare_id_reverse(const void *a, const void *b){ return FastaSort_compare_id_forward(b, a); } static int FastaSort_compare_len_forward(const void *a, const void *b){ register Sort_Data **sd_a = (Sort_Data**)a, **sd_b = (Sort_Data**)b; return (*sd_a)->key_data.len - (*sd_b)->key_data.len; } static int FastaSort_compare_len_reverse(const void *a, const void *b){ return FastaSort_compare_len_forward(b, a); } static int
FastaSort_compare_seq_forward(const void *a, const void *b){ register Sort_Data **sd_a = (Sort_Data**)a, **sd_b = (Sort_Data**)b; register gint ret_val = strcmp((*sd_a)->key_data.seq_prefix, (*sd_b)->key_data.seq_prefix); register FastaDB_Seq *fdbs_a, *fdbs_b; register gchar *seq_a, *seq_b; if(!ret_val){ fdbs_a = FastaDB_Key_get_seq((*sd_a)->key, FastaDB_Mask_SEQ); fdbs_b = FastaDB_Key_get_seq((*sd_b)->key, FastaDB_Mask_SEQ); seq_a = Sequence_get_str(fdbs_a->seq); seq_b = Sequence_get_str(fdbs_b->seq); FastaDB_Seq_destroy(fdbs_a); FastaDB_Seq_destroy(fdbs_b); ret_val = strcmp(seq_a, seq_b); g_free(seq_a); g_free(seq_b); } return ret_val; } static int FastaSort_compare_seq_reverse(const void *a, const void *b){ return FastaSort_compare_seq_forward(b, a); } /* FIXME: optimisation * : use a sequence cache for faster sequence sorting */ static Sort_KeyMask SortKey_Mask_create(gchar *sort_key, gboolean reverse_order){ register Sort_KeyMask mask = 0; if(reverse_order) mask |= Sort_KeyMask_REVERSE; else mask |= Sort_KeyMask_FORWARD; if(!g_strcasecmp(sort_key, "id")) mask |= Sort_KeyMask_ID; else if(!g_strcasecmp(sort_key, "len")) mask |= Sort_KeyMask_LEN; else if(!g_strcasecmp(sort_key, "seq")) mask |= Sort_KeyMask_SEQ; else g_error("Unknown sort key [%s]", sort_key); return mask; } static FastaSort_CompFunc fasta_sort_get_compare_func( Sort_KeyMask sort_key_mask){ switch((gint)sort_key_mask){ case Sort_KeyMask_ID |Sort_KeyMask_FORWARD: return FastaSort_compare_id_forward; break; case Sort_KeyMask_ID |Sort_KeyMask_REVERSE: return FastaSort_compare_id_reverse; break; case Sort_KeyMask_LEN |Sort_KeyMask_FORWARD: return FastaSort_compare_len_forward; break; case Sort_KeyMask_LEN |Sort_KeyMask_REVERSE: return FastaSort_compare_len_reverse; break; case Sort_KeyMask_SEQ |Sort_KeyMask_FORWARD: return FastaSort_compare_seq_forward; break; case Sort_KeyMask_SEQ |Sort_KeyMask_REVERSE: return FastaSort_compare_seq_reverse; break; } g_error("Unknown sort key mask [%d]", 
sort_key_mask); return NULL; } typedef struct { Sort_Data *prev_sd; FastaSort_CompFunc comp_func; Sort_KeyMask sort_key_mask; } Sort_Check_Info; static gboolean fasta_sort_check_traverse_func(FastaDB_Seq *fdbs, gpointer user_data){ register Sort_Check_Info *sci = user_data; register gint ret_val; Sort_Data *sd = Sort_Data_create(fdbs, sci->sort_key_mask); if(sci->prev_sd){ ret_val = sci->comp_func(&sci->prev_sd, &sd); if(ret_val > 0){ /* Any positive result means out of order */ g_print("File is not sorted: "); if(sci->sort_key_mask & Sort_KeyMask_ID){ g_print("id [%s] followed by [%s]", sci->prev_sd->key_data.id, sd->key_data.id); } else if(sci->sort_key_mask & Sort_KeyMask_LEN){ g_print("len [%d] followed by [%d]", sci->prev_sd->key_data.len, sd->key_data.len); } else if(sci->sort_key_mask & Sort_KeyMask_SEQ){ g_print("seq [%s...] followed by [%s...]", sci->prev_sd->key_data.seq_prefix, sd->key_data.seq_prefix); } g_print("\n"); exit(1); return TRUE; } Sort_Data_destroy(sci->prev_sd, sci->sort_key_mask); } sci->prev_sd = sd; return FALSE; } static void fasta_sort_check_order(FastaDB *fdb, Sort_KeyMask sort_key_mask){ Sort_Check_Info sci; sci.prev_sd = NULL; sci.comp_func = fasta_sort_get_compare_func(sort_key_mask); sci.sort_key_mask = sort_key_mask; FastaDB_traverse(fdb, FastaDB_Mask_ALL, fasta_sort_check_traverse_func, &sci); if(sci.prev_sd) Sort_Data_destroy(sci.prev_sd, sci.sort_key_mask); return; } /**/ typedef struct { GPtrArray *sort_data_list; Sort_KeyMask sort_key_mask; } Sort_Info; static gboolean fasta_sort_traverse_func(FastaDB_Seq *fdbs, gpointer user_data){ register Sort_Info *si = user_data; register Sort_Data *sd = Sort_Data_create(fdbs, si->sort_key_mask); g_ptr_array_add(si->sort_data_list, sd); return FALSE; } static void fasta_sort_sort_data(FastaDB *fdb, Sort_KeyMask sort_key_mask){ register gint i; register FastaDB_Seq *fdbs; register Sort_Data *sd; register FastaSort_CompFunc comp_func = fasta_sort_get_compare_func(sort_key_mask); Sort_Info si; si.sort_data_list = g_ptr_array_new();
si.sort_key_mask = sort_key_mask; FastaDB_traverse(fdb, FastaDB_Mask_ALL, fasta_sort_traverse_func, &si); qsort(si.sort_data_list->pdata, si.sort_data_list->len, sizeof(gpointer), comp_func); for(i = 0; i < si.sort_data_list->len; i++){ sd = si.sort_data_list->pdata[i]; fdbs = FastaDB_Key_get_seq(sd->key, FastaDB_Mask_ALL); FastaDB_Seq_print(fdbs, stdout, FastaDB_Mask_ID |FastaDB_Mask_DEF |FastaDB_Mask_SEQ); FastaDB_Seq_destroy(fdbs); } for(i = 0; i < si.sort_data_list->len; i++){ sd = si.sort_data_list->pdata[i]; Sort_Data_destroy(sd, sort_key_mask); } g_ptr_array_free(si.sort_data_list, TRUE); return; } int Argument_main(Argument *arg){ register FastaDB *fdb; register ArgumentSet *as = ArgumentSet_create("Sequence Input Options"); gchar *query_path, *sort_key; gboolean check_order, reverse_order; register Sort_KeyMask sort_key_mask; ArgumentSet_add_option(as, 'f', "fasta", "path", "Fasta input file", NULL, Argument_parse_string, &query_path); ArgumentSet_add_option(as, 'c', "check", NULL, "Just check if file is already sorted (does not sort)", "FALSE", Argument_parse_boolean, &check_order); ArgumentSet_add_option(as, 'k', "key", "id | len | seq", "Sort key to apply to sequences", "id", Argument_parse_string, &sort_key); ArgumentSet_add_option(as, 'r', "reverse", NULL, "Reverse sort order", "FALSE", Argument_parse_boolean, &reverse_order); Argument_absorb_ArgumentSet(arg, as); Argument_process(arg, "fastasort", "A utility for sorting fasta sequence databases\n" "Guy St.C. Slater. guy@ebi.ac.uk. 
2000-2003.\n", NULL); /**/ fdb = FastaDB_open(query_path, NULL); sort_key_mask = SortKey_Mask_create(sort_key, reverse_order); if(check_order) fasta_sort_check_order(fdb, sort_key_mask); else fasta_sort_sort_data(fdb, sort_key_mask); FastaDB_close(fdb); return 0; } exonerate-2.4.0/src/util/fastasplit.c0000644000175000017500000000732611162714344014525 00000000000000/****************************************************************\ * * * fastasplit: split a fasta format file into chunks * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include <sys/types.h> /* For fstat() */ #include <sys/stat.h> /* For fstat() */ #include <unistd.h> /* For fstat() */ #include "argument.h" #include "fastadb.h" static void fasta_chunk_write(FastaDB *fdb, CompoundFile_Pos start, CompoundFile_Pos stop, gchar *path){ register FILE *fp; register CompoundFile_Pos pos; if(start == stop) /* Don't write an empty file */ return; fp = fopen(path, "w"); if(!fp) g_error("Could not open [%s] for writing fasta chunk", path); CompoundFile_seek(fdb->cf, start); for(pos = start; pos < stop; pos++) putc(CompoundFile_getc(fdb->cf), fp); fclose(fp); return; } static CompoundFile_Pos fasta_split_get_file_size(FILE *fp){ struct stat buf; fstat(fileno(fp), &buf); return buf.st_size; } static void fasta_split(FastaDB *fdb, gchar *output_stem, gint num_chunks){ register CompoundFile_Pos total = fasta_split_get_file_size(fdb->cf->fp), chunk_size = total/num_chunks; register CompoundFile_Pos *position = g_new(CompoundFile_Pos, num_chunks+1); register guint i; register gchar *chunk_path; for(i = 1; i < num_chunks; i++) position[i] = FastaDB_find_next_start(fdb, i*chunk_size); position[0] =
0; position[num_chunks] = total; for(i = 0; i < num_chunks; i++){ chunk_path = g_strdup_printf("%s_chunk_%07d", output_stem, i); fasta_chunk_write(fdb, position[i], position[i+1], chunk_path); g_free(chunk_path); } g_free(position); return; } int Argument_main(Argument *arg){ register FastaDB *fdb; register ArgumentSet *as = ArgumentSet_create("Sequence Input Options"); gchar *query_path, *output_dir; gint num_chunks; register gchar *output_stem; ArgumentSet_add_option(as, 'f', "fasta", "path", "Fasta input file", NULL, Argument_parse_string, &query_path); ArgumentSet_add_option(as, 'o', "output", "dirpath", "Output directory", NULL, Argument_parse_string, &output_dir); ArgumentSet_add_option(as, 'c', "chunk", NULL, "Number of chunks to generate", "2", Argument_parse_int, &num_chunks); Argument_absorb_ArgumentSet(arg, as); Argument_process(arg, "fastasplit", "A utility to split fasta format sequences\n" "Guy St.C. Slater. guy@ebi.ac.uk. 2000-2003.\n", NULL); if(num_chunks < 1) g_error("Must request at least 1 chunk"); fdb = FastaDB_open(query_path, NULL); output_stem = g_strdup_printf("%s%c%s", output_dir, G_DIR_SEPARATOR, g_basename(query_path)); fasta_split(fdb, output_stem, num_chunks); FastaDB_close(fdb); return 0; } exonerate-2.4.0/src/util/fastasubseq.c0000644000175000017500000000460611162714344014672 00000000000000/****************************************************************\ * * * fastasubseq: extract subsequences from fasta format sequences * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include "argument.h" #include "fastadb.h" int Argument_main(Argument *arg){ register FastaDB_Seq *fdbs; register Sequence *subseq; register ArgumentSet *as = ArgumentSet_create("Sequence Input Options"); gchar *query_path; gint subseq_start, subseq_length; ArgumentSet_add_option(as, 'f', "fasta", "path", "Fasta input file", NULL, Argument_parse_string, &query_path); ArgumentSet_add_option(as, 's', "start", "start", "Sequence start position", NULL, Argument_parse_int, &subseq_start); ArgumentSet_add_option(as, 'l', "length", "length", "Subsequence length", NULL, Argument_parse_int, &subseq_length); Argument_absorb_ArgumentSet(arg, as); Argument_process(arg, "fastasubseq", "A utility to extract fasta format subsequences\n" "Guy St.C. Slater. guy@ebi.ac.uk. 2000-2003.\n", NULL); fdbs = FastaDB_get_single(query_path, NULL); if(subseq_start < 0) g_error("Subsequence must start after sequence start"); if(subseq_length < 1) g_error("Subsequence length must be greater than zero"); if((subseq_start+subseq_length) > fdbs->seq->len) g_error("Subsequence must end before end of [%s](%d)", fdbs->seq->id, fdbs->seq->len); subseq = Sequence_subseq(fdbs->seq, subseq_start, subseq_length); Sequence_print_fasta(subseq, stdout, FALSE); Sequence_destroy(subseq); FastaDB_Seq_destroy(fdbs); return 0; } exonerate-2.4.0/src/util/fastatranslate.c0000644000175000017500000000654011162714344015364 00000000000000/****************************************************************\ * * * fastatranslate: a utility to translate fasta format sequences * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact.
* * * \****************************************************************/ #include "argument.h" #include "fastadb.h" #include "translate.h" static void fasta_translate_seq(FastaDB_Seq *fdbs, Translate *translate, gint frame, Alphabet *protein_alphabet){ register Sequence *rc_seq, *aa_seq; if(!frame){ fasta_translate_seq(fdbs, translate, -3, protein_alphabet); fasta_translate_seq(fdbs, translate, -2, protein_alphabet); fasta_translate_seq(fdbs, translate, -1, protein_alphabet); fasta_translate_seq(fdbs, translate, 1, protein_alphabet); fasta_translate_seq(fdbs, translate, 2, protein_alphabet); fasta_translate_seq(fdbs, translate, 3, protein_alphabet); return; } if(frame < 1){ rc_seq = Sequence_revcomp(fdbs->seq); aa_seq = Sequence_translate(rc_seq, translate, -frame); Sequence_destroy(rc_seq); } else { aa_seq = Sequence_translate(fdbs->seq, translate, frame); } if(aa_seq->len) Sequence_print_fasta(aa_seq, stdout, FALSE); Sequence_destroy(aa_seq); return; } int Argument_main(Argument *arg){ register FastaDB *fdb; register FastaDB_Seq *fdbs; register ArgumentSet *as = ArgumentSet_create("Sequence Input Options"); register Translate *translate; register Alphabet *dna_alphabet = Alphabet_create(Alphabet_Type_DNA, FALSE); register Alphabet *protein_alphabet = Alphabet_create(Alphabet_Type_PROTEIN, FALSE); gchar *query_path; gint frame; ArgumentSet_add_option(as, 'f', "fasta", "path", "Fasta input file", NULL, Argument_parse_string, &query_path); ArgumentSet_add_option(as, 'F', "frame", NULL, "Reading frame to translate", "0", Argument_parse_int, &frame); Argument_absorb_ArgumentSet(arg, as); Translate_ArgumentSet_create(arg); Argument_process(arg, "fastatranslate", "A utility to translate fasta format sequences\n" "Guy St.C. Slater. guy@ebi.ac.uk. 
2000-2003.\n", NULL); fdb = FastaDB_open(query_path, dna_alphabet); translate = Translate_create(FALSE); while((fdbs = FastaDB_next(fdb, FastaDB_Mask_ALL))){ fasta_translate_seq(fdbs, translate, frame, protein_alphabet); FastaDB_Seq_destroy(fdbs); } FastaDB_close(fdb); Translate_destroy(translate); Alphabet_destroy(dna_alphabet); Alphabet_destroy(protein_alphabet); return 0; } exonerate-2.4.0/src/util/fastavalidcds.c0000644000175000017500000001117611162714347015164 00000000000000/****************************************************************\ * * * fastavalidcds: a utility to check for valid CDS sequences * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include <ctype.h> /* For toupper() */ #include "argument.h" #include "fastadb.h" static gboolean is_start_codon(gchar *seq){ if((toupper(seq[0]) == 'A') && (toupper(seq[1]) == 'T') && (toupper(seq[2]) == 'G')) return TRUE; return FALSE; } static gboolean is_stop_codon(gchar *seq){ if(toupper(seq[0]) != 'T') return FALSE; if(toupper(seq[1]) == 'A'){ if((toupper(seq[2]) != 'G') && (toupper(seq[2]) != 'A')) return FALSE; } else if(toupper(seq[1]) == 'G'){ if(toupper(seq[2]) != 'A') return FALSE; } else { return FALSE; } return TRUE; } static gboolean fasta_valid_cds_traverse_func(FastaDB_Seq *fdbs, gpointer user_data){ register gint i; register gboolean explain = *((gboolean*)user_data); register gchar *seq; if(fdbs->seq->len % 3){ if(explain) g_warning("%s length (%d) not divisible by 3", fdbs->seq->id, fdbs->seq->len); return FALSE; } if(fdbs->seq->len < 6){ if(explain) g_warning("%s length (%d) too short ( <2 codons )", fdbs->seq->id, fdbs->seq->len); return
FALSE; } seq = Sequence_get_str(fdbs->seq); if(!is_start_codon(seq)){ if(explain) g_warning("%s missing start codon (has:%c%c%c)", fdbs->seq->id, seq[0], seq[1], seq[2]); g_free(seq); return FALSE; } if(!is_stop_codon(seq+fdbs->seq->len-3)){ if(explain) g_warning("%s missing stop codon (has:%c%c%c)", fdbs->seq->id, seq[fdbs->seq->len-3], seq[fdbs->seq->len-2], seq[fdbs->seq->len-1]); g_free(seq); return FALSE; } for(i = 3; i < (fdbs->seq->len-3); i+=3) if(is_stop_codon(seq+i)){ if(explain) g_warning("%s contains in-frame stop codon" " (pos:%d, codon:%c%c%c)", fdbs->seq->id, i, seq[i], seq[i+1], seq[i+2]); g_free(seq); return FALSE; } for(i = 0; i < fdbs->seq->len; i++){ if((toupper(seq[i]) != 'A') && (toupper(seq[i]) != 'C') && (toupper(seq[i]) != 'G') && (toupper(seq[i]) != 'T')){ if(explain){ g_warning("%s contains non-ACGT base:" " [pos: %d base: %c]", fdbs->seq->id, i, seq[i]); } g_free(seq); return FALSE; } } FastaDB_Seq_print(fdbs, stdout, FastaDB_Mask_ID |FastaDB_Mask_DEF |FastaDB_Mask_SEQ); g_free(seq); return FALSE; } int Argument_main(Argument *arg){ register FastaDB *fdb; register ArgumentSet *as = ArgumentSet_create("Sequence Input Options"); gchar *query_path; gboolean explain; ArgumentSet_add_option(as, 'f', "fasta", "path", "Fasta input file", NULL, Argument_parse_string, &query_path); ArgumentSet_add_option(as, 'e', "explain", NULL, "Explain reasons for rejection", "FALSE", Argument_parse_boolean, &explain); Argument_absorb_ArgumentSet(arg, as); Argument_process(arg, "fastavalidcds", "A utility to check for valid CDS sequences\n" "Guy St.C. Slater. guy@ebi.ac.uk. 
2004.\n", NULL); fdb = FastaDB_open(query_path, NULL); FastaDB_traverse(fdb, FastaDB_Mask_ALL, fasta_valid_cds_traverse_func, &explain); FastaDB_close(fdb); return 0; } /**/ exonerate-2.4.0/src/util/fasta2esd.c0000644000175000017500000000520011162714343014213 00000000000000/****************************************************************\ * * * fasta2esd - generate an exonerate sequence database file * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "argument.h" #include "dataset.h" #include "alphabet.h" #include "fastadb.h" int Argument_main(Argument *arg){ register ArgumentSet *as = ArgumentSet_create("Sequence Input Options"); GPtrArray *input_path_list; gchar *output_path; gboolean softmask_input; Alphabet_Type alphabet_type; register Dataset *dataset; register FILE *fp; ArgumentSet_add_option(as, 'f', "fasta", "path", "Fasta format input sequences", NULL, NULL, &input_path_list); ArgumentSet_add_option(as, 'o', "output", "path", "Output path for .esd file", NULL, Argument_parse_string, &output_path); ArgumentSet_add_option(as, '\0', "alphabet", NULL, "Specify sequence alphabet type", "unknown", Alphabet_Argument_parse_alphabet_type, &alphabet_type); ArgumentSet_add_option(as, 's', "softmask", NULL, "Treat input sequences as softmasked", "TRUE", Argument_parse_boolean, &softmask_input); Argument_absorb_ArgumentSet(arg, as); Argument_process(arg, "fasta2esd", "generate an exonerate sequence database file\n" "Guy St.C. Slater. guy@ebi.ac.uk. 
2006.\n", NULL); /* Check output absent before creating dataset */ fp = fopen(output_path, "r"); if(fp) g_error("Output file [%s] already exists", output_path); /**/ g_message("Creating Dataset (alphabet:[%s] softmasked:[%s])", Alphabet_Type_get_name(alphabet_type), softmask_input?"yes":"no"); dataset = Dataset_create(input_path_list, alphabet_type, softmask_input); g_message("Writing Dataset"); Dataset_write(dataset, output_path); Dataset_destroy(dataset); g_message("-- completed"); return 0; } exonerate-2.4.0/src/util/fastaannotatecdna.c0000644000175000017500000001230211162714343016016 00000000000000/****************************************************************\ * * * fastaannotatecdna : a utility to annotate cDNA with a protein * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3. See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. 
* * * \****************************************************************/ #include <string.h> /* For strstr() */ #include "argument.h" #include "fastadb.h" #include "translate.h" static gint find_translation(gchar *protein_str, gint protein_len, Sequence *cdna, Translate *translate){ register Sequence *trans_seq; register gchar *trans_str, *result; register gint i, pos, count = 0; for(i = 1; i <= 3; i++){ trans_seq = Sequence_translate(cdna, translate, i); trans_str = Sequence_get_str(trans_seq); result = trans_str; while((result = strstr(result, protein_str))){ pos = result-trans_str; g_print("annotation: %s %c %d %d\n", cdna->id, Sequence_get_strand_as_char(cdna), (pos*3)+i, protein_len*3); count++; result++; } g_free(trans_str); Sequence_destroy(trans_seq); } return count; } static gint fastaannotatecdna(gchar *cdna_path, gchar *protein_path){ register FastaDB *cdna_fdb, *protein_fdb; register FastaDB_Seq *cdna = NULL, *protein = NULL; register gint ret_val = 0, total; register FastaDB_Mask mask = FastaDB_Mask_ID|FastaDB_Mask_SEQ; register Sequence *rc_seq; register gchar *protein_str; register Translate *translate = Translate_create(FALSE); register Alphabet *cdna_alphabet = Alphabet_create(Alphabet_Type_DNA, FALSE), *protein_alphabet = Alphabet_create(Alphabet_Type_PROTEIN, FALSE); cdna_fdb = FastaDB_open(cdna_path, cdna_alphabet); protein_fdb = FastaDB_open(protein_path, protein_alphabet); while((cdna = FastaDB_next(cdna_fdb, mask))){ protein = FastaDB_next(protein_fdb, mask); if(!protein){ g_print("ERROR: fastaannotatecdna: %s: protein: %s is absent\n", protein_path, cdna->seq->id); ret_val = 1; break; } if((protein->seq->len*3) > cdna->seq->len) g_print("ERROR: fastaannotatecdna: protein [%s](%d) too long for cdna [%s](%d)\n", protein->seq->id, protein->seq->len, cdna->seq->id, cdna->seq->len); protein_str = Sequence_get_str(protein->seq); /* forward */ total = find_translation(protein_str, protein->seq->len, cdna->seq, translate); /* revcomp */ rc_seq =
Sequence_revcomp(cdna->seq); total += find_translation(protein_str, protein->seq->len, rc_seq, translate); if(total != 1){ g_print("ERROR: fastaannotatecdna: Found %d locations for protein [%s] in [%s]\n", total, protein->seq->id, cdna->seq->id); ret_val = 1; /* Release per-iteration data before leaving the loop */ Sequence_destroy(rc_seq); g_free(protein_str); break; } Sequence_destroy(rc_seq); /**/ g_free(protein_str); FastaDB_Seq_destroy(cdna); FastaDB_Seq_destroy(protein); cdna = protein = NULL; } if(cdna) FastaDB_Seq_destroy(cdna); if(protein) FastaDB_Seq_destroy(protein); if(ret_val == 0){ protein = FastaDB_next(protein_fdb, mask); if(protein){ g_print("ERROR: fastaannotatecdna: %s: cdna: %s absent\n", cdna_path, protein->seq->id); ret_val = 1; FastaDB_Seq_destroy(protein); } } FastaDB_close(cdna_fdb); FastaDB_close(protein_fdb); Translate_destroy(translate); Alphabet_destroy(cdna_alphabet); Alphabet_destroy(protein_alphabet); return ret_val; } int Argument_main(Argument *arg){ register ArgumentSet *as = ArgumentSet_create("Sequence Input Options"); gchar *cdna_path, *protein_path; ArgumentSet_add_option(as, 'c', "cdna", "path", "cDNA input file", NULL, Argument_parse_string, &cdna_path); ArgumentSet_add_option(as, 'p', "protein", "path", "Protein input file", NULL, Argument_parse_string, &protein_path); Argument_absorb_ArgumentSet(arg, as); Translate_ArgumentSet_create(arg); Argument_process(arg, "fastaannotatecdna", "Annotate a cDNA using the corresponding protein\n" "Guy St.C. Slater. guy@ebi.ac.uk. 2000-2006.\n", NULL); return fastaannotatecdna(cdna_path, protein_path); } exonerate-2.4.0/src/util/esd2esi.c0000644000175000017500000000745711162714343013715 00000000000000/****************************************************************\ * * * esd2esi - generate an exonerate index file from a dataset * * * * Guy St.C. Slater.. mailto:guy@ebi.ac.uk * * Copyright (C) 2000-2009. All Rights Reserved. * * * * This source code is distributed under the terms of the * * GNU General Public License, version 3.
See the file COPYING * * or http://www.gnu.org/licenses/gpl.txt for details * * * * If you use this code, please keep this notice intact. * * * \****************************************************************/ #include "argument.h" #include "dataset.h" #include "index.h" int Argument_main(Argument *arg){ register ArgumentSet *as = ArgumentSet_create("Input and Output Options"); gchar *dataset_path, *index_path; register Dataset *dataset; register Index *index; gboolean is_translated = FALSE; register gint word_length; gint dna_word_length, protein_word_length, word_jump, word_ambiguity, saturate_threshold, memory_limit; /**/ ArgumentSet_add_option(as, 'd', "dataset", "path", "Exonerate dataset file", NULL, Argument_parse_string, &dataset_path); ArgumentSet_add_option(as, 'o', "output", "path", "Output path for .esi file", NULL, Argument_parse_string, &index_path); ArgumentSet_add_option(as, '\0', "translate", NULL, "Translate the dataset for comparison with protein", "FALSE", Argument_parse_boolean, &is_translated); /**/ /* FIXME: tidy duplicated arguments also in libraries */ ArgumentSet_add_option(as, 0, "dnawordlen", "bp", "Wordlength for DNA words", "12", Argument_parse_int, &dna_word_length); ArgumentSet_add_option(as, 0, "proteinwordlen", "aa", "Wordlength for protein words", "5", Argument_parse_int, &protein_word_length); ArgumentSet_add_option(as, 0, "wordjump", NULL, "Jump between database words", "1", Argument_parse_int, &word_jump); ArgumentSet_add_option(as, 0, "wordambiguity", NULL, "Number of ambiguous words to index", "1", Argument_parse_int, &word_ambiguity); ArgumentSet_add_option(as, 0, "saturatethreshold", NULL, "Word saturation threshold", "10", Argument_parse_int, &saturate_threshold); /**/ ArgumentSet_add_option(as, 0, "memorylimit", NULL, "Memory limit for database indexing", "1024", Argument_parse_int, &memory_limit); /**/ Argument_absorb_ArgumentSet(arg, as); Argument_process(arg, "esd2esi", "generate an exonerate sequence index file\n" 
"Guy St.C. Slater. guy@ebi.ac.uk. 2007.\n", NULL); dataset = Dataset_read(dataset_path); word_length = (is_translated || (dataset->alphabet->type == Alphabet_Type_PROTEIN)) ? protein_word_length : dna_word_length; if(word_ambiguity < 1) g_error("Word ambiguity cannot be less than one."); if((word_ambiguity > 1) && (dataset->alphabet->type == Alphabet_Type_PROTEIN)) g_error("Protein ambiguity symbols not implemented"); g_message("Building index"); index = Index_create(dataset, is_translated, word_length, word_jump, word_ambiguity, saturate_threshold, index_path, dataset_path, memory_limit); Index_destroy(index); Dataset_destroy(dataset); g_message("-- completed"); return 0; } /**/ exonerate-2.4.0/test/0000777000175000017500000000000011222703704011471 500000000000000exonerate-2.4.0/test/Makefile.am0000644000175000017500000000011410371635214013441 00000000000000 SUBDIRS = data exonerate ipcress util MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/test/Makefile.in0000644000175000017500000002102011222703653013450 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE.
SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = .. ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ SUBDIRS = data exonerate ipcress util MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best all: all-redirect .SUFFIXES: $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps test/Makefile Makefile: $(srcdir)/Makefile.in 
$(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status # This directory's subdirectories are mostly independent; you can cd # into them and run `make' without going through this Makefile. # To change the values of `make' variables: instead of editing Makefiles, # (1) if the variable is set in `config.status', edit `config.status' # (which will cause the Makefiles to be regenerated when you run `make'); # (2) otherwise, pass the desired values on the `make' command line. @SET_MAKE@ all-recursive install-data-recursive install-exec-recursive \ installdirs-recursive install-recursive uninstall-recursive \ check-recursive installcheck-recursive info-recursive dvi-recursive: @set fnord $(MAKEFLAGS); amf=$$2; \ dot_seen=no; \ target=`echo $@ | sed s/-recursive//`; \ list='$(SUBDIRS)'; for subdir in $$list; do \ echo "Making $$target in $$subdir"; \ if test "$$subdir" = "."; then \ dot_seen=yes; \ local_target="$$target-am"; \ else \ local_target="$$target"; \ fi; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \ || case "$$amf" in *=*) exit 1;; *k*) fail=yes;; *) exit 1;; esac; \ done; \ if test "$$dot_seen" = "no"; then \ $(MAKE) $(AM_MAKEFLAGS) "$$target-am" || exit 1; \ fi; test -z "$$fail" mostlyclean-recursive clean-recursive distclean-recursive \ maintainer-clean-recursive: @set fnord $(MAKEFLAGS); amf=$$2; \ dot_seen=no; \ rev=''; list='$(SUBDIRS)'; for subdir in $$list; do \ rev="$$subdir $$rev"; \ test "$$subdir" != "." || dot_seen=yes; \ done; \ test "$$dot_seen" = "no" && rev=". 
$$rev"; \ target=`echo $@ | sed s/-recursive//`; \ for subdir in $$rev; do \ echo "Making $$target in $$subdir"; \ if test "$$subdir" = "."; then \ local_target="$$target-am"; \ else \ local_target="$$target"; \ fi; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \ || case "$$amf" in *=*) exit 1;; *k*) fail=yes;; *) exit 1;; esac; \ done && test -z "$$fail" tags-recursive: list='$(SUBDIRS)'; for subdir in $$list; do \ test "$$subdir" = . || (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) tags); \ done tags: TAGS ID: $(HEADERS) $(SOURCES) $(LISP) list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ here=`pwd` && cd $(srcdir) \ && mkid -f$$here/ID $$unique $(LISP) TAGS: tags-recursive $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) $(LISP) tags=; \ here=`pwd`; \ list='$(SUBDIRS)'; for subdir in $$list; do \ if test "$$subdir" = .; then :; else \ test -f $$subdir/TAGS && tags="$$tags -i $$here/$$subdir/TAGS"; \ fi; \ done; \ list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(ETAGS_ARGS)$$unique$(LISP)$$tags" \ || (cd $(srcdir) && etags -o $$here/TAGS $(ETAGS_ARGS) $$tags $$unique $(LISP)) mostlyclean-tags: clean-tags: distclean-tags: -rm -f TAGS ID maintainer-clean-tags: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = test distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done for subdir in $(SUBDIRS); do \ if test "$$subdir" = .; then :; else \ test -d $(distdir)/$$subdir \ || mkdir $(distdir)/$$subdir \ || exit 1; \ chmod 777 $(distdir)/$$subdir; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) top_distdir=../$(top_distdir) 
distdir=../$(distdir)/$$subdir distdir) \ || exit 1; \ fi; \ done info-am: info: info-recursive dvi-am: dvi: dvi-recursive check-am: all-am check: check-recursive installcheck-am: installcheck: installcheck-recursive install-exec-am: install-exec: install-exec-recursive install-data-am: install-data: install-data-recursive install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-recursive uninstall-am: uninstall: uninstall-recursive all-am: Makefile all-redirect: all-recursive install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: installdirs-recursive installdirs-am: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-tags mostlyclean-generic mostlyclean: mostlyclean-recursive clean-am: clean-tags clean-generic mostlyclean-am clean: clean-recursive distclean-am: distclean-tags distclean-generic clean-am distclean: distclean-recursive maintainer-clean-am: maintainer-clean-tags maintainer-clean-generic \ distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." 
maintainer-clean: maintainer-clean-recursive .PHONY: install-data-recursive uninstall-data-recursive \ install-exec-recursive uninstall-exec-recursive installdirs-recursive \ uninstalldirs-recursive all-recursive check-recursive \ installcheck-recursive info-recursive dvi-recursive \ mostlyclean-recursive distclean-recursive clean-recursive \ maintainer-clean-recursive tags tags-recursive mostlyclean-tags \ distclean-tags clean-tags maintainer-clean-tags distdir info-am info \ dvi-am dvi check check-am installcheck-am installcheck install-exec-am \ install-exec install-data-am install-data install-am install \ uninstall-am uninstall all-redirect all-am all installdirs-am \ installdirs mostlyclean-generic distclean-generic clean-generic \ maintainer-clean-generic clean mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: exonerate-2.4.0/test/data/0000777000175000017500000000000011222703704012402 500000000000000exonerate-2.4.0/test/data/Makefile.am0000644000175000017500000000007510371635214014360 00000000000000 SUBDIRS = cdna protein MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/test/data/Makefile.in0000644000175000017500000002101611222703653014366 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. 
SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../.. ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ SUBDIRS = cdna protein MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best all: all-redirect .SUFFIXES: $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps test/data/Makefile Makefile: $(srcdir)/Makefile.in 
$(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status # This directory's subdirectories are mostly independent; you can cd # into them and run `make' without going through this Makefile. # To change the values of `make' variables: instead of editing Makefiles, # (1) if the variable is set in `config.status', edit `config.status' # (which will cause the Makefiles to be regenerated when you run `make'); # (2) otherwise, pass the desired values on the `make' command line. @SET_MAKE@ all-recursive install-data-recursive install-exec-recursive \ installdirs-recursive install-recursive uninstall-recursive \ check-recursive installcheck-recursive info-recursive dvi-recursive: @set fnord $(MAKEFLAGS); amf=$$2; \ dot_seen=no; \ target=`echo $@ | sed s/-recursive//`; \ list='$(SUBDIRS)'; for subdir in $$list; do \ echo "Making $$target in $$subdir"; \ if test "$$subdir" = "."; then \ dot_seen=yes; \ local_target="$$target-am"; \ else \ local_target="$$target"; \ fi; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \ || case "$$amf" in *=*) exit 1;; *k*) fail=yes;; *) exit 1;; esac; \ done; \ if test "$$dot_seen" = "no"; then \ $(MAKE) $(AM_MAKEFLAGS) "$$target-am" || exit 1; \ fi; test -z "$$fail" mostlyclean-recursive clean-recursive distclean-recursive \ maintainer-clean-recursive: @set fnord $(MAKEFLAGS); amf=$$2; \ dot_seen=no; \ rev=''; list='$(SUBDIRS)'; for subdir in $$list; do \ rev="$$subdir $$rev"; \ test "$$subdir" != "." || dot_seen=yes; \ done; \ test "$$dot_seen" = "no" && rev=". 
$$rev"; \ target=`echo $@ | sed s/-recursive//`; \ for subdir in $$rev; do \ echo "Making $$target in $$subdir"; \ if test "$$subdir" = "."; then \ local_target="$$target-am"; \ else \ local_target="$$target"; \ fi; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) $$local_target) \ || case "$$amf" in *=*) exit 1;; *k*) fail=yes;; *) exit 1;; esac; \ done && test -z "$$fail" tags-recursive: list='$(SUBDIRS)'; for subdir in $$list; do \ test "$$subdir" = . || (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) tags); \ done tags: TAGS ID: $(HEADERS) $(SOURCES) $(LISP) list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ here=`pwd` && cd $(srcdir) \ && mkid -f$$here/ID $$unique $(LISP) TAGS: tags-recursive $(HEADERS) $(SOURCES) $(TAGS_DEPENDENCIES) $(LISP) tags=; \ here=`pwd`; \ list='$(SUBDIRS)'; for subdir in $$list; do \ if test "$$subdir" = .; then :; else \ test -f $$subdir/TAGS && tags="$$tags -i $$here/$$subdir/TAGS"; \ fi; \ done; \ list='$(SOURCES) $(HEADERS)'; \ unique=`for i in $$list; do echo $$i; done | \ awk ' { files[$$0] = 1; } \ END { for (i in files) print i; }'`; \ test -z "$(ETAGS_ARGS)$$unique$(LISP)$$tags" \ || (cd $(srcdir) && etags -o $$here/TAGS $(ETAGS_ARGS) $$tags $$unique $(LISP)) mostlyclean-tags: clean-tags: distclean-tags: -rm -f TAGS ID maintainer-clean-tags: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = test/data distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done for subdir in $(SUBDIRS); do \ if test "$$subdir" = .; then :; else \ test -d $(distdir)/$$subdir \ || mkdir $(distdir)/$$subdir \ || exit 1; \ chmod 777 $(distdir)/$$subdir; \ (cd $$subdir && $(MAKE) $(AM_MAKEFLAGS) top_distdir=../$(top_distdir) 
distdir=../$(distdir)/$$subdir distdir) \ || exit 1; \ fi; \ done info-am: info: info-recursive dvi-am: dvi: dvi-recursive check-am: all-am check: check-recursive installcheck-am: installcheck: installcheck-recursive install-exec-am: install-exec: install-exec-recursive install-data-am: install-data: install-data-recursive install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-recursive uninstall-am: uninstall: uninstall-recursive all-am: Makefile all-redirect: all-recursive install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: installdirs-recursive installdirs-am: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-tags mostlyclean-generic mostlyclean: mostlyclean-recursive clean-am: clean-tags clean-generic mostlyclean-am clean: clean-recursive distclean-am: distclean-tags distclean-generic clean-am distclean: distclean-recursive maintainer-clean-am: maintainer-clean-tags maintainer-clean-generic \ distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." 
maintainer-clean: maintainer-clean-recursive .PHONY: install-data-recursive uninstall-data-recursive \ install-exec-recursive uninstall-exec-recursive installdirs-recursive \ uninstalldirs-recursive all-recursive check-recursive \ installcheck-recursive info-recursive dvi-recursive \ mostlyclean-recursive distclean-recursive clean-recursive \ maintainer-clean-recursive tags tags-recursive mostlyclean-tags \ distclean-tags clean-tags maintainer-clean-tags distdir info-am info \ dvi-am dvi check check-am installcheck-am installcheck install-exec-am \ install-exec install-data-am install-data install-am install \ uninstall-am uninstall all-redirect all-am all installdirs-am \ installdirs mostlyclean-generic distclean-generic clean-generic \ maintainer-clean-generic clean mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: exonerate-2.4.0/test/data/cdna/0000777000175000017500000000000011222703704013307 500000000000000exonerate-2.4.0/test/data/cdna/Makefile.am0000644000175000017500000000022510371635214015262 00000000000000 EXTRA_DIST = calm.human.dna.fasta htrt.human.dna.fasta \ p53.human.dna.fasta tube.drome.dna.fasta MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/test/data/cdna/Makefile.in0000644000175000017500000001137011222703653015275 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. 
SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../../.. ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ EXTRA_DIST = calm.human.dna.fasta htrt.human.dna.fasta p53.human.dna.fasta tube.drome.dna.fasta MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best all: all-redirect .SUFFIXES: $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu 
--include-deps test/data/cdna/Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status tags: TAGS TAGS: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = test/data/cdna distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done info-am: info: info-am dvi-am: dvi: dvi-am check-am: all-am check: check-am installcheck-am: installcheck: installcheck-am install-exec-am: install-exec: install-exec-am install-data-am: install-data: install-data-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-am uninstall-am: uninstall: uninstall-am all-am: Makefile all-redirect: all-am install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-generic mostlyclean: mostlyclean-am clean-am: clean-generic mostlyclean-am clean: clean-am distclean-am: distclean-generic clean-am distclean: distclean-am maintainer-clean-am: maintainer-clean-generic distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." 
maintainer-clean: maintainer-clean-am .PHONY: tags distdir info-am info dvi-am dvi check check-am \ installcheck-am installcheck install-exec-am install-exec \ install-data-am install-data install-am install uninstall-am uninstall \ all-redirect all-am all installdirs mostlyclean-generic \ distclean-generic clean-generic maintainer-clean-generic clean \ mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: exonerate-2.4.0/test/data/cdna/calm.human.dna.fasta0000644000175000017500000000433210371635214017035 00000000000000>EMBL:J04046 CALM_HUMAN Complete mRNA (CDS: 104->553) TGAGTGTGGAGGCGCGGACGCGCGGCGGAGCTGGAACTGCTGCAGCTGCTGCCGCCGCCG GAGGAACCTTGATCCCCGTGCTCCGGACACCCCGGGCCTCGCCATGGCTGACCAGCTGAC TGAGGAGCAGATTGCAGAGTTCAAGGAGGCCTTCTCCCTCTTTGACAAGGATGGAGATGG CACTATCACCACCAAGGAGTTGGGGACAGTGATGAGATCCCTGGGACAGAACCCCACTGA AGCAGAGCTGCAGGATATGATCAATGAGGTGGATGCAGATGGGAACGGGACCATTGACTT CCCGGAGTTCCTGACCATGATGGCCAGAAAGATGAAGGACACAGACAGTGAGGAGGAGAT CCGAGAGGCGTTCCGTGTCTTTGACAAGGATGGGAATGGCTACATCAGCGCCGCAGAGCT GCGTCACGTAATGACGAACCTGGGGGAGAAGCTGACCGATGAGGAGGTGGATGAGATGAT CAGGGAGGCTGACATCGATGGAGATGGCCAGGTCAATTATGAAGAGTTTGTACAGATGAT GACTGCAAAGTGAAGGCCCCCCGGGCAGCTGGCGATGCCCGTTCTCTTGATCTCTCTCTT CTCGCGCGCGCACTCTCTCTTCAACACTCCCCTGCGTACCCCGGTTCTAGCAAACACCAA TTGATTGACTGAGAATCTGATAAAGCAACAAAAGATTTGTCCCAAGCTGCATGATTGCTC TTTCTCCTTCTTCCCTGAGTCTCTCTCCATGCCCCTCATCTCTTCCTTTTGCCCTCGCCT CTTCCATCCACGTCTTCCAAGGCCTGATGCATTCATAAGTTGAAGCCCTCCCCAGATCCC CTTGGAGCCTCTGCCCTCCTCCAGCCCGGATGGCTCTCCTTCATTTTGGTTTGTTTCCTC TTGTTTGTCATCTTATTTTGGGTGCTGGGGTGGCTGCCAGCCTGTCCCGGGACCTGCTGG GAGGGACAAGAGGCCCTCCCCAGGCAGAAGAGCATGCCCTTTGCCGTTGCATGCAACCAG CCCTGTGATTCCACGTGCAGATCCCAGCAGCCTGTTGGGGCAGGGGTGCCAAGAGAGGCA TTCCAGAAGGACTGAGGGGGCGTTGAGGAATTGTGGCGTTGACTGGATGTGGCCCAGGAC TGGGTCGAGGGGGCCAACTCACAGAAGGGGACTGACAGTGGGCAACACTCACATCCCACT GGCTGCTGTTCTGAAACCATCTGATTGGCTTTCTGAGGTTTGGCTGGGTGGGGACTGCTC 
ATTTGGCCACTCTGCAGATTGGACTTGCCCGCGTTCCTGAAGCGCTCTCGAGCTGTTCTG TAAATACCTGGTGCTAACATCCCATGCCGCTCCCTCCTCACGATGCACCCACCGCCCTGA GGGCCCGTCCTAGGAATGGATGTGGGGATGGTCGCTTTGTAATGTGCTGGTTCTCTTTTT TTTTCTTTCCCCTCTATGGCCCTTAAGACTTTCATTTTGTTCAGAACCATGCTGGGCTAG CTAAAGGGTGGGGAGAGGGAAGATGGGCCCCACCAGCTCTCAAGAGAAACGCACCTGCAA TAAAACAGTCTTGTCGGCCAGCTGCCCAGGGACGGCAGCTACAGCAGCCTCTGCGTCCTG GTCCGCCAGCACCTCCCGCTTCTCCGTGGTGACTTGGCGCCGCTTCCTCACATCTGTGCT CCGTGCCCTCTTCCCTGCCTCTTCCCTCGCCCACCTGCCTGCCCCCATACTCCCCAGCGG AGAGCATGATCCGTGCCCTTGCTTCTGACTTTCGCCTCTGGGACAAGTAAGTCAATGTGG GCAGTTCAGTCGTCTGGGTTTTTTCCCCTTTTCTGTTCATTTCATCTGGCTCCCCCCACC ACCTCCCCACCCCACCCCCCACCCCCTGCTTCCCCTCACTGCCCAGGTCGATCAAGTGGC TTTTCCTGGGACCTGCCCAGCTTTGAGAATCTCTTCTCATCCACCCTCTGGCACCCAGCC TCTGAGGGAAGGAGGGATGGGGCATAGTGGGAGACCCAGCCAAGAGCTGAGGGTAAGGGC AGGTAGGCGTGAGGCTGTGGACATTTTCGGAATGTTTTGGTTTTGTTTTTTTTAAACCGG GCAATATTGTGTTCAGTTCAAGCTGTGAAGAAAAATATATATCAATGTTTTCCAATAAAA TACAGTGACTACCTG exonerate-2.4.0/test/data/cdna/htrt.human.dna.fasta0000644000175000017500000001007410371635214017102 00000000000000>AF01595 Human telomerase reverse transcriptase (hTRT) mRNA GCAGCGCTGCGTCCTGCTGCGCACGTGGGAAGCCCTGGCCCCGGCCACCC CCGCGATGCCGCGCGCTCCCCGCTGCCGAGCCGTGCGCTCCCTGCTGCGC AGCCACTACCGCGAGGTGCTGCCGCTGGCCACGTTCGTGCGGCGCCTGGG GCCCCAGGGCTGGCGGCTGGTGCAGCGCGGGGACCCGGCGGCTTTCCGCG CGCTGGTGGCCCAGTGCCTGGTGTGCGTGCCCTGGGACGCACGGCCGCCC CCCGCCGCCCCCTCCTTCCGCCAGGTGTCCTGCCTGAAGGAGCTGGTGGC CCGAGTGCTGCAGAGGCTGTGCGAGCGCGGCGCGAAGAACGTGCTGGCCT TCGGCTTCGCGCTGCTGGACGGGGCCCGCGGGGGCCCCCCCGAGGCCTTC ACCACCAGCGTGCGCAGCTACCTGCCCAACACGGTGACCGACGCACTGCG GGGGAGCGGGGCGTGGGGGCTGCTGCTGCGCCGCGTGGGCGACGACGTGC TGGTTCACCTGCTGGCACGCTGCGCGCTCTTTGTGCTGGTGGCTCCCAGC TGCGCCTACCAGGTGTGCGGGCCGCCGCTGTACCAGCTCGGCGCTGCCAC TCAGGCCCGGCCCCCGCCACACGCTAGTGGACCCCGAAGGCGTCTGGGAT GCGAACGGGCCTGGAACCATAGCGTCAGGGAGGCCGGGGTCCCCCTGGGC CTGCCAGCCCCGGGTGCGAGGAGGCGCGGGGGCAGTGCCAGCCGAAGTCT GCCGTTGCCCAAGAGGCCCAGGCGTGGCGCTGCCCCTGAGCCGGAGCGGA CGCCCGTTGGGCAGGGGTCCTGGGCCCACCCGGGCAGGACGCGTGGACCG 
AGTGACCGTGGTTTCTGTGTGGTGTCACCTGCCAGACCCGCCGAAGAAGC CACCTCTTTGGAGGGTGCGCTCTCTGGCACGCGCCACTCCCACCCATCCG TGGGCCGCCAGCACCACGCGGGCCCCCCATCCACATCGCGGCCACCACGT CCCTGGGACACGCCTTGTCCCCCGGTGTACGCCGAGACCAAGCACTTCCT CTACTCCTCAGGCGACAAGGAGCAGCTGCGGCCCTCCTTCCTACTCAGCT CTCTGAGGCCCAGCCTGACTGGCGCTCGGAGGCTCGTGGAGACCATCTTT CTGGGTTCCAGGCCCTGGATGCCAGGGACTCCCCGCAGGTTGCCCCGCCT GCCCCAGCGCTACTGGCAAATGCGGCCCCTGTTTCTGGAGCTGCTTGGGA ACCACGCGCAGTGCCCCTACGGGGTGCTCCTCAAGACGCACTGCCCGCTG CGAGCTGCGGTCACCCCAGCAGCCGGTGTCTGTGCCCGGGAGAAGCCCCA GGGCTCTGTGGCGGCCCCCGAGGAGGAGGACACAGACCCCCGTCGCCTGG TGCAGCTGCTCCGCCAGCACAGCAGCCCCTGGCAGGTGTACGGCTTCGTG CGGGCCTGCCTGCGCCGGCTGGTGCCCCCAGGCCTCTGGGGCTCCAGGCA CAACGAACGCCGCTTCCTCAGGAACACCAAGAAGTTCATCTCCCTGGGGA AGCATGCCAAGCTCTCGCTGCAGGAGCTGACGTGGAAGATGAGCGTGCGG GACTGCGCTTGGCTGCGCAGGAGCCCAGGGGTTGGCTGTGTTCCGGCCGC AGAGCACCGTCTGCGTGAGGAGATCCTGGCCAAGTTCCTGCACTGGCTGA TGAGTGTGTACGTCGTCGAGCTGCTCAGGTCTTTCTTTTATGTCACGGAG ACCACGTTTCAAAAGAACAGGCTCTTTTTCTACCGGAAGAGTGTCTGGAG CAAGTTGCAAAGCATTGGAATCAGACAGCACTTGAAGAGGGTGCAGCTGC GGGAGCTGTCGGAAGCAGAGGTCAGGCAGCATCGGGAAGCCAGGCCCGCC CTGCTGACGTCCAGACTCCGCTTCATCCCCAAGCCTGACGGGCTGCGGCC GATTGTGAACATGGACTACGTCGTGGGAGCCAGAACGTTCCGCAGAGAAA AGAGGGCCGAGCGTCTCACCTCGAGGGTGAAGGCACTGTTCAGCGTGCTC AACTACGAGCGGGCGCGGCGCCCCGGCCTCCTGGGCGCCTCTGTGCTGGG CCTGGACGATATCCACAGGGCCTGGCGCACCTTCGTGCTGCGTGTGCGGG CCCAGGACCCGCCGCCTGAGCTGTACTTTGTCAAGGTGGATGTGACGGGC GCGTACGACACCATCCCCCAGGACAGGCTCACGGAGGTCATCGCCAGCAT CATCAAACCCCAGAACACGTACTGCGTGCGTCGGTATGCCGTGGTCCAGA AGGCCGCCCATGGGCACGTCCGCAAGGCCTTCAAGAGCCACGTCTCTACC TTGACAGACCTCCAGCCGTACATGCGACAGTTCGTGGCTCACCTGCAGGA GACCAGCCCGCTGAGGGATGCCGTCGTCATCGAGCAGAGCTCCTCCCTGA ATGAGGCCAGCAGTGGCCTCTTCGACGTCTTCCTACGCTTCATGTGCCAC CACGCCGTGCGCATCAGGGGCAAGTCCTACGTCCAGTGCCAGGGGATCCC GCAGGGCTCCATCCTCTCCACGCTGCTCTGCAGCCTGTGCTACGGCGACA TGGAGAACAAGCTGTTTGCGGGGATTCGGCGGGACGGGCTGCTCCTGCGT TTGGTGGATGATTTCTTGTTGGTGACACCTCACCTCACCCACGCGAAAAC CTTCCTCAGGACCCTGGTCCGAGGTGTCCCTGAGTATGGCTGCGTGGTGA ACTTGCGGAAGACAGTGGTGAACTTCCCTGTAGAAGACGAGGCCCTGGGT 
GGCACGGCTTTTGTTCAGATGCCGGCCCACGGCCTATTCCCCTGGTGCGG CCTGCTGCTGGATACCCGGACCCTGGAGGTGCAGAGCGACTACTCCAGCT ATGCCCGGACCTCCATCAGAGCCAGTCTCACCTTCAACCGCGGCTTCAAG GCTGGGAGGAACATGCGTCGCAAACTCTTTGGGGTCTTGCGGCTGAAGTG TCACAGCCTGTTTCTGGATTTGCAGGTGAACAGCCTCCAGACGGTGTGCA CCAACATCTACAAGATCCTCCTGCTGCAGGCGTACAGGTTTCACGCATGT GTGCTGCAGCTCCCATTTCATCAGCAAGTTTGGAAGAACCCCACATTTTT CCTGCGCGTCATCTCTGACACGGCCTCCCTCTGCTACTCCATCCTGAAAG CCAAGAACGCAGGGATGTCGCTGGGGGCCAAGGGCGCCGCCGGCCCTCTG CCCTCCGAGGCCGTGCAGTGGCTGTGCCACCAAGCATTCCTGCTCAAGCT GACTCGACACCGTGTCACCTACGTGCCACTCCTGGGGTCACTCAGGACAG CCCAGACGCAGCTGAGTCGGAAGCTCCCGGGGACGACGCTGACTGCCCTG GAGGCCGCAGCCAACCCGGCACTGCCCTCAGACTTCAAGACCATCCTGGA CTGATGGCCACCCGCCCACAGCCAGGCCGAGAGCAGACACCAGCAGCCCT GTCACGCCGGGCTCTACGTCCCAGGGAGGGAGGGGCGGCCCACACCCAGG CCCGCACCGCTGGGAGTCTGAGGCCTGAGTGAGTGTTTGGCCGAGGCCTG CATGTCCGGCTGAAGGCTGAGTGTCCGGCTGAGGCCTGAGCGAGTGTCCA GCCAAGGGCTGAGTGTCCAGCACACCTGCCGTCTTCACTTCCCCACAGGC TGGCGCTCGGCTCCACCCCAGGGCCAGCTTTTCCTCACCAGGAGCCCGGC TTCCACTCCCCACATAGGAATAGTCCATCCCCAGATTCGCCATTGTTCAC CCCTCGCCCTGCCCTCCTTTGCCTTCCACCCCCACCATCCAGGTGGAGAC CCTGAGAAGGACCCTGGGAGCTCTGGGAATTTGGAGTGACCAAAGGTGTG CCCTGTACACAGGCGAGGACCCTGCACCTGGATGGGGGTCCCTGTGGGTC AAATTGGGGGGAGGTGCTGTGGGAGTAAAATACTGAATATATGAGTTTTT CAGTTTTGAAAAAAA exonerate-2.4.0/test/data/cdna/p53.human.dna.fasta0000644000175000017500000000346410371635214016535 00000000000000>EMBL:K03199 P53_HUMAN Complete mRNA (CDS: 215->1396) GTCGACCCTTTCCACCCCTGGAAGATGGAAATAAACCTGCGTGTGGGTGGAGTGTTAGGA CAAAAAAAAAAAAAAAAAAGTCTAGAGCCACCGTCCAGGGAGCAGGTAGCTGCTGGGCTC CGGGGACACTTTGCGTTCGGGCTGGGAGCGTGCTTTCCACGACGGTGACACGCTTCCCTG GATTGGCAGCCAGACTGCCTTCCGGGTCACTGCCATGGAGGAGCCGCAGTCAGATCCTAG CGTCGAGCCCCCTCTGAGTCAGGAAACATTTTCAGACCTATGGAAACTACTTCCTGAAAA CAACGTTCTGTCCCCCTTGCCGTCCCAAGCAATGGATGATTTGATGCTGTCCCCGGACGA TATTGAACAATGGTTCACTGAAGACCCAGGTCCAGATGAAGCTCCCAGAATGCCAGAGGC TGCTCCCCCCGTGGCCCCTGCACCAGCAGCTCCTACACCGGCGGCCCCTGCACCAGCCCC CTCCTGGCCCCTGTCATCTTCTGTCCCTTCCCAGAAAACCTACCAGGGCAGCTACGGTTT 
CCGTCTGGGCTTCTTGCATTCTGGGACAGCCAAGTCTGTGACTTGCACGTACTCCCCTGC CCTCAACAAGATGTTTTGCCAACTGGCCAAGACCTGCCCTGTGCAGCTGTGGGTTGATTC CACACCCCCGCCCGGCACCCGCGTCCGCGCCATGGCCATCTACAAGCAGTCACAGCACAT GACGGAGGTTGTGAGGCGCTGCCCCCACCATGAGCGCTGCTCAGATAGCGATGGTCTGGC CCCTCCTCAGCATCTTATCCGAGTGGAAGGAAATTTGCGTGTGGAGTATTTGGATGACAG AAACACTTTTCGACATAGTGTGGTGGTGCCCTATGAGCCGCCTGAGGTTGGCTCTGACTG TACCACCATCCACTACAACTACATGTGTAACAGTTCCTGCATGGGCGGCATGAACCGGAG GCCCATCCTCACCATCATCACACTGGAAGACTCCAGTGGTAATCTACTGGGACGGAACAG CTTTGAGGTGCATGTTTGTGCCTGTCCTGGGAGAGACCGGCGCACAGAGGAAGAGAATCT CCGCAAGAAAGGGGAGCCTCACCACGAGCTGCCCCCAGGGAGCACTAAGCGAGCACTGCC CAACAACACCAGCTCCTCTCCCCAGCCAAAGAAGAAACCACTGGATGGAGAATATTTCAC CCTTCAGATCCGTGGGCGTGAGCGCTTCGAGATGTTCCGAGAGCTGAATGAGGCCTTGGA ACTCAAGGATGCCCAGGCTGGGAAGGAGCCAGGGGGGAGCAGGGCTCACTCCAGCCACCT GAAGTCCAAAAAGGGTCAGTCTACCTCCCGCCATAAAAAACTCATGTTCAAGACAGAAGG GCCTGACTCAGACTGACATTCTCCACTTCTTGTTCCCCACTGACAGCCTCCCACCCCCAT CTCTCCCTCCCCTGCCATTTTGGGTTTTGGGTCTTTGAACCCTTGCTTGCAATAGGTGTG CGTCAGAAGCACCCAGGACTTCCATTTGCTTTGTCCCGGGGCTCCACTGAACAAGTTGGC CTGCACTGGTGTTTTGTTGTGGGGAGGAGGATGGGGAGTAGGACATACCAGCTTAGATTT TAAGGTTTTTACTGTGAGGGATGTTTGGGAGATGTAAGAAATGTTCTTGCAGTTAAGGGT TAGTTTACAATCAGCCACATTCTAGGTAGGGACCCACTTCACCGTACTAACCAGGGAAGC TGTCCCTCACTGTTGAATTC exonerate-2.4.0/test/data/cdna/tube.drome.dna.fasta0000644000175000017500000000406310371635214017057 00000000000000>EMBL:M59501 TUBE_DROME Complete mRNA (CDS:194->1582) ACACTGAAATACTGCGAGCAGCGAGCAGTAGCTGTGGGCATTGGCAAAAACATAGAGAGA GAAGATAGATGCAAACATGGTTTTATAGCGGAAATCTCATACAATCGCACTGGACATCTG GGAACTTGAACCATCGACCTGCCACAGGCAAAGGCTTTCCCGTAAGGTTATCCTGGGTAG CTAAGGGAACACCATGGCGTATGGCTGGAACGGATGCGGAATGGGCGTGCAGGTGAACGG AAGCAACGGCGCCATAGGTTTATCCTCGAAGTATTCTCGCAACACGGAGCTGCGACGCGT TGAGGACAATGACATTTACCGGCTTGCCAAAATACTAGATGAGAACTCATGCTGGCGCAA ACTGATGTCGATCATACCCAAGGGCATGGATGTGCAGGCCTGCAGCGGAGCCGGATGCTT GAATTTTCCGGCGGAAATCAAAAAGGGATTTAAGTACACTGCGCAGGACGTGTTCCAGAT TGACGAGGCTGCAAACCGACTACCGCCGGACCAAAGCAAGTCGCAGATGATGATCGACGA 
GTGGAAGACCTCTGGCAAGCTCAACGAGCGGCCCACGGTTGGGGTATTGCTCCAACTTCT GGTGCAGGCAGAGCTCTTCAGTGCGGCAGACTTTGTGGCACTAGACTTCCTAAATGAGTC CACCCCTGCCCGGCCTGTTGACGGTCCCGGTGCGCTCATAAGCCTTGAGCTGCTGGAAGA GGAAATGGAAGTGGACAACGAGGGGCTGAGCCTCAAATACCAGTCAAGCACTGCCACTCT TGGGGCAGACGCTCAAGGTAGCGTTGGCCTGAACCTGGACAACTTTGAAAAAGACATTGT GAGGCGGGACAAGAGTGTGCCCCAGCCGTCAGGAAACACCCCTCCCATAGCGCCACCTCG ACGACAGCAAAGGAGTACTACTAACTCCAACTTTGCAACATTGACAGGCACCGGAACTAC GTCGACGACCATCCCGAATGTGCCCAATCTGACAATTCTCAATCCCAGCGAACAAATTCA GGAGCCAGTGCTGCAGCCACGACCGATGAACATCCCCGATCTGTCTATACTCATTTCGAA CTCTGGTGATTTGCGGGCAACTGTGTCGGACAACCCAAGCAACAGAACTAGCTCAACGGA TCCGCCTAACATTCCGCGAATCACTCTTTTAATTGATAATTCTGGAGATGTAAACTCTCG ACCAAATCACGCTCCAGCGAAAGCTTCAACGGCTACCACACCAACGGCGAGCTCAAACAA CTTGCCAATGATAAGCGCCTTGAATATAAGTAAAGGCAGCAAGGAGACCTTGCGTCCTGA GAGCAGAAGCAGCTCCAGCAGCTTGAGCAAAGATGATGACGATGATAACGATGGTGAGGA GGATGGAGAGGAAGAGTACCCAGATGCCTTCTTGCCAAATCTGAGTAACTCAGAGCAGCA GAGCTCAAACAATGACTCCAGCCTGACAACGGTTACTGGCACCAGTGGTGATAACAGCTT TGAGCTAACCAACGACTCCAGCTCCACCTCAAACGACGATTACGCTTGCAACATTCCAGA CTTGAGTGAGCTGCAGCAATAGATAAAAGCACCGACCATCGGCGGAGATGACATTCCGTT CGATAACTTATGCAACCCAGATGTATTCATTTGGCAGCTGGGTGTGGACCTTCTTATGTT ATGCATGAGGTCCAACTAGAATCTAAGATTCAGTGCACGATTTGTTCATATCACGTCTGT AATCGTGACAGTACATAACAAGTACGGCAATATCAAATATTTCCGTTTCAAGAAAGCCAT AACCCAATACCAAAATTTAACAAATTTTGTATACTTACGAAATACTCCTTCATTTTCTAG TATAAATTTCCTTAAACGCATAATGGGTTTCTGTGACAGTTTTGTCTTGGGACAAACTGC GATCTTATGACATTTTTCGTTATTAATTGATTGTTGCCTAAAATATTTCACAAGTACGAT ATAATGGCTAAATACAAATAAAATATTTTTG exonerate-2.4.0/test/data/protein/0000777000175000017500000000000011222703704014062 500000000000000exonerate-2.4.0/test/data/protein/Makefile.am0000644000175000017500000000024510371635214016037 00000000000000 EXTRA_DIST = calm.human.protein.fasta htrt.human.protein.fasta \ p53.human.protein.fasta tube.drome.protein.fasta MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/test/data/protein/Makefile.in0000644000175000017500000001141611222703654016052 00000000000000# Makefile.in generated 
automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../../.. 
ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ EXTRA_DIST = calm.human.protein.fasta htrt.human.protein.fasta p53.human.protein.fasta tube.drome.protein.fasta MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best all: all-redirect .SUFFIXES: $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps test/data/protein/Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status tags: TAGS TAGS: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = test/data/protein distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file 
$(distdir)/$$file || :; \ fi; \ done info-am: info: info-am dvi-am: dvi: dvi-am check-am: all-am check: check-am installcheck-am: installcheck: installcheck-am install-exec-am: install-exec: install-exec-am install-data-am: install-data: install-data-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-am uninstall-am: uninstall: uninstall-am all-am: Makefile all-redirect: all-am install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-generic mostlyclean: mostlyclean-am clean-am: clean-generic mostlyclean-am clean: clean-am distclean-am: distclean-generic clean-am distclean: distclean-am maintainer-clean-am: maintainer-clean-generic distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." maintainer-clean: maintainer-clean-am .PHONY: tags distdir info-am info dvi-am dvi check check-am \ installcheck-am installcheck install-exec-am install-exec \ install-data-am install-data install-am install uninstall-am uninstall \ all-redirect all-am all installdirs mostlyclean-generic \ distclean-generic clean-generic maintainer-clean-generic clean \ mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. 
.NOEXPORT: exonerate-2.4.0/test/data/protein/calm.human.protein.fasta0000644000175000017500000000024410371635214020524 00000000000000>CALM_HUMAN MADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDAD GNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYISAAELRHVMTNLGEKLT DEEVDEMIREADIDGDGQVNYEEFVQMMTAK exonerate-2.4.0/test/data/protein/htrt.human.protein.fasta0000644000175000017500000000227410371635214020576 00000000000000>AF01595 Human telomerase reverse transcriptase (hTRT) mRNA MPRAPRCRAVRSLLRSHYREVLPLATFVRRLGPQGWRLVQRGDPAAFRALVAQCLVCVP WDARPPPAAPSFRQVSCLKELVARVLQRLCERGAKNVLAFGFALLDGARGGPPEAFTTS VRSYLPNTVTDALRGSGAWGLLLRRVGDDVLVHLLARCALFVLVAPSCAYQVCGPPLYQ LGAATQARPPPHASGPRRRLGCERAWNHSVREAGVPLGLPAPGARRRGGSASRSLPLPK RPRRGAAPEPERTPVGQGSWAHPGRTRGPSDRGFCVVSPARPAEEATSLEGALSGTRHS HPSVGRQHHAGPPSTSRPPRPWDTPCPPVYAETKHFLYSSGDKEQLRPSFLLSSLRPSL TGARRLVETIFLGSRPWMPGTPRRLPRLPQRYWQMRPLFLELLGNHAQCPYGVLLKTHC PLRAAVTPAAGVCAREKPQGSVAAPEEEDTDPRRLVQLLRQHSSPWQVYGFVRACLRRL VPPGLWGSRHNERRFLRNTKKFISLGKHAKLSLQELTWKMSVRDCAWLRRSPGVGCVPA AEHRLREEILAKFLHWLMSVYVVELLRSFFYVTETTFQKNRLFFYRKSVWSKLQSIGIR QHLKRVQLRELSEAEVRQHREARPALLTSRLRFIPKPDGLRPIVNMDYVVGARTFRREK RAERLTSRVKALFSVLNYERARRPGLLGASVLGLDDIHRAWRTFVLRVRAQDPPPELYF VKVDVTGAYDTIPQDRLTEVIASIIKPQNTYCVRRYAVVQKAAHGHVRKAFKSHVSTLT DLQPYMRQFVAHLQETSPLRDAVVIEQSSSLNEASSGLFDVFLRFMCHHAVRIRGKSYV QCQGIPQGSILSTLLCSLCYGDMENKLFAGIRRDGLLLRLVDDFLLVTPHLTHAKTFLR TLVRGVPEYGCVVNLRKTVVNFPVEDEALGGTAFVQMPAHGLFPWCGLLLDTRTLEVQS DYSSYARTSIRASLTFNRGFKAGRNMRRKLFGVLRLKCHSLFLDLQVNSLQTVCTNIYK ILLLQAYRFHACVLQLPFHQQVWKNPTFFLRVISDTASLCYSILKAKNAGMSLGAKGAA GPLPSEAVQWLCHQAFLLKLTRHRVTYVPLLGSLRTAQTQLSRKLPGTTLTALEAAANP ALPSDFKTILD exonerate-2.4.0/test/data/protein/p53.human.protein.fasta0000644000175000017500000000063310371635214020221 00000000000000>P53_HUMAN MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP DEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAK SVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHE 
RCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS SCMGGMNRRPILTIITLEDSSGNLLGRNSFEVHVCACPGRDRRTEEENLRKKGEPHHELP PGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPG GSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD exonerate-2.4.0/test/data/protein/tube.drome.protein.fasta0000644000175000017500000000074210371635214020550 00000000000000>TUBE_DROME MAYGWNGCGMGVQVNGSNGAIGLSSKYSRNTELRRVEDNDIYRLAKILDENSCWRKLMS IIPKGMDVQACSGAGCLNFPAEIKKGFKYTAQDVFQIDEAANRLPPDQSKSQMMIDEWK TSGKLNERPTVGVLLQLLVQAELFSAADFVALDFLNESTPARPVDGPGALISLELLEEE MEVDNEGLSLKYQSSTATLGADAQGSVGLNLDNFEKDIVRRDKSVPQPSGNTPPIAPPR RQQRSTTNSNFATLTGTGTTSTTIPNVPNLTILNPSEQIQEPVLQPRPMNIPDLSILIS NSGDLRATVSDNPSNRTSSTDPPNIPRITLLIDNSGDVNSRPNHAPAKASTATTPTASS NNLPMISALNISKGSKETLRPESRSSSSSLSKDDDDDNDGEEDGEEEYPDAFLPNLSNS EQQSSNNDSSLTTVTGTSGDNSFELTNDSSSTSNDDYACNIPDLSELQQ exonerate-2.4.0/test/exonerate/0000777000175000017500000000000011222703704013463 500000000000000exonerate-2.4.0/test/exonerate/Makefile.am0000644000175000017500000000016510371635214015441 00000000000000 TESTS = exonerate.simple.test.sh EXTRA_DIST = $(TESTS) # Files to clear away MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/test/exonerate/Makefile.in0000644000175000017500000001265211222703654015456 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. 
SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../.. ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ TESTS = exonerate.simple.test.sh EXTRA_DIST = $(TESTS) # Files to clear away MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best all: all-redirect .SUFFIXES: $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps 
test/exonerate/Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status tags: TAGS TAGS: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = test/exonerate distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done check-TESTS: $(TESTS) @failed=0; all=0; \ srcdir=$(srcdir); export srcdir; \ for tst in $(TESTS); do \ if test -f $$tst; then dir=.; \ else dir="$(srcdir)"; fi; \ if $(TESTS_ENVIRONMENT) $$dir/$$tst; then \ all=`expr $$all + 1`; \ echo "PASS: $$tst"; \ elif test $$? -ne 77; then \ all=`expr $$all + 1`; \ failed=`expr $$failed + 1`; \ echo "FAIL: $$tst"; \ fi; \ done; \ if test "$$failed" -eq 0; then \ banner="All $$all tests passed"; \ else \ banner="$$failed of $$all tests failed"; \ fi; \ dashes=`echo "$$banner" | sed s/./=/g`; \ echo "$$dashes"; \ echo "$$banner"; \ echo "$$dashes"; \ test "$$failed" -eq 0 info-am: info: info-am dvi-am: dvi: dvi-am check-am: all-am $(MAKE) $(AM_MAKEFLAGS) check-TESTS check: check-am installcheck-am: installcheck: installcheck-am install-exec-am: install-exec: install-exec-am install-data-am: install-data: install-data-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-am uninstall-am: uninstall: uninstall-am all-am: Makefile all-redirect: all-am install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-generic mostlyclean: mostlyclean-am clean-am: 
clean-generic mostlyclean-am clean: clean-am distclean-am: distclean-generic clean-am distclean: distclean-am maintainer-clean-am: maintainer-clean-generic distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." maintainer-clean: maintainer-clean-am .PHONY: tags distdir check-TESTS info-am info dvi-am dvi check check-am \ installcheck-am installcheck install-exec-am install-exec \ install-data-am install-data install-am install uninstall-am uninstall \ all-redirect all-am all installdirs mostlyclean-generic \ distclean-generic clean-generic maintainer-clean-generic clean \ mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: exonerate-2.4.0/test/exonerate/exonerate.simple.test.sh0000755000175000017500000000120410371635214020177 00000000000000#!/bin/sh # simple test for exonerate EXONERATE="../../src/program/exonerate" SEQUENCEFILE="../data/cdna/calm.human.dna.fasta" OUTPUTFILE="exonerate.simple.test.out" clean_exit(){ rm -f $OUTPUTFILE exit $1 } $EXONERATE --bestn 1 --showvulgar yes \ $SEQUENCEFILE $SEQUENCEFILE > $OUTPUTFILE if [ $? 
-eq 0 ] then echo "Exonerate simple test OK" else echo "Problem running simple test for exonerate" clean_exit 1 fi SCORE=`grep '^vulgar:' $OUTPUTFILE | cut -d' ' -f10` if [ $SCORE -eq 10875 ] then echo Score as expected: $SCORE else echo Unexpected score: $SCORE clean_exit 1 fi clean_exit 0 exonerate-2.4.0/test/ipcress/0000777000175000017500000000000011222703704013141 500000000000000exonerate-2.4.0/test/ipcress/Makefile.am0000644000175000017500000000016310371635214015115 00000000000000 TESTS = ipcress.simple.test.sh EXTRA_DIST = $(TESTS) # Files to clear away MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/test/ipcress/Makefile.in0000644000175000017500000001264411222703654015135 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../.. 
ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ TESTS = ipcress.simple.test.sh EXTRA_DIST = $(TESTS) # Files to clear away MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best all: all-redirect .SUFFIXES: $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps test/ipcress/Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status tags: TAGS TAGS: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = test/ipcress distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done 
check-TESTS: $(TESTS) @failed=0; all=0; \ srcdir=$(srcdir); export srcdir; \ for tst in $(TESTS); do \ if test -f $$tst; then dir=.; \ else dir="$(srcdir)"; fi; \ if $(TESTS_ENVIRONMENT) $$dir/$$tst; then \ all=`expr $$all + 1`; \ echo "PASS: $$tst"; \ elif test $$? -ne 77; then \ all=`expr $$all + 1`; \ failed=`expr $$failed + 1`; \ echo "FAIL: $$tst"; \ fi; \ done; \ if test "$$failed" -eq 0; then \ banner="All $$all tests passed"; \ else \ banner="$$failed of $$all tests failed"; \ fi; \ dashes=`echo "$$banner" | sed s/./=/g`; \ echo "$$dashes"; \ echo "$$banner"; \ echo "$$dashes"; \ test "$$failed" -eq 0 info-am: info: info-am dvi-am: dvi: dvi-am check-am: all-am $(MAKE) $(AM_MAKEFLAGS) check-TESTS check: check-am installcheck-am: installcheck: installcheck-am install-exec-am: install-exec: install-exec-am install-data-am: install-data: install-data-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-am uninstall-am: uninstall: uninstall-am all-am: Makefile all-redirect: all-am install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-generic mostlyclean: mostlyclean-am clean-am: clean-generic mostlyclean-am clean: clean-am distclean-am: distclean-generic clean-am distclean: distclean-am maintainer-clean-am: maintainer-clean-generic distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." 
maintainer-clean: maintainer-clean-am .PHONY: tags distdir check-TESTS info-am info dvi-am dvi check check-am \ installcheck-am installcheck install-exec-am install-exec \ install-data-am install-data install-am install uninstall-am uninstall \ all-redirect all-am all installdirs mostlyclean-generic \ distclean-generic clean-generic maintainer-clean-generic clean \ mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. # Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: exonerate-2.4.0/test/ipcress/ipcress.simple.test.sh0000755000175000017500000000131010371635214017331 00000000000000#!/bin/sh # simple test for ipcress IPCRESS="../../src/program/ipcress" SEQUENCEFILE="../data/cdna/calm.human.dna.fasta" IPCRESSFILE="test.ipcress" OUTPUTFILE="test.ipcress.out" clean_exit(){ rm -f $IPCRESSFILE $OUTPUTFILE exit $1 } cat > $IPCRESSFILE << EOF test_primer CGCGGACGCGCG GTATTTTATTGG 2000 2500 EOF $IPCRESS $IPCRESSFILE $SEQUENCEFILE > $OUTPUTFILE if [ $? 
-eq 0 ] then echo Ipcress run successfully else echo Problem running ipcress clean_exit 1 fi TOTAL_PRODUCTS=`grep -c '^ipcress:' $OUTPUTFILE` if [ $TOTAL_PRODUCTS -eq 1 ] then echo Found 1 product, as expected else echo Error: unexpectedly found $TOTAL_PRODUCTS products clean_exit 1 fi clean_exit 0 exonerate-2.4.0/test/util/0000777000175000017500000000000011222703704012446 500000000000000exonerate-2.4.0/test/util/Makefile.am0000644000175000017500000000115210371635214014421 00000000000000 TESTS = fastaclean.test.sh fastaclip.test.sh fastacomposition.test.sh \ fastadiff.test.sh fastaexpode.test.sh \ fastaindex.fastafetch.test.sh fastalength.test.sh \ fastanrdb.test.sh fastaoverlap.test.sh fastarefomat.test.sh \ fastaremove.test.sh fastarevcomp.test.sh \ fastasoftmask.fastahardmask.test.sh fastasort.test.sh \ fastasplit.test.sh fastasubseq.test.sh fastatranslate.test.sh \ fastavalidcds.test.sh EXTRA_DIST = $(TESTS) # Files to clear away MAINTAINERCLEANFILES = Makefile.in exonerate-2.4.0/test/util/Makefile.in0000644000175000017500000001361011222703655014435 00000000000000# Makefile.in generated automatically by automake 1.4-p6 from Makefile.am # Copyright (C) 1994, 1995-8, 1999, 2001 Free Software Foundation, Inc. # This Makefile.in is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, # with or without modifications, as long as this notice is preserved. # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY, to the extent permitted by law; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR A # PARTICULAR PURPOSE. 
SHELL = @SHELL@ srcdir = @srcdir@ top_srcdir = @top_srcdir@ VPATH = @srcdir@ prefix = @prefix@ exec_prefix = @exec_prefix@ bindir = @bindir@ sbindir = @sbindir@ libexecdir = @libexecdir@ datadir = @datadir@ sysconfdir = @sysconfdir@ sharedstatedir = @sharedstatedir@ localstatedir = @localstatedir@ libdir = @libdir@ infodir = @infodir@ mandir = @mandir@ includedir = @includedir@ oldincludedir = /usr/include DESTDIR = pkgdatadir = $(datadir)/@PACKAGE@ pkglibdir = $(libdir)/@PACKAGE@ pkgincludedir = $(includedir)/@PACKAGE@ top_builddir = ../.. ACLOCAL = @ACLOCAL@ AUTOCONF = @AUTOCONF@ AUTOMAKE = @AUTOMAKE@ AUTOHEADER = @AUTOHEADER@ INSTALL = @INSTALL@ INSTALL_PROGRAM = @INSTALL_PROGRAM@ $(AM_INSTALL_PROGRAM_FLAGS) INSTALL_DATA = @INSTALL_DATA@ INSTALL_SCRIPT = @INSTALL_SCRIPT@ transform = @program_transform_name@ NORMAL_INSTALL = : PRE_INSTALL = : POST_INSTALL = : NORMAL_UNINSTALL = : PRE_UNINSTALL = : POST_UNINSTALL = : host_alias = @host_alias@ host_triplet = @host@ CC = @CC@ GLIB_CONFIG = @GLIB_CONFIG@ HAVE_LIB = @HAVE_LIB@ LIB = @LIB@ LTLIB = @LTLIB@ MAKEINFO = @MAKEINFO@ PACKAGE = @PACKAGE@ PKG_CONFIG = @PKG_CONFIG@ VERSION = @VERSION@ codegen_extra_ldadd = @codegen_extra_ldadd@ codegen_extra_sources = @codegen_extra_sources@ custom_guint64_format = @custom_guint64_format@ datarootdir = @datarootdir@ glib_cflags = @glib_cflags@ glib_libs = @glib_libs@ host = @host@ installed_util_list = @installed_util_list@ source_root_dir = @source_root_dir@ use_pthreads = @use_pthreads@ TESTS = fastaclean.test.sh fastaclip.test.sh fastacomposition.test.sh fastadiff.test.sh fastaexpode.test.sh fastaindex.fastafetch.test.sh fastalength.test.sh fastanrdb.test.sh fastaoverlap.test.sh fastarefomat.test.sh fastaremove.test.sh fastarevcomp.test.sh fastasoftmask.fastahardmask.test.sh fastasort.test.sh fastasplit.test.sh fastasubseq.test.sh fastatranslate.test.sh fastavalidcds.test.sh EXTRA_DIST = $(TESTS) # Files to clear away MAINTAINERCLEANFILES = Makefile.in mkinstalldirs = 
$(SHELL) $(top_srcdir)/mkinstalldirs CONFIG_CLEAN_FILES = DIST_COMMON = Makefile.am Makefile.in DISTFILES = $(DIST_COMMON) $(SOURCES) $(HEADERS) $(TEXINFOS) $(EXTRA_DIST) TAR = tar GZIP_ENV = --best all: all-redirect .SUFFIXES: $(srcdir)/Makefile.in: Makefile.am $(top_srcdir)/configure.in $(ACLOCAL_M4) cd $(top_srcdir) && $(AUTOMAKE) --gnu --include-deps test/util/Makefile Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status cd $(top_builddir) \ && CONFIG_FILES=$(subdir)/$@ CONFIG_HEADERS= $(SHELL) ./config.status tags: TAGS TAGS: distdir = $(top_builddir)/$(PACKAGE)-$(VERSION)/$(subdir) subdir = test/util distdir: $(DISTFILES) @for file in $(DISTFILES); do \ d=$(srcdir); \ if test -d $$d/$$file; then \ cp -pr $$d/$$file $(distdir)/$$file; \ else \ test -f $(distdir)/$$file \ || ln $$d/$$file $(distdir)/$$file 2> /dev/null \ || cp -p $$d/$$file $(distdir)/$$file || :; \ fi; \ done check-TESTS: $(TESTS) @failed=0; all=0; \ srcdir=$(srcdir); export srcdir; \ for tst in $(TESTS); do \ if test -f $$tst; then dir=.; \ else dir="$(srcdir)"; fi; \ if $(TESTS_ENVIRONMENT) $$dir/$$tst; then \ all=`expr $$all + 1`; \ echo "PASS: $$tst"; \ elif test $$? 
-ne 77; then \ all=`expr $$all + 1`; \ failed=`expr $$failed + 1`; \ echo "FAIL: $$tst"; \ fi; \ done; \ if test "$$failed" -eq 0; then \ banner="All $$all tests passed"; \ else \ banner="$$failed of $$all tests failed"; \ fi; \ dashes=`echo "$$banner" | sed s/./=/g`; \ echo "$$dashes"; \ echo "$$banner"; \ echo "$$dashes"; \ test "$$failed" -eq 0 info-am: info: info-am dvi-am: dvi: dvi-am check-am: all-am $(MAKE) $(AM_MAKEFLAGS) check-TESTS check: check-am installcheck-am: installcheck: installcheck-am install-exec-am: install-exec: install-exec-am install-data-am: install-data: install-data-am install-am: all-am @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am install: install-am uninstall-am: uninstall: uninstall-am all-am: Makefile all-redirect: all-am install-strip: $(MAKE) $(AM_MAKEFLAGS) AM_INSTALL_PROGRAM_FLAGS=-s install installdirs: mostlyclean-generic: clean-generic: distclean-generic: -rm -f Makefile $(CONFIG_CLEAN_FILES) -rm -f config.cache config.log stamp-h stamp-h[0-9]* maintainer-clean-generic: -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) mostlyclean-am: mostlyclean-generic mostlyclean: mostlyclean-am clean-am: clean-generic mostlyclean-am clean: clean-am distclean-am: distclean-generic clean-am distclean: distclean-am maintainer-clean-am: maintainer-clean-generic distclean-am @echo "This command is intended for maintainers to use;" @echo "it deletes files that may require special tools to rebuild." maintainer-clean: maintainer-clean-am .PHONY: tags distdir check-TESTS info-am info dvi-am dvi check check-am \ installcheck-am installcheck install-exec-am install-exec \ install-data-am install-data install-am install uninstall-am uninstall \ all-redirect all-am all installdirs mostlyclean-generic \ distclean-generic clean-generic maintainer-clean-generic clean \ mostlyclean distclean maintainer-clean # Tell versions [3.59,3.63) of GNU make to not export all variables. 
# Otherwise a system limit (for SysV at least) may be exceeded. .NOEXPORT: exonerate-2.4.0/test/util/fastaclean.test.sh0000755000175000017500000000025110371635214016002 00000000000000#!/bin/sh # test for fastaclean utility PROGRAM="../../src/util/fastaclean" INPUTFILE="../data/protein/calm.human.protein.fasta" $PROGRAM -p TRUE $INPUTFILE exit $? exonerate-2.4.0/test/util/fastaclip.test.sh0000755000175000017500000000131010371635214015644 00000000000000#!/bin/sh # test for fastaclip utility FASTACLIP="../../src/util/fastaclip" FASTALENGTH="../../src/util/fastalength" UNCLIPPEDFILE="fastaclip.test.fasta" CLIPPEDFILE="fastaclip.clipped.test.fasta" clean_exit(){ rm -f $UNCLIPPEDFILE $CLIPPEDFILE exit $1 } cat << EOF > $UNCLIPPEDFILE >test ACGTNNNN EOF $FASTACLIP $UNCLIPPEDFILE > $CLIPPEDFILE if [ $? -eq 0 ] then echo Clipped OK else echo Problem clipping $CLIPPEDFILE clean_exit 1 fi cat $CLIPPEDFILE $FASTALENGTH $CLIPPEDFILE | (read LENGTH IDENTIFIER if [ $LENGTH -eq 4 ] then echo Clipped input correctly else echo "Clipping error (length $LENGTH)" clean_exit 1 fi ) clean_exit 0 exonerate-2.4.0/test/util/fastacomposition.test.sh0000755000175000017500000000074310371635214017271 00000000000000#!/bin/sh # test for fastacomposition utility PROGRAM="../../src/util/fastacomposition" INPUTFILE="../data/cdna/calm.human.dna.fasta" $PROGRAM $INPUTFILE | (read FILENAME A ACOMP C CCOMP G GCOMP T TCOMP if [ $ACOMP -eq 430 ] \ && [ $CCOMP -eq 626 ] \ && [ $GCOMP -eq 592 ] \ && [ $TCOMP -eq 527 ] then echo Calmodulin cDNA composition correct else echo Calmodulin cDNA composition wrong $ACOMP $CCOMP $GCOMP $TCOMP exit 1 fi ) exit $? exonerate-2.4.0/test/util/fastadiff.test.sh0000755000175000017500000000070310371635214015632 00000000000000#!/bin/sh # test for fastadiff utility FASTADIFF="../../src/util/fastadiff" CALM="../data/protein/calm.human.protein.fasta" P53="../data/protein/p53.human.protein.fasta" $FASTADIFF $CALM $P53 if [ $? 
-eq 0 ] then echo Diff test on different files failed exit 1 else echo Different seqs recognised as different fi $FASTADIFF $CALM $CALM if [ $? -eq 0 ] then echo Identity test OK else echo Identity failed exit 1 fi exit 0 exonerate-2.4.0/test/util/fastaexpode.test.sh0000755000175000017500000000212410371635214016205 00000000000000#!/bin/sh # test for fastaexplode utility FASTAEXPLODE="../../src/util/fastaexplode" INPUTDIR="../data/protein" INPUTFILE="fastaexplode.test.fasta" OUTPUTDIR="fastaexplode.out_dir.test.fasta" clean_exit(){ rm -f $INPUTFILE rm -rf $OUTPUTDIR exit $1 } cat ${INPUTDIR}/*.fasta > $INPUTFILE if [ $? -eq 0 ] then echo Made input file for fastaexplode else echo Problem making input file for fastaexplode clean_exit 1 fi mkdir $OUTPUTDIR if [ $? -eq 0 ] then echo Made output directory for fastaexplode else echo Problem making output directory for fastaexplode clean_exit 1 fi $FASTAEXPLODE $INPUTFILE --directory $OUTPUTDIR if [ $? -eq 0 ] then echo Successfully ran fastaexplode on: $INPUTFILE else echo Problem running fastaexplode on: $INPUTFILE clean_exit 1 fi TOTALINPUT=`grep -c '^>' $INPUTFILE` TOTALOUTPUT=`ls -1 ${OUTPUTDIR}/*.fa | wc -l` if [ $TOTALINPUT -eq $TOTALOUTPUT ] then echo Expected number of seqs in output: $TOTALINPUT else echo Mismatched number of seqs in output $TOTALINPUT,$TOTALOUTPUT clean_exit 1 fi clean_exit 0 exonerate-2.4.0/test/util/fastaindex.fastafetch.test.sh0000755000175000017500000000224410401111051020120 00000000000000#!/bin/sh # test for fastaindex and fastafetch utilities FASTAINDEX="../../src/util/fastaindex" FASTAFETCH="../../src/util/fastafetch" INPUTDIR="../data/protein" INPUTFILE="fastafetch.test.fasta" INDEXFILE="fastafetch.test.idx" rm -f $INDEXFILE cat $INPUTDIR/*.fasta > $INPUTFILE # test fastaindex $FASTAINDEX $INPUTFILE $INDEXFILE if [ $? -ne 0 ] then echo Error with $FASTAINDEX exit 1 
else
    echo Made index $INDEXFILE
fi

clean_exit(){
    rm -f $INPUTFILE $INDEXFILE
    exit $1
}

run_fastafetch(){
    IDENTIFIER=$1
    ANSWER=$2
    echo $FASTAFETCH $INPUTFILE $INDEXFILE $IDENTIFIER
    echo expect $ANSWER
    $FASTAFETCH $INPUTFILE $INDEXFILE $IDENTIFIER
    RESULT=$?
    if [ $RESULT -ne $ANSWER ]
    then
        # report the command that was actually run (fastafetch)
        echo Error running $FASTAFETCH $INPUTFILE $INDEXFILE $IDENTIFIER
        clean_exit 1
    fi
    return $RESULT
}

# Prevent fatal debugging errors
unset EXONERATE_DEBUG

# look for IDs which are present
run_fastafetch CALM_HUMAN 0
run_fastafetch P53_HUMAN 0

# look for IDs which are missing
run_fastafetch A_MISSING_FROM_START 1
run_fastafetch M_MISSING_FROM_MIDDLE 1
run_fastafetch z_MISSING_FROM_END 1

echo fastaindex fastafetch test OK
clean_exit 0

exonerate-2.4.0/test/util/fastalength.test.sh0000755000175000017500000000101110377575761016214 00000000000000
#!/bin/sh
# test for fastalength utility

PROGRAM="../../src/util/fastalength"
INPUTFILE="../data/protein/calm.human.protein.fasta"

$PROGRAM $INPUTFILE | (read LENGTH IDENTIFIER
    echo $LENGTH $IDENTIFIER
    if [ $LENGTH -eq 149 ]
    then
        echo Calmodulin length OK
    else
        echo Bad calmodulin length: $LENGTH
        exit 1
    fi
    if [ $IDENTIFIER = "CALM_HUMAN" ]
    then
        echo Calmodulin identifier OK
    else
        echo Bad calmodulin identifier: $IDENTIFIER
        exit 1
    fi
)
exit $?

exonerate-2.4.0/test/util/fastanrdb.test.sh0000755000175000017500000000162510377612027015656 00000000000000
#!/bin/sh
# test for fastanrdb utility

FASTANRDB="../../src/util/fastanrdb"
INPUTDIR="../data/cdna"
INPUTFILE="fastanrdb.input.test.fasta"
OUTPUTFILE="fastanrdb.output.test.fasta"

clean_exit(){
    rm -f $INPUTFILE $OUTPUTFILE
    exit $1
}

# every sequence is included twice, so fastanrdb should merge the duplicates
cat ${INPUTDIR}/*.fasta ${INPUTDIR}/*.fasta > $INPUTFILE
if [ $? -eq 0 ]
then
    echo Made input file for fastanrdb
else
    echo Problem making input file for fastanrdb
    clean_exit 1
fi

$FASTANRDB $INPUTFILE > $OUTPUTFILE
if [ $? \
-eq 0 ]
then
    echo Successfully ran fastanrdb on: $INPUTFILE
else
    echo Problem running fastanrdb on: $INPUTFILE
    clean_exit 1
fi

TOTALINPUT=`ls -1 ${INPUTDIR}/*.fasta | wc -l`
TOTALOUTPUT=`grep -c '^>' $OUTPUTFILE`
if [ $TOTALINPUT -eq $TOTALOUTPUT ]
then
    echo Expected number of seqs in output: $TOTALINPUT
else
    echo Mismatched number of seqs in output $TOTALINPUT,$TOTALOUTPUT
    clean_exit 1
fi

clean_exit 0

exonerate-2.4.0/test/util/fastaoverlap.test.sh0000755000175000017500000000110210371635214016364 00000000000000
#!/bin/sh
# test for fastaoverlap utility

PROGRAM="../../src/util/fastaoverlap"
INPUTFILE="../data/protein/calm.human.protein.fasta"
OUTPUTFILE="fastaoverlap.fasta"

clean_exit(){
    rm -f $OUTPUTFILE
    exit $1
}

$PROGRAM $INPUTFILE --chunk 10 --jump 10 > $OUTPUTFILE
if [ $? -eq 0 ]
then
    echo $PROGRAM working OK
else
    echo Error with $PROGRAM
    clean_exit 1
fi

TOTAL_SEQS=`grep -c '^>' $OUTPUTFILE`
if [ $TOTAL_SEQS -eq 15 ]
then
    echo Generated $TOTAL_SEQS
else
    echo Unexpected number of overlap seqs: $TOTAL_SEQS
    clean_exit 1
fi

clean_exit 0

exonerate-2.4.0/test/util/fastarefomat.test.sh0000755000175000017500000000164510371635214016365 00000000000000
#!/bin/sh
# test for fastareformat utility

FASTAREFORMAT="../../src/util/fastareformat"
FASTALENGTH="../../src/util/fastalength"
UNFORMATTEDFILE="fastareformat.unformatted.test.fasta"
FORMATTEDFILE="fastareformat.formatted.test.fasta"

clean_exit(){
    rm -f $UNFORMATTEDFILE $FORMATTEDFILE
    exit $1
}

cat << EOF > $UNFORMATTEDFILE
>test
ACGT
AACCGGTT
AAACCCGGGTTT
AAAACCCCGGGGTTTT
EOF

$FASTAREFORMAT $UNFORMATTEDFILE > $FORMATTEDFILE
if [ $? \
-eq 0 ]
then
    echo Reformatted OK
else
    echo Problem reformatting $UNFORMATTEDFILE
    clean_exit 1
fi

cat $FORMATTEDFILE

$FASTALENGTH $UNFORMATTEDFILE | ( read UNFORMATTEDLEN ID
    $FASTALENGTH $FORMATTEDFILE | (read FORMATTEDLEN ID
        if [ $UNFORMATTEDLEN -eq $FORMATTEDLEN ]
        then
            echo Reformatted length consistent
        else
            echo Reformatted length different
            clean_exit 1
        fi
    )
)
# The comparison above runs in nested subshells, so pass its status on
# instead of always exiting with 0
clean_exit $?

exonerate-2.4.0/test/util/fastaremove.test.sh0000755000175000017500000000137410371635214016224 00000000000000
#!/bin/sh
# test for fastaremove utility

FASTAREMOVE="../../src/util/fastaremove"
INPUTDIR="../data/protein/"
INPUTFILE="fastaremove.test.in.fasta"
OUTPUTFILE="fastaremove.test.out.fasta"

cat $INPUTDIR/*.fasta > $INPUTFILE

clean_exit(){
    rm -f $INPUTFILE $OUTPUTFILE
    exit $1
}

HAVE_CALM_HUMAN=`grep -c '^>CALM_HUMAN' $INPUTFILE`
if [ $HAVE_CALM_HUMAN -eq 1 ]
then
    echo Have calmodulin in input
else
    echo Calmodulin missing from input
    clean_exit 1
fi

$FASTAREMOVE $INPUTFILE CALM_HUMAN > $OUTPUTFILE

HAVE_CALM_HUMAN=`grep -c '^>CALM_HUMAN' $OUTPUTFILE`
if [ $HAVE_CALM_HUMAN -eq 0 ]
then
    echo Calmodulin successfully removed
else
    echo Failed to remove calmodulin from input $HAVE_CALM_HUMAN
    clean_exit 1
fi

clean_exit 0

exonerate-2.4.0/test/util/fastarevcomp.test.sh0000755000175000017500000000214210401077355016374 00000000000000
#!/bin/sh
# test for fastarevcomp utility

FASTAREVCOMP="../../src/util/fastarevcomp"
FASTADIFF="../../src/util/fastadiff"
FASTALENGTH="../../src/util/fastalength"
CALM="../data/cdna/calm.human.dna.fasta"
CALM_RC="fastarevcomp.rc.test.fasta"
CALM_RCRC="fastarevcomp.rc.rc.test.fasta"

clean_exit(){
    rm -f $CALM_RC $CALM_RCRC
    exit $1
}

run_revcomp(){
    $FASTAREVCOMP $1 > $2
    if [ $? \
-eq 0 ]
    then
        echo Reverse complement created : $1
    else
        echo Problem generating reverse complement : $1
        clean_exit 1
    fi
}

run_revcomp $CALM $CALM_RC
run_revcomp $CALM_RC $CALM_RCRC

check_length(){
    $FASTALENGTH $1 | (read LENGTH IDENTIFIER
        if [ $LENGTH -eq 2175 ]
        then
            echo RC length is the same
        else
            echo RC length changed
            clean_exit 1
        fi
    )
    # a bare return passes on the status of the subshell above
    return
}

# act on the returned status; the clean_exit inside check_length
# only leaves its subshell
check_length $CALM_RC || clean_exit 1
check_length $CALM_RCRC || clean_exit 1

$FASTADIFF -c no $CALM $CALM_RCRC
if [ $? -eq 0 ]
then
    echo 2nd reverse complement same as original
else
    echo 2nd reverse complement not same as original
    clean_exit 1
fi

clean_exit 0

exonerate-2.4.0/test/util/fastasoftmask.fastahardmask.test.sh0000755000175000017500000000237510401077551021366 00000000000000
#!/bin/sh
# test for fastasoftmask and fastahardmask utilities

FASTASOFTMASK="../../src/util/fastasoftmask"
FASTAHARDMASK="../../src/util/fastahardmask"
FASTADIFF="../../src/util/fastadiff"
UNMASKEDFILE="fastasoftmask.fastahardmask.unmasked.test.fasta"
MASKEDFILE="fastasoftmask.fastahardmask.masked.test.fasta"
SOFTMASKEDFILE="fastasoftmask.fastahardmask.softmasked.test.fasta"
HARDMASKEDFILE="fastasoftmask.fastahardmask.hardmasked.test.fasta"

clean_exit(){
    rm -f $UNMASKEDFILE $MASKEDFILE $SOFTMASKEDFILE $HARDMASKEDFILE
    exit $1
}

cat << EOF > $UNMASKEDFILE
>test
ACGTAACCGGTTAAACCCGGGTTTAAAACCCCGGGGTTTT
EOF

cat << EOF > $MASKEDFILE
>test
ACGNAACCGGNNAAACCCGGGNNNAAAACCCCGGGGNNNN
EOF

$FASTASOFTMASK $UNMASKEDFILE $MASKEDFILE > $SOFTMASKEDFILE
if [ $? -eq 0 ]
then
    echo Softmasked OK
else
    echo Problem softmasking $UNMASKEDFILE
    clean_exit 1
fi

cat $SOFTMASKEDFILE

$FASTAHARDMASK $SOFTMASKEDFILE > $HARDMASKEDFILE
if [ $? -eq 0 ]
then
    echo Hardmasked OK
else
    echo Problem hardmasking $SOFTMASKEDFILE
    clean_exit 1
fi

$FASTADIFF -c no $MASKEDFILE $HARDMASKEDFILE
if [ $? \
-eq 0 ]
then
    echo Hardmasked file is same as original masked file
else
    echo Hardmasked file is not the same as masked
    clean_exit 1
fi

clean_exit 0

exonerate-2.4.0/test/util/fastasort.test.sh0000755000175000017500000000131610371635214015712 00000000000000
#!/bin/sh
# test for fastasort utility

FASTASORT="../../src/util/fastasort"
FASTALENGTH="../../src/util/fastalength"
INPUTDIR="../data/protein"
INPUTFILE="fastasort.test.fasta"
OUTPUTFILE="fastasort.sorted.test.fasta"

cat $INPUTDIR/*.fasta > $INPUTFILE

$FASTASORT $INPUTFILE --key len > $OUTPUTFILE

PREV_ID="NONE"
PREV_LEN=0

clean_exit(){
    # no index file is created by this test, so only these two are removed
    rm -f $INPUTFILE $OUTPUTFILE
    exit $1
}

$FASTALENGTH $OUTPUTFILE | while read LENGTH IDENTIFIER
do
    if [ $LENGTH -ge $PREV_LEN ]
    then
        echo Sorted $IDENTIFIER $LENGTH
    else
        echo Sorting failure: $PREV_ID $PREV_LEN : $IDENTIFIER $LENGTH
        clean_exit 1
    fi
    PREV_ID=$IDENTIFIER
    PREV_LEN=$LENGTH
done
# The loop may run in a subshell, so pass its status on
# instead of always exiting with 0
clean_exit $?

exonerate-2.4.0/test/util/fastasplit.test.sh0000755000175000017500000000261710371635214016063 00000000000000
#!/bin/sh
# test for fastasplit utility

FASTASPLIT="../../src/util/fastasplit"
FASTALENGTH="../../src/util/fastalength"
INPUTDIR="../data/protein"
INPUTFILE="fastasplit.test.fasta"
SPLITDIR="fastasplit.test.dir"
OUTPUTFILE_0="$SPLITDIR/fastasplit.test.fasta_chunk_0000000"
OUTPUTFILE_1="$SPLITDIR/fastasplit.test.fasta_chunk_0000001"

clean_exit(){
    rm -f $INPUTFILE
    rm -rf $SPLITDIR
    exit $1
}

cat $INPUTDIR/* > $INPUTFILE

rm -rf $SPLITDIR
mkdir $SPLITDIR

$FASTASPLIT $INPUTFILE $SPLITDIR --chunk 2
if [ $? \
-eq 0 ]
then
    echo Split $INPUTFILE OK
else
    echo Problem splitting $INPUTFILE
    clean_exit 1
fi

TOTALFILES=`ls -1 $SPLITDIR | wc -l`
if [ $TOTALFILES -eq 2 ]
then
    echo Split into two files as expected
else
    echo Did not expect $TOTALFILES files
    clean_exit 1
fi

total_seq_len(){
    TOTAL_LEN=0
    for LEN in `$FASTALENGTH $1 | cut -d' ' -f1`
    do
        TOTAL_LEN=`expr $TOTAL_LEN + $LEN`
    done
    echo $TOTAL_LEN
    return
}

# check total lengths
INPUT_LEN=`total_seq_len $INPUTFILE`
echo Input len $INPUT_LEN
OUTPUT_LEN_0=`total_seq_len $OUTPUTFILE_0`
OUTPUT_LEN_1=`total_seq_len $OUTPUTFILE_1`
OUTPUT_LEN=`expr $OUTPUT_LEN_0 + $OUTPUT_LEN_1`
echo Output len $OUTPUT_LEN

if [ $INPUT_LEN -eq $OUTPUT_LEN ]
then
    echo Input and output lengths match
else
    echo Output length $OUTPUT_LEN, but expected $INPUT_LEN
    clean_exit 1
fi

clean_exit 0

exonerate-2.4.0/test/util/fastasubseq.test.sh0000755000175000017500000000051710371635214016227 00000000000000
#!/bin/sh
# test for fastasubseq utility

PROGRAM="../../src/util/fastasubseq"
INPUTFILE="../data/protein/calm.human.protein.fasta"

SUBSEQ=`$PROGRAM $INPUTFILE --start 10 --length 10 | tail -1`

# quoted so that empty output gives a failed comparison, not a test(1) error
if [ "$SUBSEQ" = "AEFKEAFSLF" ]
then
    echo Generated correct subseq
else
    echo Wrong subseq generated $SUBSEQ
    exit 1
fi
exit $?

exonerate-2.4.0/test/util/fastatranslate.test.sh0000755000175000017500000000164610371635214016726 00000000000000
#!/bin/sh
# test for fastatranslate utility

FASTATRANSLATE="../../src/util/fastatranslate"
FASTASUBSEQ="../../src/util/fastasubseq"
FASTADIFF="../../src/util/fastadiff"
CDNA="../data/cdna/calm.human.dna.fasta"
PROTEIN="../data/protein/calm.human.protein.fasta"
CDS="fastatranslate.test.cds.fasta"
TRANSLATED="fastatranslate.test.translated.fasta"

clean_exit(){
    rm -rf $CDS $TRANSLATED
    exit $1
}

$FASTASUBSEQ $CDNA --start 103 --length 447 > $CDS
if [ $? -eq 0 ]
then
    echo Extracted CDS for translation
else
    echo Problem extracting CDS for translation
    clean_exit 1
fi

$FASTATRANSLATE $CDS --frame 1 > $TRANSLATED
if [ $? \
-eq 0 ]
then
    echo Translated sequence
else
    echo Failed to translate sequence
    clean_exit 1
fi

$FASTADIFF --checkids no $TRANSLATED $PROTEIN
if [ $? -eq 0 ]
then
    echo Translated sequence correct
else
    echo Translated sequence is wrong
    clean_exit 1
fi

clean_exit 0

exonerate-2.4.0/test/util/fastavalidcds.test.sh0000755000175000017500000000151410371635214016514 00000000000000
#!/bin/sh
# test for fastavalidcds utility

FASTAVALIDCDS="../../src/util/fastavalidcds"
INPUTFILE="fastavalidcds.input.test.fasta"
OUTPUTFILE="fastavalidcds.output.test.fasta"

clean_exit(){
    rm -f $INPUTFILE $OUTPUTFILE
    exit $1
}

# one sequence per failure mode, plus a single valid CDS
cat << EOF > $INPUTFILE
>odd_length
ATGATAA
>too_short
ATAA
>no_start
AAACCCGGGTGA
>no_end
ATGCCCGGGTTT
>in_frame_stop
ATGAAACCCTAGGGGTTTTAG
>non_acgt_base
ATGAANTGA
>valid_seq
ATGAAACCCGGGTTTTAA
EOF

$FASTAVALIDCDS -e yes $INPUTFILE > $OUTPUTFILE
if [ $? -eq 0 ]
then
    echo Validated CDS sequences OK
else
    echo Problem validating CDS in $INPUTFILE
    clean_exit 1
fi

cat $OUTPUTFILE

TOTAL=`grep -c '^>' $OUTPUTFILE`
if [ $TOTAL -eq 1 ]
then
    echo Validated correctly a single seq from input
else
    echo "Missed bad CDS $INPUTFILE (found: $TOTAL)"
    clean_exit 1
fi

clean_exit 0
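
Several tests in this directory (fastalength and fastacomposition, for example) check a utility's output by piping it into a parenthesised `read` block and then finishing with `exit $?`. The sketch below illustrates why that last step matters: in most shells the block after the pipe runs in a subshell, so an `exit 1` inside it only ends the subshell, and the caller has to pick the status up from `$?`. The helper name `check_first_field` and the sample input are hypothetical and are not part of the exonerate test suite.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only:
# reads "VALUE REST" from stdin and compares VALUE with the expected number.
check_first_field(){
    EXPECTED=$1
    # This parenthesised block is a subshell, so "exit 1" below
    # ends only the subshell, not the calling script.
    (read VALUE REST
        if [ "$VALUE" -eq "$EXPECTED" ]
        then
            echo "Value OK: $VALUE"
        else
            echo "Unexpected value: $VALUE (wanted $EXPECTED)"
            exit 1
        fi
    )
    # Hand the subshell's status back to the caller
    return $?
}

# Made-up input lines in the "LENGTH IDENTIFIER" shape used by the tests
echo "149 CALM_HUMAN" | check_first_field 149
STATUS_OK=$?

echo "148 CALM_HUMAN" | check_first_field 149
STATUS_BAD=$?

echo "statuses: $STATUS_OK $STATUS_BAD"
```

A test script built this way would end with `exit $STATUS_BAD` (or `clean_exit $?` directly after the pipeline) rather than an unconditional `exit 0`, so that a failed check is visible to "make check".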